Reference : Historical-Domain Pre-trained Language Model for Historical Extractive Text Summarization
Scientific congresses, symposiums and conference proceedings : Unpublished conference
Engineering, computing & technology : Computer science
http://hdl.handle.net/10993/56061
English
Lamsiyah, Salima [University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)]
Murugaraj, Keerthana [University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)]
Schommer, Christoph [University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)]
4-Aug-2023
Yes
8th International Conference on Computer and Information Science and Technology (CIST 2023)
3-5 August 2023
[en] Extractive Text Summarization ; Historical Domain ; Pre-trained Language Models ; HistBERT ; Transfer Learning
[en] In recent years, pre-trained language models (PLMs) have advanced extractive summarization considerably
across diverse domains. However, research on the historical domain specifically remains scarce. In this paper, we propose a
novel method for extractive historical single-document summarization that leverages the potential of a domain-aware historical
bidirectional language model, pre-trained on a large-scale historical corpus. Subsequently, we fine-tune the language model specifically
for the task of extractive historical single-document summarization. One major challenge for this task is the lack of annotated datasets
for historical summarization. To address this issue, we construct a dataset by collecting archived historical documents from the Centre
Virtuel de la Connaissance sur l’Europe (CVCE) group at the University of Luxembourg. Furthermore, to better learn the structural
features of the input documents, we use a sentence position embedding mechanism that enables the model to learn the position information
of sentences. Experimental results on our historical dataset collected from the CVCE group show that our method outperforms
recent state-of-the-art methods in terms of ROUGE-1, ROUGE-2, and ROUGE-L F1 scores. To the best of our knowledge, this is the
first work on extractive historical text summarization.
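
The ROUGE-1 and ROUGE-2 F1 scores named in the abstract measure n-gram overlap between a candidate summary and a reference summary. The following is a minimal sketch of that computation, not the authors' evaluation script: it assumes lowercased whitespace tokenization with no stemming or stopword removal, so its values may differ from those of the official ROUGE toolkit. The function names `ngrams` and `rouge_n_f1` are illustrative, and ROUGE-L (longest common subsequence) is not covered.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 between two strings, using clipped n-gram overlap.

    Assumption: lowercased whitespace tokenization, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    candidate = "the treaty was signed"
    reference = "the treaty was signed in rome"
    print(round(rouge_n_f1(candidate, reference, n=1), 2))  # 0.8
    print(round(rouge_n_f1(candidate, reference, n=2), 2))  # 0.75
```

In this toy example every candidate unigram appears in the reference, so precision is 1 and recall is 4/6, giving an F1 of 0.8.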
10.11159/cist23.152
https://avestia.com/EECSS2023_Proceedings/files/paper/CIST/CIST_152.pdf

File(s) associated with this reference

Fulltext file(s):

File : CIST_152.pdf
Version : Publisher postprint
Size : 405.11 kB
Access : Open access


All documents in ORBilu are protected by a user license.