| Reference : Historical-Domain Pre-trained Language Model for Historical Extractive Text Summarization |
| Scientific congresses, symposiums and conference proceedings : Unpublished conference |
| Engineering, computing & technology : Computer science |
| http://hdl.handle.net/10993/56061 |
| Historical-Domain Pre-trained Language Model for Historical Extractive Text Summarization | |
| English | |
| Lamsiyah, Salima [University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)] |
| Murugaraj, Keerthana [University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)] |
| Schommer, Christoph [University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)] |
| 4-Aug-2023 | |
| Yes | |
| 8th International Conference on Computer and Information Science and Technology (CIST 2023) | |
| 3-5 August 2023 | |
| Extractive Text Summarization; Historical Domain; Pre-trained Language Models; HistBERT; Transfer Learning |
| In recent years, pre-trained language models (PLMs) have achieved remarkable advances in extractive summarization across diverse domains. However, research specifically targeting the historical domain remains scarce. In this paper, we propose a novel method for extractive single-document summarization of historical texts that builds on a domain-aware bidirectional language model pre-trained on a large-scale historical corpus. We then fine-tune this language model for the task of extractive historical single-document summarization. A major challenge for this task is the lack of annotated datasets for historical summarization. To address this issue, we construct a dataset by collecting archived historical documents from the Centre Virtuel de la Connaissance sur l’Europe (CVCE) group at the University of Luxembourg. Furthermore, to better capture the structural features of the input documents, we use a sentence position embedding mechanism that enables the model to learn where each sentence is located in the document. Experimental results on this dataset show that our method outperforms recent state-of-the-art methods in terms of ROUGE-1, ROUGE-2, and ROUGE-L F1 scores. To the best of our knowledge, this is the first work on extractive historical text summarization. |
| 10.11159/cist23.152 | |
| https://avestia.com/EECSS2023_Proceedings/files/paper/CIST/CIST_152.pdf |
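The abstract reports results as ROUGE-1, ROUGE-2 and ROUGE-L F1 scores. To illustrate what these metrics measure, the following is a minimal sketch of ROUGE-N F1 (clipped n-gram overlap) and ROUGE-L F1 (longest common subsequence) computed on lowercased, whitespace-tokenized text. It is not the paper's evaluation code. It omits the stemming and tokenization rules of standard ROUGE implementations, and all function names and the example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, candidate_total, reference_total):
    """Harmonic mean of precision and recall; 0.0 if either is zero or undefined."""
    if candidate_total == 0 or reference_total == 0 or overlap == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n):
    """ROUGE-N F1: n-gram overlap, with each n-gram counted at most as often as in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count per n-gram
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1: precision and recall derived from the token-level LCS length."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Hypothetical extracted summary vs. reference summary.
    reference = "the treaty was signed in rome in 1957"
    candidate = "the treaty was signed in 1957"
    for n in (1, 2):
        print(f"ROUGE-{n} F1: {rouge_n_f1(candidate, reference, n):.3f}")
    print(f"ROUGE-L F1: {rouge_l_f1(candidate, reference):.3f}")
```

In extractive summarization, the candidate is the concatenation of the sentences the model selects. Published evaluations normally rely on an established ROUGE package rather than a hand-rolled version like this one.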
All documents in ORBilu are protected by a user license.