References of "Lamsiyah, Salima 50062715"
Peer Reviewed
Historical-Domain Pre-trained Language Model for Historical Extractive Text Summarization
Lamsiyah, Salima UL; Murugaraj, Keerthana; Schommer, Christoph UL

Scientific Conference (2023, August 04)

In recent years, pre-trained language models (PLMs) have driven remarkable advances in extractive summarization across diverse domains. However, research specifically targeting the historical domain remains scarce. In this paper, we propose a novel method for extractive historical single-document summarization that leverages a domain-aware bidirectional language model pre-trained on a large-scale historical corpus, which we then fine-tune for the task of extractive historical single-document summarization. A major challenge for this task is the lack of annotated datasets for historical summarization. To address this issue, we construct a dataset from archived historical documents collected from the Centre Virtuel de la Connaissance sur l’Europe (CVCE) group at the University of Luxembourg. Furthermore, to better capture the structural features of the input documents, we use a sentence position embedding mechanism that enables the model to learn the position of each sentence. Experimental results on our historical dataset collected from the CVCE group show that our method outperforms recent state-of-the-art methods in terms of ROUGE-1, ROUGE-2, and ROUGE-L F1 scores. To the best of our knowledge, this is the first work on extractive historical text summarization.
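ROUGE-N F1, reported above as ROUGE-1 and ROUGE-2, measures clipped n-gram overlap between a candidate summary and a reference summary. The following is a minimal, simplified sketch for illustration only: it assumes lowercase whitespace tokenization, a single reference, and no stemming, so its numbers will not match the official ROUGE toolkit exactly. ROUGE-L, which is based on the longest common subsequence, is not shown. The example sentences are made up.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 for one candidate against one reference.

    Assumption: lowercase + whitespace tokenization, no stemming,
    no stopword removal, single reference.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: Counter '&' keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the treaty of rome was signed in 1957"
candidate = "the treaty was signed in rome in 1957"
print(round(rouge_n_f1(candidate, reference, n=1), 3))  # → 0.875
print(round(rouge_n_f1(candidate, reference, n=2), 3))  # → 0.571
```

Here 7 of the 8 candidate unigrams match (the second "in" is clipped), while only 4 of the 7 bigrams match, which is why ROUGE-2 rewards preserved word order more than ROUGE-1 does.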

Peer Reviewed
UM6P at SemEval-2023 Task 12: Out-Of-Distribution Generalization Method for African Languages Sentiment Analysis
El Mahdaouy, Abdelkader; Alami, Hamza; Lamsiyah, Salima UL et al

in Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023) (2023)

This paper presents the system we submitted to AfriSenti SemEval-2023 Task 12: Sentiment Analysis for African Languages. AfriSenti comprises three tasks covering monolingual, multilingual, and zero-shot sentiment analysis scenarios for African languages. To improve model generalization, we explored three strategies: 1) further pre-training of the AfroXLM pre-trained language model (PLM), 2) combining the AfroXLM and MARBERT PLMs through a residual layer, and 3) studying the impact of metric learning and of two out-of-distribution generalization training objectives. The overall evaluation results show that our system achieves promising results on several sub-tasks of Task A. For Tasks B and C, our system ranks among the top six participating systems.

Peer Reviewed
Can Anaphora Resolution Improve Extractive Query-Focused Multi-Document Summarization?
Lamsiyah, Salima UL; El Mahdaouy, Abdelkader; Schommer, Christoph UL

in IEEE Access (2023)

Query-Focused Multi-Document Summarization (QF-MDS) is the task of automatically generating, from a collection of documents, a summary that answers a specific user query. Extractive methods primarily identify, select, and rank sentences according to their relevance to the given query. These methods have shown promising results; however, they may yield incoherent summaries when pronominal anaphoric expressions appear unbound. To address this issue, this paper proposes a novel method that combines contextual embeddings with anaphora resolution. More specifically, the Sentence-BERT (SBERT) model is employed to generate contextual embeddings for the sentences in the documents and for the user's query. Additionally, the SpanBERT model is used to resolve unbound pronominal references in the input sentences, aiming to improve the cohesiveness of the generated summaries. We conducted a comprehensive comparative analysis, using quantitative and qualitative evaluations, against other state-of-the-art systems on the standard DUC'2005 and DUC'2007 datasets. The results show that the proposed method is competitive and outperforms recent query-focused multi-document summarization systems on certain ROUGE evaluation measures. Human evaluation further confirms that our method generates more informative, cohesive, and less redundant summaries.
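The query-relevance ranking step can be illustrated with a minimal sketch. It assumes the sentence and query embeddings have already been produced by an encoder such as SBERT; the tiny 3-dimensional vectors below are hypothetical placeholders, and using cosine similarity as the relevance score is an assumption rather than a detail stated in the abstract. The SpanBERT anaphora-resolution step is not shown.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_query(sentence_vecs: list[list[float]], query_vec: list[float], k: int) -> list[int]:
    """Indices of the k sentences most similar to the query, best first."""
    scores = [cosine(vec, query_vec) for vec in sentence_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical embeddings standing in for SBERT sentence/query vectors.
sentences = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2], [0.7, 0.3, 0.1], [0.0, 0.2, 0.9]]
query = [1.0, 0.2, 0.0]
print(rank_by_query(sentences, query, k=2))  # → [0, 2]
```

Ranking by relevance alone can select near-duplicate sentences; a real system would typically add a redundancy or length constraint on top of this scoring.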

Peer Reviewed
UL & UM6P at SemEval-2023 Task 10: Semi-Supervised Multi-task Learning for Explainable Detection of Online Sexism
Lamsiyah, Salima UL; El Mahdaouy, Abdelkader; Alami, Hamza et al

in Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023) (2023)

This paper introduces our system for SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS). The EDOS shared task covers three hierarchical sub-tasks: sexism detection, coarse-grained sexism categorization, and fine-grained sexism categorization. We investigated both single-task and multi-task learning based on RoBERTa transformer-based language models. To improve results, we further pre-trained RoBERTa on the provided unlabeled data. In addition, we used a small sample of the unlabeled data for semi-supervised learning with the minimum class-confusion loss. Our system achieved macro F1 scores of 82.25%, 67.35%, and 49.8% on Tasks A, B, and C, respectively.
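The abstract does not define the minimum class-confusion (MCC) loss. The sketch below follows the general formulation from Jin et al. (2020), "Minimum Class Confusion for Versatile Domain Adaptation", applied to a batch of unlabeled logits; it is not the authors' implementation. The temperature value of 2.5 is an assumed default, and how the term is weighted against the supervised loss is not shown.

```python
import math


def softmax(logits: list[float], temperature: float) -> list[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mcc_loss(batch_logits: list[list[float]], temperature: float = 2.5) -> float:
    """Minimum class confusion loss for a batch of unlabeled examples (sketch).

    Assumption: temperature=2.5 is an illustrative default.
    """
    probs = [softmax(row, temperature) for row in batch_logits]
    batch_size, num_classes = len(probs), len(probs[0])
    # Entropy-based weights: confident (low-entropy) examples weigh more.
    entropies = [-sum(p * math.log(p + 1e-12) for p in row) for row in probs]
    raw = [1.0 + math.exp(-h) for h in entropies]
    weights = [batch_size * r / sum(raw) for r in raw]
    # Class correlation matrix C = P^T W P (num_classes x num_classes).
    conf = [
        [sum(weights[b] * probs[b][j] * probs[b][k] for b in range(batch_size))
         for k in range(num_classes)]
        for j in range(num_classes)
    ]
    # Category normalization: each row of C sums to 1.
    conf = [[c / sum(row) for c in row] for row in conf]
    # Average between-class (off-diagonal) confusion per class.
    off_diag = sum(
        conf[j][k] for j in range(num_classes) for k in range(num_classes) if j != k
    )
    return off_diag / num_classes


print(round(mcc_loss([[0.0, 0.0], [0.0, 0.0]]), 3))  # → 0.5
print(mcc_loss([[50.0, 0.0], [0.0, 50.0]]) < 1e-6)  # → True
```

Uniform predictions give the maximum confusion, (C - 1)/C for C classes, while confident and mutually distinct predictions drive the loss toward zero. Minimizing it on unlabeled data therefore pushes the model toward less ambiguous class assignments.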

Peer Reviewed
A Comparative Study of Sentence Embeddings for Unsupervised Extractive Multi-document Summarization
Lamsiyah, Salima UL; Schommer, Christoph UL

in Artificial Intelligence and Machine Learning (2023)

Obtaining large-scale, high-quality training data for multi-document summarization (MDS) tasks is time-consuming and resource-intensive; hence, supervised models can only be applied to limited domains and languages. In this paper, we introduce unsupervised extractive methods for both generic and query-focused MDS tasks, aiming to produce a relevant summary from a collection of documents without using labeled training data or domain knowledge. More specifically, we leverage transfer learning from recent sentence embedding models to encode the input documents into rich semantic representations. Moreover, we use a coreference resolution system to resolve broken pronominal coreference expressions in the generated summaries, aiming to improve their cohesion and textual quality. Furthermore, we provide a comparative analysis of several existing sentence embedding models in the context of unsupervised extractive multi-document summarization. Experiments on the standard DUC'2004-2007 datasets demonstrate that the proposed methods are competitive with previous unsupervised methods and even comparable to recent supervised deep learning-based methods. The empirical results also show that the SimCSE embedding model, based on contrastive learning, achieves substantial improvements over other strong sentence embedding models. Finally, adding the coreference resolution step brings a noticeable improvement to the unsupervised extractive MDS task.
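For the generic (non-query) setting, one common way to turn sentence embeddings into an extractive summary without labels is centroid scoring. The abstract does not specify the selection strategy, so everything below is an assumption for illustration: the centroid scoring, the cosine redundancy threshold of 0.9, the word budget, and the 2-dimensional vectors (hypothetical placeholders for, e.g., SimCSE outputs). The coreference-resolution step is not shown.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid_summary(
    sentence_vecs: list[list[float]],
    sentence_lengths: list[int],
    word_budget: int,
    redundancy_threshold: float = 0.9,
) -> list[int]:
    """Greedy centroid-based extractive selection (hypothetical sketch).

    Sentences are scored by cosine similarity to the mean embedding of all
    sentences; a candidate is skipped if it would exceed the word budget or
    is too similar to an already selected sentence.
    """
    if not sentence_vecs:
        return []
    dim = len(sentence_vecs[0])
    centroid = [sum(vec[d] for vec in sentence_vecs) / len(sentence_vecs) for d in range(dim)]
    order = sorted(
        range(len(sentence_vecs)),
        key=lambda i: cosine(sentence_vecs[i], centroid),
        reverse=True,
    )
    selected: list[int] = []
    used = 0
    for i in order:
        if used + sentence_lengths[i] > word_budget:
            continue
        if any(cosine(sentence_vecs[i], sentence_vecs[j]) > redundancy_threshold for j in selected):
            continue
        selected.append(i)
        used += sentence_lengths[i]
    return selected  # indices in selection order


# Hypothetical embeddings and sentence lengths (in words) for one document cluster.
vecs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8], [0.0, 1.0]]
lengths = [20, 18, 25, 30]
print(centroid_summary(vecs, lengths, word_budget=70))  # → [2, 1]
```

In this toy cluster, sentence 0 fits the budget but is skipped because it nearly duplicates sentence 1, and sentence 3 is skipped because it would exceed the 70-word budget.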

Detailed reference viewed: 124 (0 UL)