Article (Scientific journals)
A Hidden Markov Model to detect relevance in financial documents based on on/off topics
Kampas, Dimitrios; Schommer, Christoph; Sorger, Ulrich
2014, in European Conference on Data Analysis
Peer reviewed
 

Files


Full Text
Abstract_ECDA2014_Kampas_Schommer.pdf
Publisher postprint (100.36 kB)
Details



Keywords :
RELEVANCE; HIDDEN MARKOV MODELS; FINANCIAL NEWS
Abstract :
[en] Automated text classification has gained significant attention as the vast number of documents available in digital form continues to grow. Most standard classification approaches posit the independence of the term features in a document, which is unrealistic given the sophisticated structure of language. Our research concerns the discovery of relevance in documents, where relevance means that a document adequately refers to a sufficient number of themes (or topics) that are either `on' or `off'. `On topics' are semantically close to a domain-specific discourse, whereas `off topics' are not. As a promising approach, we have modelled a stochastic process for term sequences, in which each term is conditionally dependent on its preceding terms. Hidden Markov Models offer a reliable way to incorporate language and domain dependencies into classification. Terms are deterministically associated with classes to improve the probability estimates for infrequent words. In the paper presentation, we demonstrate our approach and motivate its suitability through an exploration of annotated Thomson Reuters news documents; in particular, the `on topic' documents discuss the monetary policy of the Federal Reserve. We estimate the transition and emission probabilities of our model on a training set of both on-topic and off-topic documents and evaluate the accuracy of our approach using 10-fold cross-validation. This work is part of the interdisciplinary research project ESCAPE, which is funded by the Fonds National de la Recherche. We kindly thank our colleagues from the Dept. of Finance for their support.
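The abstract does not spell out the model, so the following is only a minimal sketch of how a two-state (`on'/`off') Hidden Markov Model over term sequences might be estimated and applied. Several choices here are assumptions, not the authors' method: each term is tied to the class in which it occurs most often, probabilities use add-one smoothing, decoding uses the Viterbi algorithm, and a document is called `on' when at least half of its decoded states are `on'. The function names `train`, `viterbi`, and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict

STATES = ("on", "off")


def train(docs):
    """Estimate counts from (tokens, label) pairs, label in STATES."""
    # Assumption: tie each term to the class where it occurs most often.
    by_term = defaultdict(Counter)
    for tokens, label in docs:
        for term in tokens:
            by_term[term][label] += 1
    term_state = {t: max(STATES, key=lambda s: c[s]) for t, c in by_term.items()}

    init, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for tokens, _ in docs:
        states = [term_state[t] for t in tokens]
        if not states:
            continue
        init[states[0]] += 1
        for prev, cur in zip(states, states[1:]):
            trans[prev][cur] += 1
        for state, term in zip(states, tokens):
            emit[state][term] += 1
    return {"init": init, "trans": trans, "emit": emit, "vocab": len(term_state)}


def _log_p(count, total, n_outcomes):
    # Add-one smoothing (an assumption) keeps unseen events non-zero.
    return math.log((count + 1) / (total + n_outcomes))


def viterbi(model, tokens):
    """Return the most likely on/off state sequence for the tokens."""
    if not tokens:
        return []

    def emit_lp(state, term):
        counts = model["emit"][state]
        # One extra outcome is reserved for unknown terms.
        return _log_p(counts[term], sum(counts.values()), model["vocab"] + 1)

    def trans_lp(prev, state):
        counts = model["trans"][prev]
        return _log_p(counts[state], sum(counts.values()), len(STATES))

    n_init = sum(model["init"].values())
    score = {
        s: _log_p(model["init"][s], n_init, len(STATES)) + emit_lp(s, tokens[0])
        for s in STATES
    }
    back = []
    for term in tokens[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + trans_lp(p, s))
            new_score[s] = score[best] + trans_lp(best, s) + emit_lp(s, term)
            pointers[s] = best
        score = new_score
        back.append(pointers)

    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def classify(model, tokens, threshold=0.5):
    """Label a document 'on' if enough decoded states are 'on' (assumed rule)."""
    path = viterbi(model, tokens)
    if not path:
        return "off"
    return "on" if path.count("on") / len(path) >= threshold else "off"
```

In an evaluation such as the 10-fold cross-validation mentioned above, `train` would be called on nine folds and `classify` on the held-out fold, with accuracy averaged over the ten splits.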
Disciplines :
Computer science
Author, co-author :
Kampas, Dimitrios ;  University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
Schommer, Christoph  ;  University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
Sorger, Ulrich ;  University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
Language :
English
Title :
A Hidden Markov Model to detect relevance in financial documents based on on/off topics
Publication date :
2014
Journal title :
European Conference on Data Analysis
Peer reviewed :
Peer reviewed
Available on ORBilu :
since 11 September 2014
