| Reference : Large-scale Machine Learning-based Malware Detection: Confronting the "10-fold Cross ... |
| Scientific congresses, symposiums and conference proceedings : Paper published in a book | |||
| Engineering, computing & technology : Computer science | |||
| http://hdl.handle.net/10993/18024 | |||
| Large-scale Machine Learning-based Malware Detection: Confronting the "10-fold Cross Validation" Scheme with Reality | |
| English | |
Allix, Kevin [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > > ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)] | |
Bissyande, Tegawendé François D Assise [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > >] | |
Jerome, Quentin [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) >] | |
Klein, Jacques [University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)] | |
State, Radu [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) >] | |
Le Traon, Yves [University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)] | |
| Mar-2014 | |
| Proceedings of the 4th ACM Conference on Data and Application Security and Privacy | |
| ACM | |
| CODASPY '14 | |
| 163--166 | |
| Yes | |
| 978-1-4503-2278-2 | |
| New York, NY, USA | |
| 4th ACM Conference on Data and Application Security and Privacy | |
| from 03-03-2014 to 05-03-2014 | |
| San Antonio, Texas | |
| USA | |
| [en] android ; machine learning ; malware ; ten-fold | |
| [en] To address the issue of malware detection, researchers have recently started to investigate the capabilities of machine-learning techniques for proposing effective approaches. Several promising results were recorded in the literature, many approaches being assessed with the common “10-Fold cross validation” scheme. This paper revisits the purpose of malware detection to discuss the adequacy of the “10-Fold” scheme for validating techniques that may not perform well in reality. To this end, we have devised several Machine Learning classifiers that rely on a novel set of features built from applications’ CFGs. We use a sizeable dataset of over 50,000 Android applications collected from sources where state-of-the-art approaches have selected their data. We show that our approach outperforms existing machine learning-based approaches. However, this high performance on usual-size datasets does not translate in high performance in the wild. | |
| 10.1145/2557547.2557587 | |
| http://doi.acm.org/10.1145/2557547.2557587 |