![]() Samhi, Jordan ![]() ![]() ![]() in 44th International Conference on Software Engineering (ICSE 2022) (2022, May 21) Native code is now commonplace within Android app packages where it co-exists and interacts with Dex bytecode through the Java Native Interface to deliver rich app functionalities. Yet, state-of-the-art ... [more ▼] Native code is now commonplace within Android app packages where it co-exists and interacts with Dex bytecode through the Java Native Interface to deliver rich app functionalities. Yet, state-of-the-art static analysis approaches have mostly overlooked the presence of such native code, which, however, may implement some key sensitive, or even malicious, parts of the app behavior. This limitation of the state of the art is a severe threat to validity in a large range of static analyses that do not have a complete view of the executable code in apps. To address this issue, we propose a new advance in the ambitious research direction of building a unified model of all code in Android apps. The JuCify approach presented in this paper is a significant step towards such a model, where we extract and merge call graphs of native code and bytecode to make the final model readily-usable by a common Android analysis framework: in our implementation, JuCify builds on the Soot internal intermediate representation. We performed empirical investigations to highlight how, without the unified model, a significant amount of Java methods called from the native code are ``unreachable'' in apps' call-graphs, both in goodware and malware. Using JuCify, we were able to enable static analyzers to reveal cases where malware relied on native code to hide invocation of payment library code or of other sensitive code in the Android framework. Additionally, JuCify's model enables state-of-the-art tools to achieve better precision and recall in detecting data leaks through native code. Finally, we show that by using JuCify we can find sensitive data leaks that pass through native code. [less ▲] Detailed reference viewed: 135 (21 UL)![]() Kong, Pingfan ![]() ![]() in Automated Software Engineering (2021) Android framework-specific app crashes are hard to debug. Indeed, the callback-based event-driven mechanism of Android challenges crash localization techniques that are developed for traditional Java ... [more ▼] Android framework-specific app crashes are hard to debug. Indeed, the callback-based event-driven mechanism of Android challenges crash localization techniques that are developed for traditional Java programs. The key challenge stems from the fact that the buggy code location may not even be listed within the stack trace. For example, our empirical study on 500 framework-specific crashes from an open benchmark has revealed that 37 percent of the crash types are related to bugs that are outside the stack traces. Moreover, Android programs are a mixture of code and extra-code artifacts such as the Manifest file. The fact that any artifact can lead to failures in the app execution creates the need to position the localization target beyond the code realm. In this paper, we propose Anchor, a two-phase suspicious bug location suggestion tool. Anchor specializes in finding crash-inducing bugs outside the stack trace. Anchor is lightweight and source code independent since it only requires the crash message and the apk file to locate the fault. Experimental results, collected via cross-validation and in-the- wild dataset evaluation, show that Anchor is effective in locating Android framework-specific crashing faults. [less ▲] Detailed reference viewed: 73 (14 UL)![]() Gao, Jun ![]() Doctoral thesis (2021) Direct inter-app code invocation in Android apps and its evolution: The Android ecosystem offers different facilities to enable communication among app components and across apps to ensure that rich ... [more ▼] Direct inter-app code invocation in Android apps and its evolution: The Android ecosystem offers different facilities to enable communication among app components and across apps to ensure that rich services can be composed through functionality reuse. At the heart of this system is the Inter-component communication (ICC) scheme, which has been largely studied in the literature. Less known in the community is another powerful mechanism that allows for direct inter-app code invocation which opens up for different reuse scenarios, both legitimate or malicious. In this dissertation, we expose the general workflow for this mechanism, which beyond ICCs, enables app developers to access and invoke functionalities (either entire Java classes, methods or object fields) implemented in other apps using official Android APIs. We experimentally showcase how this reuse mechanism can be leveraged to “plagiarize" supposedly-protected functionalities. Typically, we could leverage this mechanism to bypass security guards that a popular video broadcaster has placed for preventing access to its video database from outside its provided app. We further contribute with a static analysis toolkit, named DICIDer, for detecting direct inter-app code invocations in apps. An empirical analysis of the usage prevalence and evolution of this reuse mechanism is then conducted. [less ▲] Detailed reference viewed: 211 (17 UL)![]() Gao, Jun ![]() ![]() in ESEC/FSE 2020: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2020, November) {The Android ecosystem offers different facilities to enable communication among app components and across apps to ensure that rich services can be composed through functionality reuse. At the heart of ... [more ▼] {The Android ecosystem offers different facilities to enable communication among app components and across apps to ensure that rich services can be composed through functionality reuse. At the heart of this system is the Inter-component communication (ICC) scheme, which has been largely studied in the literature. Less known in the community is another powerful mechanism that allows for direct inter-app code invocation which opens up for different reuse scenarios, both legitimate or malicious. This paper exposes the general workflow for this mechanism, which beyond ICCs, enables app developers to access and invoke functionalities (either entire Java classes, methods or object fields) implemented in other apps using official Android APIs. We experimentally showcase how this reuse mechanism can be leveraged to â plagiarize" supposedly-protected functionalities. Typically, we were able to leverage this mechanism to bypass security guards that a popular video broadcaster has placed for preventing access to its video database from outside its provided app. We further contribute with a static analysis toolkit, named DICIDer, for detecting direct inter-app code invocations in apps. An empirical analysis of the usage prevalence of this reuse mechanism is then conducted. Finally, we discuss the usage contexts as well as the implications of this studied reuse mechanism [less ▲] Detailed reference viewed: 89 (7 UL)![]() ; Gao, Jun ![]() ![]() in The 3rd International Workshop on Advances in Mobile App Analysis (2020, September) In this work, we describe the design and implementation of a reusable tool named KnowledgeZooClient targeting the construction, as a crowd-sourced effort, of a knowledge graph for Android apps ... [more ▼] In this work, we describe the design and implementation of a reusable tool named KnowledgeZooClient targeting the construction, as a crowd-sourced effort, of a knowledge graph for Android apps. KnowledgeZooClient is made up of two modules: (1) the Metadata Extraction Module (MEM), which aims at extracting metadata from Android apps and (2) the Metadata Integration Module (MIM) for importing and integrating extracted metadata into a graph database. The usefulness of KnowledgeZooClient is demonstrated via an exclusive knowledge graph called KnowledgeZoo, which contains information on over 500,000 apps already and still keeps growing. Interested users can already benefit from KnowledgeZoo by writing advanced search queries so as to collect targeted app samples. [less ▲] Detailed reference viewed: 100 (9 UL)![]() ; Gao, Jun ![]() ![]() in Empirical Software Engineering (2020), 24(118), 1-41 Because of functionality evolution, or security and performance-related changes, some APIs eventually become unnecessary in a software system and thus need to be cleaned to ensure proper maintainability ... [more ▼] Because of functionality evolution, or security and performance-related changes, some APIs eventually become unnecessary in a software system and thus need to be cleaned to ensure proper maintainability. Those APIs are typically marked first as deprecated APIs and, as recommended, follow through a deprecated-replace-remove cycle, giving an opportunity to client application developers to smoothly adapt their code in next updates. Such a mechanism is adopted in the Android framework development where thousands of reusable APIs are made available to Android app developers. In this work, we present a research-based prototype tool called CDA and apply it to different revisions (i.e., releases or tags) of the Android framework code for characterising deprecated APIs. Based on the data mined by CDA, we then perform an empirical study on API deprecation in the Android ecosystem and the associated challenges for maintaining quality apps. In particular, we investigate the prevalence of deprecated APIs, their annotations and documentation, their removal and consequences, their replacement messages, developer reactions to API deprecation, as well as the evolution of the usage of deprecated APIs. Experimental results reveal several findings that further provide promising insights related to deprecated Android APIs. Notably, by mining the source code of the Android framework base, we have identified three bugs related to deprecated APIs. These bugs have been quickly assigned and positively appreciated by the framework maintainers, who claim that these issues will be updated in future releases. [less ▲] Detailed reference viewed: 136 (2 UL)![]() Gao, Jun ![]() ![]() in IEEE Transactions on Reliability (2020) The Android ecosystem today is a growing universe of a few billion devices, hundreds of millions of users and millions of applications targeting a wide range of activities where sensitive information is ... [more ▼] The Android ecosystem today is a growing universe of a few billion devices, hundreds of millions of users and millions of applications targeting a wide range of activities where sensitive information is collected and processed. Security of communication and privacy of data are thus of utmost importance in application development. Yet, regularly, there are reports of successful attacks targeting Android users. While some of those attacks exploit vulnerabilities in the Android OS, others directly concern application-level code written by a large pool of developers with varying experience. Recently, a number of studies have investigated this phenomenon, focusing however only on a specific vulnerability type appearing in apps, and based on only a snapshot of the situation at a given time. Thus, the community is still lacking comprehensive studies exploring how vulnerabilities have evolved over time, and how they evolve in a single app across developer updates. Our work fills this gap by leveraging a data stream of 5 million app packages to re-construct versioned lineages of Android apps and finally obtained 28;564 app lineages (i.e., successive releases of the same Android apps) with more than 10 app versions each, corresponding to a total of 465;037 apks. Based on these app lineages, we apply state-of- the-art vulnerability-finding tools and investigate systematically the reports produced by each tool. In particular, we study which types of vulnerabilities are found, how they are introduced in the app code, where they are located, and whether they foreshadow malware. We provide insights based on the quantitative data as reported by the tools, but we further discuss the potential false positives. Our findings and study artifacts constitute a tangible knowledge to the community. It could be leveraged by developers to focus verification tasks, and by researchers to drive vulnerability discovery and repair research efforts. [less ▲] Detailed reference viewed: 248 (19 UL)![]() Gao, Jun ![]() ![]() in 26th edition of the IEEE International Conference on Software Analysis, Evolution and Reengineering (2019, February 24) Empirical validations of research approaches eventually require a curated ground truth. In studies related to Android malware, such a ground truth is built by leveraging Anti-Virus (AV) scanning reports ... [more ▼] Empirical validations of research approaches eventually require a curated ground truth. In studies related to Android malware, such a ground truth is built by leveraging Anti-Virus (AV) scanning reports which are often provided free through online services such as VirusTotal. Unfortunately, these reports do not offer precise information for appropriately and uniquely assigning classes to samples in app datasets: AV engines indeed do not have a consensus on specifying information in labels. Furthermore, labels often mix information related to families, types, etc. In particular, the notion of “adware” is currently blurry when it comes to maliciousness. There is thus a need to thoroughly investigate cases where adware samples can actually be associated with malware (e.g., because they are tagged as adware but could be considered as malware as well). In this work, we present a large-scale analytical study of Android adware samples to quantify to what extent “adware should be considered as malware”. Our analysis is based on the Androzoo repository of 5 million apps with associated AV labels and leverages a state-of-the-art label harmonization tool to infer the malicious type of apps before confronting it against the ad families that each adware app is associated with. We found that all adware families include samples that are actually known to implement specific malicious behavior types. Up to 50% of samples in an ad family could be flagged as malicious. Overall the study demonstrates that adware is not necessarily benign. [less ▲] Detailed reference viewed: 224 (19 UL)![]() Gao, Jun ![]() ![]() in Proceedings of 2019 24th International Conference on Engineering of Complex Computer Systems (2019) Android developers are known to frequently update their apps for fixing bugs and addressing vulnerabilities, but more commonly for introducing new features. This process leads a trail in the ecosystem ... [more ▼] Android developers are known to frequently update their apps for fixing bugs and addressing vulnerabilities, but more commonly for introducing new features. This process leads a trail in the ecosystem with multiple successive app versions which record historical evolutions of a variety of apps. While the literature includes various works related to such evolutions, little attention has been paid to the research question on how quality evolves, in particular with regards to maintainability and code complexity. In this work, we fill this gap by presenting a largescale empirical study: we leverage the AndroZoo dataset to obtain a significant number of app lineages (i.e., successive releases of the same Android apps), and rely on six well-established, maintainability-related complexity metrics commonly accepted in the literature on app quality, maintainability etc. Our empirical investigation eventually reveals that, overall, while Android apps become bigger in terms of code size as time goes by, the apps themselves appear to be increasingly maintainable and thus decreasingly complex [less ▲] Detailed reference viewed: 68 (14 UL)![]() Gao, Jun ![]() ![]() in Proceedings of the 16th International Conference on Mining Software Repositories (2019) Android app developers recurrently use crypto-APIs to provide data security to app users. Unfortunately, misuse of APIs only creates an illusion of security and even exposes apps to systematic attacks. It ... [more ▼] Android app developers recurrently use crypto-APIs to provide data security to app users. Unfortunately, misuse of APIs only creates an illusion of security and even exposes apps to systematic attacks. It is thus necessary to provide developers with a statically-enforceable list of specifications of crypto-API usage rules. On the one hand, such rules cannot be manually written as the process does not scale to all available APIs. On the other hand, a classical mining approach based on common usage patterns is not relevant in Android, given that a large share of usages include mistakes. In this work, building on the assumption that “developers update API usage instances to fix misuses”, we propose to mine a large dataset of updates within about 40 000 real-world app lineages to infer API usage rules. Eventually, our investigations yield negative results on our assumption that API usage updates tend to correct misuses. Actually, it appears that updates that fix misuses may be unintentional: the same misuses patterns are quickly re-introduced by subsequent updates. [less ▲] Detailed reference viewed: 118 (10 UL)![]() Kong, Pingfan ![]() ![]() in IEEE Transactions on Reliability (2018) Automated testing of Android apps is essential for app users, app developers and market maintainer communities alike. Given the widespread adoption of Android and the specificities of its development ... [more ▼] Automated testing of Android apps is essential for app users, app developers and market maintainer communities alike. Given the widespread adoption of Android and the specificities of its development model, the literature has proposed various testing approaches for ensuring that not only functional requirements but also non-functional requirements are satisfied. In this paper, we aim at providing a clear overview of the state-of-the-art works around the topic of Android app testing, in an attempt to highlight the main trends, pinpoint the main methodologies applied and enumerate the challenges faced by the Android testing approaches as well as the directions where the community effort is still needed. To this end, we conduct a Systematic Literature Review (SLR) during which we eventually identified 103 relevant research papers published in leading conferences and journals until 2016. Our thorough examination of the relevant literature has led to several findings and highlighted the challenges that Android testing researchers should strive to address in the future. After that, we further propose a few concrete research directions where testing approaches are needed to solve recurrent issues in app updates, continuous increases of app sizes, as well as the Android ecosystem fragmentation. [less ▲] Detailed reference viewed: 251 (33 UL)![]() ; Gao, Jun ![]() ![]() in 15th International Conference on Mining Software Repositories (MSR 2018) (2018, May) Detailed reference viewed: 202 (10 UL)![]() Gao, Jun ![]() Poster (2018) Detailed reference viewed: 143 (25 UL) |
||