Fahmy, Hazem, in ACM Transactions on Software Engineering and Methodology (in press)

When Deep Neural Networks (DNNs) are used in safety-critical systems, engineers should determine the safety risks associated with failures (i.e., erroneous outputs) observed during testing. For DNNs processing images, engineers visually inspect all failure-inducing images to determine common characteristics among them. Such characteristics correspond to hazard-triggering events (e.g., low illumination) that are essential inputs for safety analysis. Though informative, this activity is expensive and error-prone. To support such safety analysis practices, we propose SEDE, a technique that generates readable descriptions for commonalities in failure-inducing, real-world images and improves the DNN through effective retraining. SEDE leverages the availability of simulators, which are commonly used for cyber-physical systems. It relies on genetic algorithms to drive simulators towards the generation of images that are similar to failure-inducing, real-world images in the test set; it then employs rule learning algorithms to derive expressions that capture commonalities in terms of simulator parameter values. The derived expressions are then used to generate additional images to retrain and improve the DNN. With DNNs performing in-car sensing tasks, SEDE successfully characterized hazard-triggering events leading to a DNN accuracy drop. SEDE also enabled retraining that led to significant improvements in DNN accuracy, up to 18 percentage points.

Lee, Jaekwon, in ACM Transactions on Software Engineering and Methodology (2023)

Weakly hard real-time systems can, to some degree, tolerate deadline misses, but their schedulability still needs to be analyzed to ensure their quality of service. Such analysis usually occurs at early design stages to provide implementation guidelines to engineers so that they can make better design decisions. Estimating worst-case execution times (WCET) is a key input to schedulability analysis. However, early on during system design, estimating WCET values is challenging, and engineers usually specify them as plausible ranges based on their domain knowledge. Our approach aims at finding restricted, safe WCET sub-ranges given a set of ranges initially estimated by experts in the context of weakly hard real-time systems. To this end, we leverage (1) multi-objective search aiming at maximizing the violation of weakly hard constraints in order to find worst-case scheduling scenarios and (2) polynomial logistic regression to infer safe WCET ranges with a probabilistic interpretation. We evaluated our approach by applying it to an industrial system in the satellite domain and several realistic synthetic systems. The results indicate that our approach significantly outperforms a baseline relying on random search without learning, and estimates safe WCET ranges with a high degree of confidence in practical time (< 23h).
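As an illustration of the second ingredient in the entry above, the sketch below fits a polynomial logistic regression on labeled scheduling outcomes and checks whether a candidate WCET sub-range is safe at a given confidence level. The task set, the toy schedulability oracle, and the 0.99 threshold are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' implementation): fit a polynomial
# logistic regression on labeled scheduling outcomes and check whether a
# candidate WCET sub-range is safe at an assumed 0.99 confidence level.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Assumed setup: WCET vectors for two tasks (in ms), sampled from expert ranges.
wcet = rng.uniform([1.0, 2.0], [5.0, 8.0], size=(500, 2))
# Toy oracle standing in for the search-based schedulability verdicts.
deadline_met = (wcet.sum(axis=1) < 9.0).astype(int)

# Second-degree terms give the "polynomial" part of the regression.
poly = PolynomialFeatures(degree=2, include_bias=False)
model = LogisticRegression(max_iter=1000).fit(poly.fit_transform(wcet), deadline_met)

# A sub-range is deemed safe if every sampled point inside it is predicted
# to meet all deadlines with probability >= 0.99.
candidates = rng.uniform([1.0, 2.0], [3.0, 5.0], size=(200, 2))
p_safe = model.predict_proba(poly.transform(candidates))[:, 1]
print("sub-range safe at 0.99 confidence:", bool((p_safe >= 0.99).all()))
```

In the actual approach, the labels would come from the worst-case scheduling scenarios produced by the multi-objective search rather than from a closed-form oracle.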
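Similarly, the search step of SEDE (first entry in this list) can be sketched as a genetic algorithm over simulator parameters. Everything below is a toy stand-in: the parameter bounds are invented, `render` replaces the real simulator, and `fitness` replaces the feature-space distance to a failure-inducing image.

```python
# Hypothetical sketch of SEDE's search step: evolve simulator parameters so
# that generated images resemble a failure-inducing one. The simulator and
# the image-similarity fitness are replaced by toy stand-ins.
import random

PARAM_BOUNDS = {"illumination": (0.0, 1.0), "head_yaw": (-90.0, 90.0)}  # assumed

def render(params):
    return params  # stand-in for the simulator's rendered image

def fitness(image, target):
    # Stand-in for a feature-space distance to the failing real image.
    return sum((image[k] - target[k]) ** 2 for k in PARAM_BOUNDS)

def evolve(target, pop_size=20, generations=60, mut_rate=0.3):
    pop = [{k: random.uniform(*b) for k, b in PARAM_BOUNDS.items()}
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(render(p), target))
        parents = pop[: pop_size // 2]             # truncation selection
        children = []
        for p in parents:                          # Gaussian mutation
            c = dict(p)
            for k, (lo, hi) in PARAM_BOUNDS.items():
                if random.random() < mut_rate:
                    c[k] = min(hi, max(lo, c[k] + random.gauss(0, 0.1 * (hi - lo))))
            children.append(c)
        pop = parents + children
    return min(pop, key=lambda p: fitness(render(p), target))

# Toy target standing in for the characteristics of a failure-inducing image.
best = evolve({"illumination": 0.05, "head_yaw": 42.0})
print({k: round(v, 2) for k, v in best.items()})
```

Rule learning would then run over the parameter values of the generated images that end up close to the failing ones, yielding readable expressions such as illumination thresholds.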
; Li, Yinghua, in ACM Transactions on Software Engineering and Methodology (2023)

Belgacem, Hichem, in ACM Transactions on Software Engineering and Methodology (2023), 32(2), 47:1-47:40

Users frequently interact with software systems through data entry forms. However, form filling is time-consuming and error-prone. Although several techniques have been proposed to auto-complete or pre-fill fields in the forms, they provide limited support to help users fill categorical fields, i.e., fields that require users to choose the right value among a large set of options. In this paper, we propose LAFF, a learning-based automated approach for filling categorical fields in data entry forms. LAFF first builds Bayesian Network models by learning field dependencies from a set of historical input instances, representing the values of the fields that have been filled in the past. To improve its learning ability, LAFF uses local modeling to effectively mine the local dependencies of fields in a cluster of input instances. During the form-filling phase, LAFF uses such models to predict possible values of a target field, based on the values in the already-filled fields of the form and their dependencies; the predicted values (endorsed based on field dependencies and prediction confidence) are then provided to the end user as a list of suggestions. We evaluated LAFF by assessing its effectiveness and efficiency in form filling on two datasets, one of them a proprietary dataset from the banking domain. Experimental results show that LAFF is able to provide accurate suggestions with a Mean Reciprocal Rank value above 0.73. Furthermore, LAFF is efficient, requiring at most 317 ms per suggestion.

; ; et al, in ACM Transactions on Software Engineering and Methodology (2023)

; ; et al, in ACM Transactions on Software Engineering and Methodology (2022)

Ngo, Chanh Duc, in ACM Transactions on Software Engineering and Methodology (2022), 31(4), 61

Apps' pervasive role in our society has led to the definition of test automation approaches to ensure their dependability. However, state-of-the-art approaches tend to generate large numbers of test inputs and are unlikely to achieve more than 50% method coverage. In this paper, we propose a strategy to achieve significantly higher coverage of the code affected by updates with a much smaller number of test inputs, thus alleviating the test oracle problem. More specifically, we present ATUA, a model-based approach that synthesizes App models with static analysis, integrates a dynamically-refined state abstraction function, and combines complementary testing strategies, including (1) coverage of the model structure, (2) coverage of the App code, (3) random exploration, and (4) coverage of dependencies identified through information retrieval.
Its model-based strategy enables ATUA to generate a small set of inputs that exercise only the code affected by the updates. In turn, this makes common test oracle solutions more cost-effective, as they tend to involve human effort. A large empirical evaluation, conducted with 72 App versions belonging to nine popular Android Apps, has shown that ATUA is more effective and less effort-intensive than state-of-the-art approaches when testing App updates.

Attaoui, Mohammed Oualid, in ACM Transactions on Software Engineering and Methodology (2022)

Deep neural networks (DNNs) have demonstrated superior performance over classical machine learning to support many features in safety-critical systems. Although DNNs are now widely used in such systems (e.g., self-driving cars), there is limited progress regarding automated support for functional safety analysis in DNN-based systems. For example, the identification of root causes of errors, to enable both risk analysis and DNN retraining, remains an open problem. In this paper, we propose SAFE, a black-box approach to automatically characterize the root causes of DNN errors. SAFE relies on a transfer learning model pre-trained on ImageNet to extract the features from error-inducing images. It then applies a density-based clustering algorithm to detect arbitrarily shaped clusters of images modeling plausible causes of error. Lastly, clusters are used to effectively retrain and improve the DNN. The black-box nature of SAFE is motivated by our objective not to require changes or even access to the DNN internals to facilitate adoption. Experimental results show the superior ability of SAFE in identifying different root causes of DNN errors based on case studies in the automotive domain. It also yields significant improvements in DNN accuracy after retraining, while saving significant execution time and memory when compared to alternatives.

Lee, Jaekwon, in ACM Transactions on Software Engineering and Methodology (2022)

Estimating worst-case execution times (WCET) is an important activity at early design stages of real-time systems. Based on WCET estimates, engineers make design and implementation decisions to ensure that task executions always complete before their specified deadlines. However, in practice, engineers often cannot provide precise point WCET estimates and prefer to provide plausible WCET ranges. Given a set of real-time tasks with such ranges, we provide an automated technique to determine for which WCET values the system is likely to meet its deadlines, and hence operate safely with a probabilistic guarantee. Our approach combines a search algorithm for generating worst-case scheduling scenarios with polynomial logistic regression for inferring probabilistic safe WCET ranges. We evaluated our approach by applying it to three industrial systems from different domains and several synthetic systems.
Our approach efficiently and accurately estimates probabilistic safe WCET ranges within which deadlines are likely to be satisfied with a high degree of confidence.

Khanfir, Ahmed, in ACM Transactions on Software Engineering and Methodology (2022)

Ojdanic, Milos, in ACM Transactions on Software Engineering and Methodology (2022)

Context: When software evolves, opportunities for introducing faults appear. Therefore, it is important to test the evolved program behaviors during each evolution cycle. However, while software evolves, its complexity is also evolving, introducing challenges to the testing process. To deal with this issue, testing techniques should be adapted to target the effect of the program changes instead of the entire program functionality. To this end, commit-aware mutation testing, a powerful testing technique, has been proposed. Unfortunately, commit-aware mutation testing is challenging due to the complex program semantics involved. Hence, it is pertinent to understand the characteristics, predictability, and potential of the technique. Objective: We conduct an exploratory study to investigate the properties of commit-relevant mutants, i.e., the test elements of commit-aware mutation testing, by proposing a general definition and an experimental approach to identify them. We thus aim at investigating the prevalence, location, and comparative advantages of commit-aware mutation testing over time (i.e., across program evolution). We also investigate the predictive power of several commit-related features in identifying and selecting commit-relevant mutants, to understand the essential properties for its best-effort application. Method: Our commit-relevance definition relies on the notion of observational slicing, approximated by higher-order mutation. Specifically, our approach utilizes mutant impact, i.e., the effect of one mutant on another, to capture and analyze the implicit interactions between the changed and unchanged code parts. The study analyzes over 10 million mutants, 288 commits, and five open-source software projects, involving over 68,213 CPU days of computation, and establishes a ground truth on which we perform our analysis. Results: Our analysis shows that commit-relevant mutants are located mainly outside of the committed code changes (81%), suggesting a limitation in previous work. We also note that effective selection of commit-relevant mutants has the potential of reducing the number of mutants by up to 93%. In addition, we demonstrate that commit-relevant mutation testing is significantly more effective and efficient than state-of-the-art baselines, i.e., random mutant selection and analysis of only the mutants within the program change. In our analysis of the predictive power of mutant- and commit-related features (e.g., number of mutants within a change, mutant type, and commit size) in predicting commit-relevant mutants, we found that most proxy features do not reliably predict commit-relevant mutants. Conclusion: This empirical study highlights the properties of commit-relevant mutants and demonstrates the importance of identifying and selecting commit-relevant mutants when testing evolving software systems.
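The higher-order-mutation approximation described in the entry above can be made concrete with a small, self-contained sketch: a mutant outside the changed code is flagged as commit-relevant when pairing it with a mutation of the changed code produces observable behavior that differs both from the mutant alone and from the change-mutation alone. The toy program, test inputs, and mutants below are invented for illustration and only approximate the paper's definition.

```python
# Hypothetical, self-contained sketch: a "program" is a function over test
# inputs, and a "mutant" is a small transformation of that function.
TESTS = [0, 1, 2, 5, 10]

def original(x):
    y = x * 2
    return y + 1       # pretend this "+ 1" line is inside the commit change

def mutant_outside(x): # mutant in unchanged code: "x * 2" becomes "x + 2"
    y = x + 2
    return y + 1

def change_mutation(prog):
    # Mutates the changed code: the trailing "+ 1" effectively becomes "- 1".
    return lambda x: prog(x) - 2

def behavior(prog):
    return tuple(prog(x) for x in TESTS)

def is_commit_relevant(mutant, change_mutations, base):
    solo = behavior(mutant)
    for cm in change_mutations:
        pair = behavior(cm(mutant))     # second-order: mutant + change mutation
        change_only = behavior(cm(base))
        if pair != solo and pair != change_only:
            return True                 # interaction with the change observed
    return False

print(is_commit_relevant(mutant_outside, [change_mutation], original))  # True
```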
Tian, Haoye, in ACM Transactions on Software Engineering and Methodology (2022)

Tian, Haoye, in ACM Transactions on Software Engineering and Methodology (2022)

Hu, Qiang, in ACM Transactions on Software Engineering and Methodology (2022)

Similar to traditional software that is constantly under evolution, deep neural networks (DNNs) need to evolve upon the rapid growth of test data for continuous enhancement, e.g., adapting to distribution shift in a new deployment environment. However, it is labor-intensive to manually label all the collected test data. Test selection solves this problem by strategically choosing a small set to label. Via retraining with the selected set, DNNs will achieve competitive accuracy. Unfortunately, existing selection metrics involve three main limitations: (1) using different retraining processes, (2) ignoring data distribution shifts, and (3) being insufficiently evaluated. To fill this gap, we first conduct a systematic empirical study to reveal the impact of the retraining process and data distribution on model enhancement. Then, based on our findings, we propose a novel distribution-aware test (DAT) selection metric. Experimental results reveal that retraining using both the training and selected data outperforms using only the selected data. No single selection metric performs best across all data distributions. By contrast, DAT effectively alleviates the impact of distribution shifts and outperforms the compared metrics by up to five times on simulated distribution-shift scenarios and by up to a 30.09% accuracy improvement on in-the-wild ones.

; ; et al, in ACM Transactions on Software Engineering and Methodology (2021), 30(3), 1-38

; ; Bissyande, Tegawendé François D Assise, in ACM Transactions on Software Engineering and Methodology (2021), 30(3), 1-36

Android developers heavily use reflection in their apps for legitimate reasons. However, reflection is also significantly used for hiding malicious actions. Unfortunately, current state-of-the-art static analysis tools for Android are challenged by the presence of reflective calls, which they usually ignore. Thus, the results of their security analysis, e.g., for private data leaks, are incomplete, given the measures taken by malware writers to elude static detection. We propose a new instrumentation-based approach to address this issue in a non-invasive way. Specifically, we introduce to the community a prototype tool called DroidRA, which reduces the resolution of reflective calls to a composite constant propagation problem and then leverages the COAL solver to infer the values of reflection targets. After that, it automatically instruments the app to replace reflective calls with their corresponding Java calls in a traditional paradigm.
Our approach augments an app so that it can be analyzed statically more effectively, including by static analyzers that are not reflection-aware. We evaluate DroidRA on benchmark apps as well as on real-world apps, and demonstrate that it can indeed infer the target values of reflective calls and subsequently allow state-of-the-art tools to provide more sound and complete analysis results.

; ; et al, in ACM Transactions on Software Engineering and Methodology (2021), 30(3), 1-37

Keller, Patrick, in ACM Transactions on Software Engineering and Methodology (2021)

Recent successes in training word embeddings for NLP tasks have encouraged a wave of research on representation learning for source code, which builds on similar NLP methods. The overall objective is then to produce code embeddings that capture the maximum of program semantics. State-of-the-art approaches invariably rely on a syntactic representation (i.e., raw lexical tokens, abstract syntax trees, or intermediate representation tokens) to generate embeddings, which are criticized in the literature as non-robust or non-generalizable. In this work, we investigate a novel embedding approach based on the intuition that source code has visual patterns of semantics. We further use these patterns to address the outstanding challenge of identifying semantic code clones. We propose the WySiWiM ("What You See Is What It Means") approach, where visual representations of source code are fed into powerful pre-trained image classification neural networks from the field of computer vision to benefit from the practical advantages of transfer learning. We evaluate the proposed embedding approach on the task of vulnerable code prediction in source code and on two variations of the task of semantic code clone identification: code clone detection (a binary classification problem) and code classification (a multi-classification problem). We show with experiments on BigCloneBench (Java) and Open Judge (C) that, although simple, our WySiWiM approach performs as effectively as state-of-the-art approaches such as ASTNN or TBCNN. We also show with data from NVD and SARD that the WySiWiM representation can be used to learn a vulnerable code detector with reasonable performance (accuracy ~90%). We further explore the influence of different steps in our approach, such as the choice of visual representations or the classification algorithm, to eventually discuss the promises and limitations of this research direction.

Ma, Wei, in ACM Transactions on Software Engineering and Methodology (2021), 30(2), 13:1-13:22

Titcheu Chekam, Thierry, in ACM Transactions on Software Engineering and Methodology (2021), 30(2), 19:1-19:23
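A minimal sketch of the WySiWiM idea described in the Keller entry above: render source code to an image and reuse a pre-trained vision model as a frozen feature extractor. The rendering details, the choice of ResNet-18, and the cosine-similarity comparison are assumptions for illustration; the paper evaluates several visual representations and classifiers.

```python
# Hypothetical sketch of the WySiWiM idea: render source code as an image and
# reuse a pre-trained vision model as a feature extractor (transfer learning).
import torch
from PIL import Image, ImageDraw
from torchvision import models, transforms

def code_to_image(source: str, size=(224, 224)) -> Image.Image:
    img = Image.new("RGB", (800, 800), "white")
    ImageDraw.Draw(img).multiline_text((10, 10), source, fill="black")
    return img.resize(size)

# ResNet-18 with its classification head removed yields a 512-d embedding.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def embed(source: str) -> torch.Tensor:
    with torch.no_grad():
        x = preprocess(code_to_image(source)).unsqueeze(0)
        return extractor(x).flatten(1).squeeze(0)   # shape: (512,)

# Clone candidates can then be compared via the similarity of their embeddings.
a = embed("int add(int a, int b) { return a + b; }")
b = embed("int sum(int x, int y) { return x + y; }")
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
```

A clone detector or vulnerability classifier would then be trained on (pairs of) such embeddings, which is where the transfer-learning benefit shows up.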