Looking for your paper? The papers are sorted alphabetically by title within each track. Use the links below to jump to your specific track.
Technical Research
Software Engineering in Practice (SEIP)
Software Engineering Education and Training (SEET)
New Ideas and Emerging Results (NIER)
Formal Demonstrations
Doctoral Symposium
Posters
ACM Student Research Competition
A Critical Review of "Automatic Patch Generation Learned from Human-Written Patches": Essay on the Problem Statement and the Evaluation of Automatic Software Repair
Martin Monperrus
University of Lille, France; INRIA, France
ICSE 2013 featured the first session ever dedicated to automatic program repair. In this session, Kim et al. presented PAR, a novel template-based approach for fixing Java bugs. We strongly disagree with key points of this paper. Our critical review has two goals. First, we aim to explain why we disagree with Kim and colleagues and why the reasons behind this disagreement are important for research on automatic software repair in general. Second, we aim to contribute to the field with a clarification of the essential ideas behind automatic software repair. In particular, we discuss the main evaluation criteria of automatic software repair: understandability, correctness, and completeness. We show that depending on how one sets up the repair scenario, the evaluation goals may be contradictory. Finally, we discuss the nature of fix acceptability and its relation to the notion of software correctness.
Preprint Available
A Study and Toolkit for Asynchronous Programming in C#
Semih Okur, David L. Hartveld, Danny Dig, and Arie van Deursen
University of Illinois at Urbana-Champaign, USA; Delft University of Technology, Netherlands; Oregon State University, USA
ACM Distinguished Paper
Asynchronous programming is in demand today because responsiveness is increasingly important on all modern devices. Yet we know little about how developers use asynchronous programming in practice. Without such knowledge, developers, researchers, language and library designers, and tool providers can make wrong assumptions. We present the first study that analyzes the use of asynchronous programming at a large scale. We analyzed 1,378 open source Windows Phone (WP) apps, comprising 12M SLOC, produced by 3,376 developers. Using this data, we answer two research questions about the use and misuse of asynchronous constructs. Inspired by these findings, we developed (i) Asyncifier, an automated refactoring tool that converts callback-based asynchronous code to use async/await, and (ii) Corrector, a tool that finds and corrects common misuses of async/await. Our empirical evaluation shows that these tools are (i) applicable and (ii) efficient. Developers accepted 314 patches generated by our tools.
Preprint Available
Additional Information
A Study of Equivalent and Stubborn Mutation Operators using Human Analysis of Equivalence
Xiangjuan Yao, Mark Harman, and Yue Jia
China University of Mining and Technology, China; University College London, UK
Though mutation testing has been widely studied for more than thirty years, the prevalence and properties of equivalent mutants remain largely unknown. We report on the causes and prevalence of equivalent mutants and their relationship to stubborn mutants (those that remain undetected by a high quality test suite, yet are non-equivalent). Our results, based on manual analysis of 1,230 mutants from 18 programs, reveal a highly uneven distribution of equivalence and stubbornness. For example, the ABS class and half of the UOI class generate many equivalent and almost no stubborn mutants, while the LCR class generates many stubborn and few equivalent mutants. We conclude that previous test effectiveness studies based on fault seeding could be skewed, and that developers of mutation testing tools should prioritise those operators that we found to generate disproportionately many stubborn (and few equivalent) mutants.
Preprint Available
Additional Information
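As an illustrative aside (not taken from the paper), the Python sketch below shows what mutants from the operator classes named in the abstract can look like, including one that happens to be equivalent; the functions and the equivalence argument are hypothetical examples, not the study's subjects.

```python
# Hypothetical examples of the mutant operator classes named in the abstract;
# none of this code comes from the study itself.

def original(x, y):
    return x * x > 0 and y > 0

def lcr_mutant(x, y):
    # LCR (logical connector replacement): 'and' replaced by 'or'.
    # Killed by, e.g., x = 1, y = -1 (original False, mutant True).
    return x * x > 0 or y > 0

def uoi_mutant(x, y):
    # UOI (unary operator insertion): negation inserted before y.
    # Killed by, e.g., x = 1, y = 1 (original True, mutant False).
    return x * x > 0 and -y > 0

def abs_mutant(x, y):
    # ABS (absolute value insertion): x * x is already non-negative, so
    # wrapping it in abs() can never change the result; this mutant is
    # equivalent and no test can kill it.
    return abs(x * x) > 0 and y > 0
```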
APE: An Annotation Language and Middleware for Energy-Efficient Mobile Application Development
Nima Nikzad, Octav Chipara, and William G. Griswold
University of California at San Diego, USA; University of Iowa, USA
Energy efficiency is a key concern in continuously-running mobile applications, such as those for health and context monitoring. Unfortunately, developers must implement complex and customized power-management policies for each application. This involves the use of complex primitives and writing error-prone multithreaded code to monitor hardware state. To address this problem, we present APE, an annotation language and middleware service that eases the development of energy-efficient Android applications. APE annotations are used to demarcate a power-hungry code segment whose execution is deferred until the device enters a state that minimizes the cost of that operation. The execution of power-hungry operations is coordinated across applications by the APE middleware. Several examples show the expressive power of our approach. A case study of using APE annotations in a real mobile sensing application shows that annotations can cleanly specify a power management policy and reduce the complexity of its implementation. An empirical evaluation of the middleware shows that APE introduces negligible overhead and matches hand-tuned code in energy savings, in this case achieving 63.4% energy savings compared to uncoordinated execution.
Preprint Available
AR-Miner: Mining Informative Reviews for Developers from Mobile App Marketplace
Ning Chen, Jialiu Lin, Steven C. H. Hoi, Xiaokui Xiao, and Boshen Zhang
Nanyang Technological University, Singapore; Carnegie Mellon University, USA
With the popularity of smartphones and mobile devices, mobile application (a.k.a. “app”) markets have been growing exponentially in terms of number of users and downloads. App developers spend considerable effort on collecting and exploiting user feedback to improve user satisfaction, but suffer from the absence of effective user review analytics tools. To help mobile app developers discover the most “informative” user reviews from a large and rapidly increasing pool of user reviews, we present “AR-Miner” — a novel computational framework for App Review Mining, which performs comprehensive analytics on raw user reviews by (i) first extracting informative user reviews by filtering noisy and irrelevant ones, (ii) then grouping the informative reviews automatically using topic modeling, (iii) further prioritizing the informative reviews by an effective review ranking scheme, and (iv) finally presenting the groups of most “informative” reviews via an intuitive visualization approach. We conduct extensive experiments and case studies on four popular Android apps to evaluate AR-Miner, from which the encouraging results indicate that AR-Miner is effective, efficient and promising for app developers.
Preprint Available
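To make the four-step pipeline above concrete, here is a minimal, hypothetical sketch of a filter-group-rank flow built from off-the-shelf components; it is not AR-Miner's implementation, and the informativeness filter and ranking score are placeholders.

```python
# Hypothetical filter -> group -> rank -> present pipeline for app reviews.
# Not AR-Miner: the filtering rule and ranking score below are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def mine_reviews(reviews, n_topics=5, top_k=3):
    """reviews: list of dicts with 'text' and 'rating' keys."""
    # (i) crude informativeness filter: drop very short reviews
    informative = [r for r in reviews if len(r["text"].split()) >= 5]

    # (ii) group the informative reviews by topic
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(r["text"] for r in informative)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    topic_of = lda.fit_transform(counts).argmax(axis=1)

    # (iii) rank reviews within each topic (placeholder: lowest ratings first,
    # on the assumption that complaints are the most actionable)
    groups = {}
    for review, topic in zip(informative, topic_of):
        groups.setdefault(int(topic), []).append((review["rating"], review["text"]))

    # (iv) present the top-k reviews per topic group
    return {topic: [text for _, text in sorted(entries)[:top_k]]
            for topic, entries in groups.items()}
```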
Achieving Accuracy and Scalability Simultaneously in Detecting Application Clones on Android Markets
Kai Chen, Peng Liu, and Yingjun Zhang
Pennsylvania State University, USA; Institute of Information Engineering at Chinese Academy of Sciences, China; Institute of Software at Chinese Academy of Sciences, China
Besides traditional problems such as potential bugs, (smartphone) application clones on Android markets bring new threats. That is, attackers clone the code from legitimate Android applications, assemble it with malicious code or advertisements, and publish these “purpose-added” app clones on the same or other markets for their own benefit. Three inherent and unique characteristics make app clones difficult to detect by existing techniques: a billion opcode problem caused by cross-market publishing, the gap between code clones and app clones, and prevalent Type 2 and Type 3 clones. Existing techniques achieve either accuracy or scalability, but not both. To achieve both goals, we use a geometry characteristic, called centroid, of dependency graphs to measure the similarity between methods (code fragments) in two apps. Then we synthesize the method-level similarities and draw a Y/N conclusion on app (core functionality) cloning. The observed “centroid effect” and the inherent “monotonicity” property enable our approach to achieve both high accuracy and scalability. We implemented the app clone detection system and evaluated it on five whole Android markets (including 150,145 apps, 203 million methods and 26 billion opcodes). It takes less than one hour to perform cross-market app clone detection on the five markets after generating centroids only once.
Preprint Available
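The following toy sketch only illustrates the general idea of comparing methods by a numeric summary ("centroid") of their graphs; the paper defines its centroid over method dependency graphs in its own way, and the features and threshold below are invented for illustration.

```python
# Toy illustration of the centroid idea: summarize each method's dependency
# graph as a small numeric vector and compare methods by vector distance.
# The features and the threshold are illustrative assumptions only.
import math

def centroid(graph):
    """graph: dict mapping each node to a list of successor nodes."""
    n_nodes = max(len(graph), 1)
    n_edges = sum(len(succs) for succs in graph.values())
    n_branches = sum(1 for succs in graph.values() if len(succs) > 1)
    return (n_edges / n_nodes, n_branches / n_nodes, float(n_nodes))

def methods_similar(graph_a, graph_b, threshold=0.05):
    ca, cb = centroid(graph_a), centroid(graph_b)
    # normalized centroid distance: small edits to a graph move its centroid
    # only slightly, which is what makes pruning by distance attractive
    distance = math.dist(ca, cb) / (1.0 + math.dist((0.0, 0.0, 0.0), ca))
    return distance <= threshold
```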
Alternate Refactoring Paths Reveal Usability Problems
Mohsen Vakilian and Ralph E. Johnson
University of Illinois at Urbana-Champaign, USA
Modern Integrated Development Environments (IDEs) support many refactorings. Yet, programmers greatly underuse automated refactorings. Recent studies have applied traditional usability testing methodologies such as surveys, lab studies, and interviews to find the usability problems of refactoring tools. However, these methodologies can identify only certain kinds of usability problems. The critical incident technique (CIT) is a general methodology that uncovers usability problems by analyzing troubling user interactions. We adapt CIT to refactoring tools and show that alternate refactoring paths are indicators of the usability problems of refactoring tools. We define an alternate refactoring path as a sequence of user interactions that contains cancellations, reported messages, or repeated invocations of the refactoring tool. We evaluated our method on a large corpus of refactoring usage data, which we collected during a field study on 36 programmers over three months. This method revealed 15 usability problems, 13 of which were previously unknown. We reported these problems and proposed design improvements to Eclipse developers. The developers acknowledged all of the problems and have already fixed four of them. This result suggests that analyzing alternate paths is effective at discovering the usability problems of interactive program transformation (IPT) tools.
Preprint Available
Additional Information
An Analysis of the Relationship between Conditional Entropy and Failed Error Propagation in Software Testing
Kelly Androutsopoulos, David Clark, Haitao Dan, Robert M. Hierons, and Mark Harman
Middlesex University, UK; University College London, UK; Brunel University, UK
Failed error propagation (FEP) is known to hamper software testing, yet it remains poorly understood. We introduce an information theoretic formulation of FEP that is based on measures of conditional entropy. This formulation considers the situation in which we are interested in the potential for an incorrect program state at statement s to fail to propagate to incorrect output. We define five metrics that differ in two ways: whether we only consider parts of the program that can be reached after executing s and whether we restrict attention to a single program path of interest. We give the results of experiments in which it was found that on average one in 10 tests suffered from FEP, earlier studies having shown that this figure can vary significantly between programs. The experiments also showed that our metrics are well-correlated with FEP. Our empirical study involved 30 programs, for which we executed a total of 7,140,000 test cases. The results reveal that the metrics differ in their performance but the Spearman rank correlation with failed error propagation is close to 0.95 for two of the metrics. These strong correlations in an experimental setting, in which all information about both FEP and conditional entropy is known, open up the possibility in the longer term of devising inexpensive information theory based metrics that allow us to minimise the effect of FEP.
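For reference, the conditional entropy that the metrics above build on is the standard information-theoretic quantity (this is the general definition, not one of the paper's five metrics):

```latex
H(Y \mid X) \;=\; -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y)\, \log_2 p(y \mid x)
```

Loosely, such measures capture how much information about an internal program state survives to the output; the less that survives, the more scope there is for an erroneous state to be masked, which is the intuition the metrics quantify.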
An Exploratory Study of the Pull-Based Software Development Model
Georgios Gousios, Martin Pinzger, and Arie van Deursen
Delft University of Technology, Netherlands; University of Klagenfurt, Austria
The advent of distributed version control systems has led to the development of a new paradigm for distributed software development; instead of pushing changes to a central repository, developers pull them from other repositories and merge them locally. Various code hosting sites, notably GitHub, have seized the opportunity to facilitate pull-based development by offering workflow support tools, such as code reviewing systems and integrated issue trackers. In this work, we explore how pull-based software development works, first on the GHTorrent corpus and then on a carefully selected sample of 291 projects. We find that the pull request model offers fast turnaround, increased opportunities for community engagement and decreased time to incorporate contributions. We show that a relatively small number of factors affect both the decision to merge a pull request and the time to process it. We also examine the reasons for pull request rejection and find that technical ones are only a small minority.
Preprint Available
Analyze This! 145 Questions for Data Scientists in Software Engineering
Andrew Begel and Thomas Zimmermann
Microsoft Research, USA
In this paper, we present the results from two surveys related to data science applied to software engineering. The first survey solicited questions that software engineers would like data scientists to investigate about software, about software processes and practices, and about software engineers. Our analyses resulted in a list of 145 questions grouped into 12 categories. The second survey asked a different pool of software engineers to rate these 145 questions and identify the most important ones to work on first. Respondents favored questions that focus on how customers typically use their applications. We also saw opposition to questions that assess the performance of individual employees or compare them with one another. Our categorization and catalog of 145 questions can help researchers, practitioners, and educators to more easily focus their efforts on topics that are important to the software industry.
Preprint Available
Additional Information
AsDroid: Detecting Stealthy Behaviors in Android Applications by User Interface and Program Behavior Contradiction
Jianjun Huang, Xiangyu Zhang, Lin Tan, Peng Wang, and Bin Liang
Purdue University, USA; University of Waterloo, Canada; Renmin University of China, China
Android smartphones are becoming increasingly popular. The open nature of Android allows users to install miscellaneous applications, including malicious ones, from third-party marketplaces without rigorous sanity checks. A large portion of existing malware performs stealthy operations such as sending short messages, making phone calls and HTTP connections, and installing additional malicious components. In this paper, we propose a novel technique to detect such stealthy behavior. We model stealthy behavior as program behavior that mismatches the user interface, which denotes the user's expectation of program behavior. We use static program analysis to associate a top-level function, which is usually a user interaction function, with the behavior it performs. Then we analyze the text extracted from the user interface component associated with the top-level function. A semantic mismatch between the two indicates stealthy behavior. To evaluate AsDroid, we downloaded a pool of 182 apps that are potentially problematic, judging by their permissions. Among the 182 apps, AsDroid reports stealthy behaviors in 113 apps, with 28 false positives and 11 false negatives.
Preprint Available
Automated Design of Self-Adaptive Software with Control-Theoretical Formal Guarantees
Antonio Filieri, Henry Hoffmann, and Martina Maggio
University of Stuttgart, Germany; University of Chicago, USA; Lund University, Sweden
Self-adaptation enables software to execute successfully in dynamic, unpredictable, and uncertain environments. Control theory provides a broad set of mathematically grounded techniques for adapting the behavior of dynamic systems. While it has been applied to specific software control problems, it has proved difficult to define methodologies allowing non-experts to systematically apply control techniques to create adaptive software. These difficulties arise because computer systems are usually non-linear, with varying workloads and heterogeneous components, making it difficult to model software as a dynamic system; i.e., by means of differential or difference equations. This paper proposes a broad scope methodology for automatically constructing both an approximate dynamic model of a software system and a suitable controller for managing its non-functional requirements. Despite its generality, this methodology provides formal guarantees concerning the system's dynamic behavior by keeping its model continuously updated to compensate for changes in the execution environment and effects of the initial approximation. We apply the methodology to three case studies, demonstrating its generality by tackling different domains (and different non-functional requirements) with the same approach. Being broadly applicable and fully automated, this methodology may allow the adoption of control theoretical solutions (and their formal properties) for a wide range of software adaptation problems.
Preprint Available
Automated Goal Operationalisation Based on Interpolation and SAT Solving
Renzo Degiovanni, Dalal Alrajeh, Nazareno Aguirre, and Sebastian Uchitel
Universidad Nacional de Río Cuarto, Argentina; Imperial College London, UK; Universidad de Buenos Aires, Argentina
Goal-oriented methods have been successfully employed for eliciting and elaborating software requirements. When goals are assigned to an agent, they have to be operationalised: the agent’s operations have to be refined, by equipping them with appropriate enabling and triggering conditions, so that the goals are fulfilled. Goal operationalisation generally demands a significant effort from the engineer. Although there exist approaches that tackle this problem, they are either informal or at most semi-automated, requiring the engineer to assist in the process. In this paper, we present an approach for goal operationalisation that automatically computes required preconditions and required triggering conditions for operations, so that the resulting operations establish the goals. The process is iterative, is able to deal with safety goals and particular kinds of liveness goals, and is based on the use of interpolation and SAT solving.
Preprint Available
Automated Memory Leak Detection for Production Use
Changhee Jung, Sangho Lee, Easwaran Raman, and Santosh Pande
Virginia Tech, USA; Georgia Tech, USA; Google, USA
This paper presents Sniper, an automated memory leak detection tool for C/C++ production software. To track the staleness of allocated memory (which is a clue to potential leaks) with little overhead (mostly <3%), Sniper leverages instruction sampling using performance monitoring units available in commodity processors. It also offloads the time- and space-consuming analyses, and works on the original software without modifying the underlying memory allocator; it neither perturbs the application execution nor increases the heap size. Sniper can even deal with multithreaded applications with very low overhead. In particular, it performs a statistical analysis, which views memory leaks as anomalies, for automated and systematic leak determination. Consequently, it accurately detected real-world memory leaks with no false positives, and achieved an F-measure of 81% on average for 17 benchmarks stress-tested with various memory leaks.
Brownout: Building More Robust Cloud Applications
Cristian Klein, Martina Maggio, Karl-Erik Årzén, and Francisco Hernández-Rodriguez
Umeå University, Sweden; Lund University, Sweden
Self-adaptation is a first-class concern for cloud applications, which should be able to withstand diverse runtime changes. Variations are simultaneously happening both at the cloud infrastructure level (for example, hardware failures) and at the user workload level (flash crowds). However, robustly withstanding extreme variability requires costly hardware over-provisioning. In this paper, we introduce a self-adaptation programming paradigm called brownout. Using this paradigm, applications can be designed to robustly withstand unpredictable runtime variations, without over-provisioning. The paradigm is based on optional code that can be dynamically deactivated through decisions based on control theory. We modified two popular web application prototypes, RUBiS and RUBBoS, with less than 170 lines of code, to make them brownout-compliant. Experiments show that brownout self-adaptation dramatically improves the ability to withstand flash crowds and hardware failures.
Preprint Available
Additional Information
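As a purely illustrative sketch (not the authors' RUBiS/RUBBoS modifications), the brownout idea can be pictured as a request handler whose optional content is gated by a "dimmer" that a simple controller steers toward a response-time setpoint; the setpoint and gain below are arbitrary assumptions.

```python
# Illustrative brownout-style handler: optional code runs with probability
# given by a dimmer, adjusted by a simple proportional controller.
import random
import time

DIMMER = {"value": 1.0}   # 1.0 = always run optional code, 0.0 = never
SETPOINT = 0.5            # target response time in seconds (arbitrary)
GAIN = 0.5                # controller gain (arbitrary)

def handle_request(render_core, render_optional):
    start = time.monotonic()
    page = render_core()                        # mandatory content
    if random.random() < DIMMER["value"]:
        page += render_optional()               # optional content, e.g. recommendations
    elapsed = time.monotonic() - start
    # raise the dimmer when responses are fast, lower it when they are slow
    DIMMER["value"] = min(1.0, max(0.0, DIMMER["value"] + GAIN * (SETPOINT - elapsed)))
    return page
```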
Building It Together: Synchronous Development in OSS
Qi Xuan and Vladimir Filkov
Zhejiang University of Technology, China; University of California at Davis, USA
In distributed software development, synchronized actions are important for the completion of complex, interleaved tasks that require the abilities of multiple people. Synchronous development is manifested when file commits by two developers are close together in time and modify the same files. Here we propose quantitative methods for identifying synchronized activities in OSS projects, and use them to relate developer synchronization with effective productivity and communication. In particular, we define co-commit bursts and communication bursts as intervals of time rich in co-commit and correspondence activities, respectively, and construct from them smoothed time series which can, subsequently, be correlated to discover synchrony. We find that synchronized co-commits between developers are associated with their effective productivity and coordination: during co-commit bursts, vs. at other times, the project size grows faster even though the overall coding effort slows down. We also find a strong correlation between synchronized co-commits and communication, that is, for pairs of developers, more co-commit bursts are accompanied by more communication bursts, and their relationship closely follows a linear model. In addition, synchronized co-commits and communication activities occur very close together in time; thus, they can also be thought of as synchronizing each other. This study can help with better understanding collaborative mechanisms in OSS and the role communication plays in distributed software engineering.
CARE: Cache Guided Deterministic Replay for Concurrent Java Programs
Yanyan Jiang, Tianxiao Gu, Chang Xu, Xiaoxing Ma, and Jian Lu
Nanjing University, China
Deterministic replay tools help programmers debug concurrent programs. However, for long-running programs, a replay tool may generate a huge log of shared memory access dependences. In this paper, we present CARE, an application-level deterministic record and replay technique to reduce the log size. The key idea of CARE is logging read-write dependences only at per-thread value prediction cache misses. This strategy records only a subset of all exact read-write dependences, and reduces synchronizations protecting memory reads in the instrumented code. Realizing that such a recording strategy provides only value-deterministic replay, CARE also adopts variable grouping and action prioritization heuristics to synthesize sequentially consistent executions at replay in linear time. We implemented CARE in Java and experimentally evaluated it with recognized benchmarks. Results showed that CARE successfully resolved all missing read-write dependences, producing sequentially consistent replay for all benchmarks. CARE exhibited 1.7–40X (median 3.4X) smaller runtime overhead and 1.1–309X (median 7.0X) smaller log size compared with the state-of-the-art technique LEAP.
Preprint Available
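The sketch below illustrates only the value-prediction intuition described in the abstract (log a dependence only when a read returns a value the thread did not predict); it is a simplified stand-in, not CARE's instrumentation.

```python
# Simplified per-thread value-prediction cache: a read-write dependence is
# recorded only when the observed value differs from the thread's prediction
# (i.e., the value it last saw for that variable).
class PredictionCache:
    def __init__(self):
        self.last_seen = {}   # variable name -> value this thread last observed
        self.log = []         # recorded (variable, value) dependence entries

    def on_read(self, var, observed):
        if var not in self.last_seen or self.last_seen[var] != observed:
            self.log.append((var, observed))   # prediction miss: log the dependence
            self.last_seen[var] = observed
        return observed                        # prediction hit: nothing is logged
```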
Case Studies and Tools for Contract Specifications
Todd W. Schiller, Kellen Donohue, Forrest Coward, and Michael D. Ernst
University of Washington, USA
Contracts are a popular tool for specifying the functional behavior of software. This paper characterizes the contracts that developers write, the contracts that developers could write, and how a developer reacts when shown the difference. This paper makes three research contributions based on an investigation of open-source projects' use of Code Contracts. First, we characterize Code Contract usage in practice. For example, approximately three-fourths of the Code Contracts are basic checks for the presence of data. We discuss similarities and differences in usage across the projects, and we identify annotation burden, tool support, and training as possible explanations based on developer interviews. Second, based on contracts automatically inferred for four of the projects, we find that developers underutilize contracts for expressing state updates, object state indicators, and conditional properties. Third, we performed user studies to learn how developers decide which contracts to enforce. The developers used contract suggestions to support their existing use cases with more expressive contracts. However, the suggestions did not lead them to experiment with other use cases for which contracts are better-suited. In support of the research contributions, the paper presents two engineering contributions: (1) Celeriac, a tool for generating traces of .NET programs compatible with the Daikon invariant detection tool, and (2) Contract Inserter, a Visual Studio add-in for discovering and inserting likely invariants as Code Contracts.
Preprint Available
Additional Information
Characterizing and Detecting Performance Bugs for Smartphone Applications
Yepang Liu, Chang Xu, and Shing-Chi Cheung
Hong Kong University of Science and Technology, China; Nanjing University, China
ACM Distinguished Paper
Smartphone applications’ performance has a vital impact on user experience. However, many smartphone applications suffer from bugs that cause significant performance degradation, thereby losing their competitive edge. Unfortunately, people have little understanding of these performance bugs. They also lack effective techniques to fight such bugs. To bridge this gap, we conducted a study of 70 real-world performance bugs collected from eight large-scale and popular Android applications. We studied the characteristics (e.g., bug types and how they manifest) of these bugs and identified their common patterns. These findings can support follow-up research on performance bug avoidance, testing, debugging and analysis for smartphone applications. To demonstrate the usefulness of our findings, we implemented a static code analyzer, PerfChecker, to detect our identified performance bug patterns. We experimentally evaluated PerfChecker by applying it to 29 popular Android applications, which comprise 1.1 million lines of Java code. PerfChecker successfully detected 126 matching instances of our performance bug patterns. Among them, 68 were quickly confirmed by developers as previously-unknown issues that affect application performance, and 20 were fixed soon afterwards by following our optimization suggestions.
Preprint Available
Additional Information
Checking App Behavior Against App Descriptions
Alessandra Gorla, Ilaria Tavecchia, Florian Gross, and Andreas Zeller
Saarland University, Germany
How do we know a program does what it claims to do? After clustering Android apps by their description topics, we identify outliers in each cluster with respect to their API usage. A "weather" app that sends messages thus becomes an anomaly; likewise, a "messaging" app would typically not be expected to access the current location. Applied to a set of 22,500+ Android applications, our CHABADA prototype identified several anomalies; additionally, it flagged 56% of novel malware as such, without requiring any known malware patterns.
Preprint Available
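A minimal sketch of the cluster-then-flag-outliers idea described above, built from generic off-the-shelf components; CHABADA's actual clustering and outlier detection differ, and the component choices and feature encoding here are assumptions.

```python
# Sketch of the two-step idea: cluster apps by description topic, then flag
# apps whose API usage is unusual within their cluster. Not CHABADA's pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

def flag_outliers(descriptions, api_usage, n_clusters=10):
    """api_usage: one binary vector of sensitive-API use per app."""
    text_features = TfidfVectorizer(stop_words="english").fit_transform(descriptions)
    cluster_of = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(text_features)

    flagged = []
    for cluster in range(n_clusters):
        members = [i for i, c in enumerate(cluster_of) if c == cluster]
        if len(members) < 5:
            continue                          # too few apps to define "normal" usage
        detector = IsolationForest(random_state=0).fit([api_usage[i] for i in members])
        flagged += [i for i in members if detector.predict([api_usage[i]])[0] == -1]
    return flagged
```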
Code Coverage for Suite Evaluation by Developers
Rahul Gopinath, Carlos Jensen, and Alex Groce
Oregon State University, USA
One of the key challenges of developers testing code is determining a test suite's quality: its ability to find faults. The most common approach is to use code coverage as a measure for test suite quality, and diminishing returns in coverage or high absolute coverage as a stopping rule. In testing research, suite quality is often evaluated by a suite's ability to kill mutants (artificially seeded potential faults). Determining which criteria best predict mutation kills is critical to practical estimation of test suite quality. Previous work has only used small sets of programs, and usually compares multiple suites for a single program. Practitioners, however, seldom compare suites: they evaluate one suite. Using suites (both manual and automatically generated) from a large set of real-world open-source projects shows that evaluation results differ from those for suite comparison: statement (not block, branch, or path) coverage predicts mutation kills best.
Preprint Available
CodeHint: Dynamic and Interactive Synthesis of Code Snippets
Joel Galenson, Philip Reames, Rastislav Bodik, Björn Hartmann, and Koushik Sen
University of California at Berkeley, USA
Awarded as Prof. R. Narasimhan Lecture
There are many tools that help programmers find code fragments, but most are inexpressive and rely on static information. We present a new technique for synthesizing code that is dynamic (giving accurate results and allowing programmers to reason about concrete executions), easy-to-use (supporting a wide range of correctness specifications), and interactive (allowing users to refine the candidate code snippets). Our implementation, which we call CodeHint, generates and evaluates code at runtime and hence can synthesize real-world Java code that involves I/O, reflection, native calls, and other advanced language features. We have evaluated CodeHint in two user studies and show that its algorithms are efficient and that it improves programmer productivity by more than a factor of two.
Preprint Available
Additional Information
Comparing Static Bug Finders and Statistical Prediction
Foyzur Rahman, Sameer Khatri, Earl T. Barr, and Premkumar Devanbu
University of California at Davis, USA; University College London, UK
The all-important goal of delivering better software at lower cost has led to a vital, enduring quest for ways to find and remove defects efficiently and accurately. To this end, two parallel lines of research have emerged over recent years. Static analysis seeks to find defects using algorithms that process well-defined semantic abstractions of code. Statistical defect prediction uses historical data to estimate parameters of statistical formulae modeling the phenomena thought to govern defect occurrence and predict where defects are likely to occur. These two approaches have emerged from distinct intellectual traditions and have largely evolved independently, in “splendid isolation”. In this paper, we evaluate these two (largely) disparate approaches on a similar footing. We use historical defect data to appraise the two approaches, compare them, and seek synergies. We find that under some accounting principles, they provide comparable benefits; we also find that in some settings, the performance of certain static bug-finders can be enhanced using information provided by statistical defect prediction.
Preprint Available
ConLock: A Constraint-Based Approach to Dynamic Checking on Deadlocks in Multithreaded Programs
Yan Cai, Shangru Wu, and W. K. Chan
City University of Hong Kong, China
Many predictive deadlock detection techniques analyze multithreaded programs to suggest potential deadlocks (referred to as cycles or deadlock warnings). Nonetheless, many such cycles are false positives. On checking these cycles, existing dynamic deadlock confirmation techniques may frequently encounter thrashing or result in a low confirmation probability. This paper presents a novel technique entitled ConLock to address these problems. ConLock first analyzes a given cycle and the execution trace that produces the cycle. It identifies a set of thread scheduling constraints based on a novel should-happen-before relation. ConLock then manipulates a confirmation run with the aim of not violating a reduced set of scheduling constraints and of triggering an occurrence of the deadlock if the cycle is a real deadlock. If the cycle is a false positive, ConLock reports scheduling violations. We have validated ConLock using a suite of real-world programs with 11 deadlocks. The result shows that among all 741 cycles reported by Magiclock, ConLock confirms all 11 deadlocks with a probability of 71%–100%. On the remaining 730 cycles, ConLock reports scheduling violations on each. We have systematically sampled 87 out of the 730 cycles and confirmed that all these cycles are false positives.
Preprint Available
Controlled Modeling Environment using Flexibly-Formatted Spreadsheets
Hisashi Miyashita, Hideki Tai, and Shunichi Amano
Cybernet Systems, Japan; IBM Research, Japan
As modeling in software and system development becomes increasingly prevalent, many engineers need to collaboratively develop models spanning many disciplines, such as requirements management, system design, and software. However, integrating modeling languages for various disciplines is challenging, because UML and SysML are too complex for many engineers to understand. Therefore, in complicated engineering processes, engineers with different areas of expertise often find it difficult to access the same information in different domain-specific modeling environments. Our approach to address this problem is to share and edit the models as task-oriented spreadsheets, using a unified model (in UML or SysML) and a unified user interface (in the spreadsheet program). The formats of the spreadsheets are optimized for various tasks while the target models remain in a unified modeling language. Since the transformation between the spreadsheets and the models is automated and transparent, users do not have to be skilled with the modeling languages to edit the spreadsheets. With our approach, engineers can perform each task with fewer errors, in less time, and with less difficulty, without specialized training. A preliminary user study showed that the spreadsheet-based approach reduced the number of errors and the time needed for typical systems engineering tasks.
Preprint Available
Coverage Is Not Strongly Correlated with Test Suite Effectiveness
Laura Inozemtseva and Reid Holmes
University of Waterloo, Canada
ACM Distinguished Paper
The coverage of a test suite is often used as a proxy for its ability to detect faults. However, previous studies that investigated the correlation between code coverage and test suite effectiveness have failed to reach a consensus about the nature and strength of the relationship between these test suite characteristics. Moreover, many of the studies were done with small or synthetic programs, making it unclear whether their results generalize to larger programs, and some of the studies did not account for the confounding influence of test suite size. In addition, most of the studies were done with adequate suites, which are rare in practice, so the results may not generalize to typical test suites. We have extended these studies by evaluating the relationship between test suite size, coverage, and effectiveness for large Java programs. Our study is the largest to date in the literature: we generated 31,000 test suites for five systems consisting of up to 724,000 lines of source code. We measured the statement coverage, decision coverage, and modified condition coverage of these suites and used mutation testing to evaluate their fault detection effectiveness. We found that there is a low to moderate correlation between coverage and effectiveness when the number of test cases in the suite is controlled for. In addition, we found that stronger forms of coverage do not provide greater insight into the effectiveness of the suite. Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.
Preprint Available
Additional Information
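As a hypothetical sketch of the core measurement (not the authors' analysis scripts), one way to correlate coverage with mutant-detection effectiveness while controlling for suite size is to compute the correlation within groups of equally sized suites:

```python
# Hypothetical sketch: correlate coverage with mutation-kill effectiveness while
# holding suite size fixed, by computing the correlation within each size group.
from collections import defaultdict
from scipy.stats import kendalltau

def correlation_by_suite_size(suites, min_group=10):
    """suites: list of dicts with 'size', 'coverage', and 'kill_rate' per suite."""
    groups = defaultdict(list)
    for suite in suites:
        groups[suite["size"]].append((suite["coverage"], suite["kill_rate"]))

    results = {}
    for size, pairs in groups.items():
        if len(pairs) >= min_group:            # skip groups too small to correlate
            coverage, kills = zip(*pairs)
            tau, p_value = kendalltau(coverage, kills)
            results[size] = (tau, p_value)
    return results
```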
Cowboys, Ankle Sprains, and Keepers of Quality: How Is Video Game Development Different from Software Development?
Emerson Murphy-Hill, Thomas Zimmermann, and Nachiappan Nagappan
North Carolina State University, USA; Microsoft Research, USA
ACM Distinguished Paper
Video games make up an important part of the software industry, yet the software engineering community rarely studies video games. This imbalance is a problem if video game development differs from general software development, as some game experts suggest. In this paper we describe a study with 14 interviewees and 364 survey respondents. The study elicited substantial differences between video game development and other software development. For example, in game development, “cowboy coders” are necessary to cope with the continuous interplay between creative desires and technical constraints. Consequently, game developers are hesitant to use automated testing because of these tests’ rapid obsolescence in the face of shifting creative desires of game designers. These differences between game and non-game development have implications for research, industry, and practice. For instance, as a starting point for impacting game development, researchers could create testing tools that enable game developers to create tests that assert flexible behavior with little up-front investment.
Preprint Available
Cross-Checking Oracles from Intrinsic Software Redundancy
Antonio Carzaniga, Alberto Goffi, Alessandra Gorla, Andrea Mattavelli, and Mauro Pezzè
University of Lugano, Switzerland; Saarland University, Germany; University of Milano-Bicocca, Italy
Despite the recent advances in automatic test generation, testers must still write test oracles manually. If formal specifications are available, it might be possible to use decision procedures derived from those specifications. We present a technique that is based on a form of specification but also leverages more information from the system under test. We assume that the system under test is somewhat redundant, in the sense that some operations are designed to behave like others but their executions are different. Our experience in this and previous work indicates that this redundancy exists and is easily documented. We then generate oracles by cross-checking the execution of a test with the same test in which we replace some operations with redundant ones. We develop this notion of cross-checking oracles into a generic technique to automatically insert oracles into unit tests. An experimental evaluation shows that cross-checking oracles, used in combination with automatic test generation techniques, can be very effective in revealing faults, and that they can even improve good hand-written test suites.
Preprint Available
Data-Guided Repair of Selection Statements
Divya Gopinath, Sarfraz Khurshid, Diptikalyan Saha, and Satish Chandra
University of Texas at Austin, USA; IBM Research, India; Samsung Electronics, USA
Database-centric programs form the backbone of many enterprise systems. Fixing defects in such programs takes much human effort due to the interplay between imperative code and database-centric logic. This paper presents a novel data-driven approach for automated fixing of bugs in the selection condition of database statements (e.g., WHERE clause of SELECT statements) – a common form of bugs in such programs. Our key observation is that in real-world data, there is information latent in the distribution of data that can be useful to repair selection conditions efficiently. Given a faulty database program and input data, only a part of which induces the defect, our novelty is in determining the correct behavior for the defect-inducing data by taking advantage of the information revealed by the rest of the data. We accomplish this by employing semi-supervised learning to predict the correct behavior for defect-inducing data and by patching up any inaccuracies in the prediction by a SAT-based combinatorial search. Next, we learn a compact decision tree for the correct behavior, including the correct behavior on the defect-inducing data. This tree suggests a plausible fix to the selection condition. We demonstrate the feasibility of our approach on seven real-world examples.
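To illustrate just the final step mentioned above (learning a compact decision tree that suggests a selection condition), here is a simplified stand-in; the paper's full approach additionally relies on semi-supervised learning and a SAT-based search, which this sketch omits.

```python
# Simplified stand-in for the final step: learn a small decision tree over row
# attributes and read it back as a candidate WHERE-clause predicate.
from sklearn.tree import DecisionTreeClassifier, export_text

def suggest_selection_condition(rows, should_select, feature_names):
    """rows: list of attribute vectors; should_select: desired True/False per row."""
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(rows, should_select)
    # The printed tree can be transliterated into a selection condition,
    # e.g. a path like "age <= 30 and status <= 1 -> True" becomes a WHERE clause.
    return export_text(tree, feature_names=feature_names)
```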
Design Rule Spaces: A New Form of Architecture Insight
Lu Xiao, Yuanfang Cai, and Rick Kazman
Drexel University, USA; University of Hawaii, USA; SEI, USA
In this paper, we investigate software architecture as a set of overlapping design rule spaces, formed by one or more structural or evolutionary relationships and clustered using our design rule hierarchy algorithm. Considering evolutionary coupling as a special type of relationship, we investigated (1) whether design rule spaces can reveal structural relations among error-prone files; and (2) whether design rule spaces can reveal structural problems contributing to error-proneness. We studied three large-scale open source projects and found that error-prone files can be captured by just a few design rule sub-spaces. Supported by our tool, Titan, we are able to flexibly visualize design rule spaces formed by different types of relationships, including evolutionary dependencies. This way, we are not only able to visualize which error-prone files belong to which design rule spaces, but also to visualize the structural problems that give insight into why these files are error prone. Design rule spaces provide valuable direction on which parts of the architecture are problematic, and on why, when, and how to refactor.
Detecting Differences across Multiple Instances of Code Clones
Yun Lin, Zhenchang Xing, Yinxing Xue, Yang Liu, Xin Peng, Jun Sun, and Wenyun Zhao
Fudan University, China; Nanyang Technological University, Singapore; National University of Singapore, Singapore; Singapore University of Technology and Design, Singapore
Clone detectors find similar code fragments (i.e., instances of code clones) and report large numbers of them for industrial systems. To maintain or manage code clones, developers often have to investigate differences of multiple cloned code fragments. However, existing program differencing techniques compare only two code fragments at a time. Developers then have to manually combine several pairwise differencing results. In this paper, we present an approach to automatically detecting differences across multiple clone instances. We have implemented our approach as an Eclipse plugin and evaluated its accuracy with three Java software systems. Our evaluation shows that our algorithm has precision over 97.66% and recall over 95.63% in three open source Java projects. We also conducted a user study of 18 developers to evaluate the usefulness of our approach for eight clone-related refactoring tasks. Our study shows that our approach can significantly improve developers’ performance in refactoring decisions, refactoring details, and task completion time on clone-related refactoring tasks. Automatically detecting differences across multiple clone instances also opens opportunities for building practical applications of code clones in software maintenance, such as auto-generation of application skeletons and intelligent simultaneous code editing.
Preprint Available
Detecting Memory Leaks through Introspective Dynamic Behavior Modeling using Machine Learning
Sangho Lee, Changhee Jung, and Santosh Pande
Georgia Tech, USA; Virginia Tech, USA
This paper expands staleness-based memory leak detection by presenting a machine learning-based framework. The proposed framework is based on the idea that object staleness can be better leveraged with regard to the similarity of objects; i.e., an object is more likely to have leaked if it shows significantly high staleness not observed in other similar objects with the same allocation context. A central part of the proposed framework is the modeling of heap objects. To this end, the framework observes the staleness of objects during a representative run of an application. From the observed data, the framework generates training examples, which also contain instances of hypothetical leaks. Via machine learning, the proposed framework replaces the error-prone user-definable staleness predicates used in previous research with a model-based prediction. The framework was tested using both synthetic and real-world examples. Evaluation with synthetic leakage workloads of SPEC2006 benchmarks shows that the proposed method achieves the optimal accuracy permitted by staleness-based leak detection. Moreover, by incorporating allocation context into the model, the proposed method achieves higher accuracy than is possible with object staleness alone. Evaluation with real-world memory leaks demonstrates that the proposed method is effective at detecting previously reported bugs with high accuracy.
Detecting Performance Anti-patterns for Applications Developed using Object-Relational Mapping
Tse-Hsun Chen, Weiyi Shang, Zhen Ming Jiang, Ahmed E. Hassan, Mohamed Nasser, and Parminder Flora
Queen's University, Canada; York University, Canada; BlackBerry, Canada
Object-Relational Mapping (ORM) provides developers a conceptual abstraction for mapping the application code to the underlying databases. ORM is widely used in industry due to its convenience, permitting developers to focus on developing the business logic without worrying too much about the database access details. However, developers often write ORM code without considering the impact of such code on database performance, leading to transactions with timeouts or hangs in large-scale systems. Unfortunately, there is little support to help developers automatically detect suboptimal database accesses. In this paper, we propose an automated framework to detect ORM performance anti-patterns. Our framework automatically flags performance anti-patterns in the source code. Furthermore, as there could be hundreds or even thousands of instances of anti-patterns, our framework provides support to prioritize performance bug fixes based on a statistically rigorous performance assessment. We have successfully evaluated our framework on two open source systems and one large-scale industrial system. Our case studies show that our framework can detect new and known real-world performance bugs and that fixing the detected performance anti-patterns can improve the system response time by up to 98%.
Preprint Available
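For readers unfamiliar with ORM performance anti-patterns, the snippet below shows one of the best-known examples in Django-style Python; it is a generic illustration, not an excerpt of the paper's detected patterns or its detection rules.

```python
# Generic illustration of the classic "N+1 queries" ORM anti-pattern.

def report_slow(orders_model):
    # Anti-pattern: each access to o.customer below lazily issues a separate
    # SELECT for the related row, i.e. 1 + N queries for N orders.
    return [(o.id, o.customer.name) for o in orders_model.objects.all()]

def report_fast(orders_model):
    # Fix: fetch the related rows in the same query (eager loading),
    # so the loop issues no additional database accesses.
    return [(o.id, o.customer.name)
            for o in orders_model.objects.select_related("customer")]
```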
Dictionary Learning Based Software Defect Prediction
Xiao-Yuan Jing, Shi Ying, Zhi-Wu Zhang, Shan-Shan Wu, and Jin Liu
Wuhan University, China; Nanjing University of Posts and Telecommunications, China
In order to improve the quality of a software system, software defect prediction aims to automatically identify defective software modules for efficient software testing. For software defect prediction, classification methods based on static code attributes have attracted a great deal of attention. In recent years, machine learning techniques have been applied to defect prediction. Because different software modules are often similar, one software module can be approximately represented by a small proportion of other modules, and the representation coefficients over a pre-defined dictionary, which consists of historical software module data, are generally sparse. In this paper, we propose to use the dictionary learning technique to predict software defects. By using the characteristics of the metrics mined from open source software, we learn multiple dictionaries (including defective-module and defect-free-module sub-dictionaries and the total dictionary) and sparse representation coefficients. Moreover, we take the misclassification cost issue into account because the misclassification of defective modules generally incurs a much higher risk cost than that of defect-free ones. We thus propose a cost-sensitive discriminative dictionary learning (CDDL) approach for software defect classification and prediction. The widely used datasets from NASA projects are employed as test data to evaluate the performance of all compared methods. Experimental results show that CDDL outperforms several representative state-of-the-art defect prediction methods.
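As a rough, hypothetical sketch of sparse-representation classification with per-class dictionaries (not the CDDL algorithm itself, whose dictionaries are learned discriminatively and whose cost handling is more principled):

```python
# Rough sketch: one dictionary per class; a module gets the label whose
# dictionary reconstructs its metric vector with the smallest residual.
import numpy as np
from sklearn.decomposition import DictionaryLearning

def train_dictionaries(X_defective, X_clean, n_atoms=10):
    dictionaries = {}
    for label, X in (("defective", X_defective), ("clean", X_clean)):
        dictionaries[label] = DictionaryLearning(
            n_components=n_atoms, transform_algorithm="lasso_lars",
            random_state=0).fit(X)
    return dictionaries

def classify(dictionaries, x, defective_weight=0.8):
    """x: 1-D numpy array of module metrics; a weight < 1 biases the decision
    toward 'defective' to mimic cost-sensitivity (the value is a placeholder)."""
    residuals = {}
    for label, dl in dictionaries.items():
        code = dl.transform(x.reshape(1, -1))            # sparse coefficients
        residuals[label] = np.linalg.norm(x - code @ dl.components_)
    residuals["defective"] *= defective_weight           # cheapen "defective" errors
    return min(residuals, key=residuals.get)
```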
Distilling Privacy Requirements for Mobile Applications
Keerthi Thomas, Arosha K. Bandara, Blaine A. Price, and Bashar Nuseibeh
Open University, UK; University of Limerick, Ireland
As mobile computing applications have become commonplace, it is increasingly important for them to address end-users’ privacy requirements. Privacy requirements depend on a number of contextual socio-cultural factors to which mobility adds another level of contextual variation. However, traditional requirements elicitation methods do not sufficiently account for contextual factors and therefore cannot be used effectively to represent and analyse the privacy requirements of mobile end users. On the other hand, methods that do investigate contextual factors tend to produce data that does not lend itself to the process of requirements extraction. To address this problem we have developed a Privacy Requirements Distillation approach that employs a problem analysis framework to extract and refine privacy requirements for mobile applications from raw data gathered through empirical studies involving end users. Our approach introduces privacy facets that capture patterns of privacy concerns which are matched against the raw data. We demonstrate and evaluate our approach using qualitative data from an empirical study of a mobile social networking application.
Preprint Available
Does Latitude Hurt while Longitude Kills? Geographical and Temporal Separation in a Large Scale Software Development Project
Patrick Wagstrom and Subhajit Datta
IBM Research, USA; Singapore University of Technology and Design, Singapore
Distributed software development allows firms to leverage cost advantages and place work near centers of competency. This distribution comes at a cost: distributed teams face challenges from differing cultures, skill levels, and a lack of shared working hours. In this paper we examine whether and how geographic and temporal separation in a large-scale distributed software development project influences developer interactions. We mine the work item trackers of a large commercial software project with a globally distributed development team. We examine both the time to respond and the propensity of individuals to respond, and find that, when taken together, geographic distance has little effect, while temporal separation has a significant negative impact on the time to respond. However, both have little impact on the social network of individuals in the organization. These results suggest that while temporally distributed teams do communicate, it is at a slower rate, and firms may wish to locate partner teams in similar time zones for maximal performance.
Easing Software Component Repository Evolution
Jérôme Vouillon, Mehdi Dogguy, and Roberto Di Cosmo
University Paris Diderot, France; CNRS, France; EDF, France; Debian, France; INRIA, France
Modern software systems are built by composing components drawn from large repositories, whose size and complexity increase at a fast pace. Maintaining and evolving these software collections is a complex task, and a strict qualification process needs to be enforced. We studied in depth the Debian software repository, one of the largest and most complex existing ones, and we developed comigrate, an extremely efficient tool that is able to identify the largest sets of components that can migrate to the reference repository without violating its quality constraints. This tool significantly outperforms all existing tools, and provides detailed information that is crucial for understanding the reasons why some components cannot migrate. We performed extensive validation on the Debian distribution. The core architecture of the tool is quite general, and can be easily adapted to other software repositories.
Preprint Available
Additional Information
Effects of Using Examples on Structural Model Comprehension: A Controlled Experiment
Dina Zayan, Michał Antkiewicz, and Krzysztof Czarnecki
University of Waterloo, Canada
We present a controlled experiment for the empirical evaluation of Example-Driven Modeling (EDM), an approach that systematically uses examples for model comprehension and domain knowledge transfer. We conducted the experiment with 26 graduate and undergraduate students from electrical and computer engineering (ECE), computer science (CS), and software engineering (SE) programs at the University of Waterloo. The experiment involves a domain model, with UML class diagrams representing the domain abstractions and UML object diagrams representing examples of using these abstractions. The goal is to provide empirical evidence of the effects of suitable examples in model comprehension, compared to having model abstractions only, by having the participants perform model comprehension tasks. Our results show that EDM is superior to having model abstractions only, with an improvement of 39% for diagram completeness, 30% for questions completeness, 71% for efficiency, and a reduction of 80% in the number of mistakes. We provide qualitative results showing that participants receiving model abstractions augmented with examples experienced lower perceived difficulty in performing the comprehension tasks, higher perceived confidence in their tasks' solutions, and asked fewer clarifying domain questions, a reduction of 90%. We also present participants' feedback regarding the usefulness of the provided examples, their number and types, as well as the use of partial examples.
Additional Information
Enhancing Symbolic Execution with Veritesting
Thanassis Avgerinos, Alexandre Rebert, Sang Kil Cha, and David Brumley
Carnegie Mellon University, USA
ACM Distinguished Paper
We present MergePoint, a new binary-only symbolic execution system for large-scale and fully unassisted testing of commodity off-the-shelf (COTS) software. MergePoint introduces veritesting, a new technique that employs static symbolic execution to amplify the effect of dynamic symbolic execution. Veritesting allows MergePoint to find twice as many bugs, explore orders of magnitude more paths, and achieve higher code coverage than previous dynamic symbolic execution systems. MergePoint is currently running daily on a 100 node cluster analyzing 33,248 Linux binaries; has generated more than 15 billion SMT queries, 200 million test cases, 2,347,420 crashes, and found 11,687 bugs in 4,379 distinct applications.
Preprint Available
Exploring Variability-Aware Execution for Testing Plugin-Based Web Applications
Hung Viet Nguyen, Christian Kästner, and Tien N. Nguyen
Iowa State University, USA; Carnegie Mellon University, USA
In plugin-based systems, plugin conflicts may occur when two or more plugins interfere with one another, changing their expected behaviors. It is highly challenging to detect plugin conflicts due to the exponential explosion of the combinations of plugins (i.e., configurations). In this paper, we address the challenge of executing a test case over many configurations. Leveraging the fact that many executions of a test are similar, our variability-aware execution runs common code once. Only when encountering values that differ depending on specific configurations will the execution split to run for each of them. To evaluate the scalability of variability-aware execution in a large real-world setting, we built a prototype PHP interpreter called Varex and ran it on the popular WordPress blogging Web application. The results show that while plugin interactions exist, there is a significant amount of sharing that allows variability-aware execution to scale to 2^50 configurations within seven minutes of running time. During our study, with Varex, we were able to detect two plugin conflicts: one was recently reported on the WordPress forum, and the other was not previously discovered.
Feature Maintenance with Emergent Interfaces
Márcio Ribeiro, Paulo Borba, and Christian Kästner
Federal University of Alagoas, Brazil; Federal University of Pernambuco, Brazil; Carnegie Mellon University, USA
Hidden code dependencies are responsible for many complications in maintenance tasks. With the introduction of variable features in configurable systems, dependencies may even cross feature boundaries, causing problems that are prone to be detected late. Many current implementation techniques for product lines lack proper interfaces, which could make such dependencies explicit. As an alternative to changing the implementation approach, we provide a tool-based solution to support developers in recognizing and dealing with feature dependencies: emergent interfaces. Emergent interfaces are inferred on demand, based on feature-sensitive intraprocedural and interprocedural data-flow analysis. They emerge in the IDE and emulate modularity benefits not available in the host language. To evaluate the potential of emergent interfaces, we conducted and replicated a controlled experiment, and found, in the studied context, that emergent interfaces can improve the performance of code change tasks by up to 3 times while also reducing the number of errors.
Preprint Available
Additional Information
Hope for the Best, Prepare for the Worst: Multi-tier Control for Adaptive Systems
Nicolas D'Ippolito, Víctor Braberman, Jeff Kramer, Jeff Magee, Daniel Sykes, and Sebastian Uchitel
Imperial College London, UK; Universidad de Buenos Aires, Argentina
Most approaches for adaptive systems rely on models, particularly behaviour or architecture models, which describe the system and the environment in which it operates. One of the difficulties in creating such models is uncertainty about the accuracy and completeness of the models. Engineers therefore make assumptions which may prove to be invalid at runtime. In this paper we introduce a rigorous, tiered framework for combining behaviour models, each with different associated assumptions and risks. These models are used to generate operational strategies, through techniques such as controller synthesis, which are then executed concurrently at runtime. We show that our framework can be used to adapt the functional behaviour of the system: through graceful degradation when the assumptions of a higher level model are broken, and through progressive enhancement when those assumptions are satisfied or restored.
Preprint Available
How Do API Documentation and Static Typing Affect API Usability?
Stefan Endrikat, Stefan Hanenberg, Romain Robbes, and Andreas Stefik
University of Duisburg-Essen, Germany; University of Chile, Chile; University of Nevada at Las Vegas, USA
When developers use Application Programming Interfaces (APIs), they often rely on documentation to assist their tasks. In previous studies, we reported evidence indicating that static type systems acted as a form of implicit documentation, benefiting developer productivity. Such implicit documentation is easier to maintain, given it is enforced by the compiler, but previous experiments tested users without any explicit documentation. In this paper, we report on a controlled experiment and an exploratory study comparing the impact of using documentation and a static or dynamic type system on a development task. Results of our study both confirm previous findings and show that the benefits of static typing are strengthened with explicit documentation, but that this was not as strongly felt with dynamically typed languages.
Preprint Available
How Do Centralized and Distributed Version Control Systems Impact Software Changes?
Caius Brindescu, Mihai Codoban, Sergii Shmarkatiuk, and Danny Dig
Oregon State University, USA
Distributed Version Control Systems (DVCS) have seen an increase in popularity relative to traditional Centralized Version Control Systems (CVCS). Yet we know little about whether developers are benefitting from the extra power of DVCS. Without such knowledge, researchers, developers, tool builders, and team managers are in danger of making wrong assumptions. In this paper we present the first in-depth, large scale empirical study that looks at the influence of DVCS on the practice of splitting, grouping, and committing changes. We recruited 820 participants for a survey that sheds light on the practice of using DVCS. We also analyzed 409M lines of code changed by 358,300 commits, made by 5,890 developers, in 132 repositories containing a total of 73M LOC. Using this data, we uncovered some interesting facts. For example, (i) commits made in distributed repositories were 32% smaller than the centralized ones, (ii) developers split commits more often in DVCS, and (iii) DVCS commits are more likely to have references to issue tracking labels.
Preprint Available
Additional Information
How Do Professionals Perceive Legacy Systems and Software Modernization?
Ravi Khadka, Belfrit V. Batlajery, Amir M. Saeidi, Slinger Jansen, and Jurriaan Hage
Utrecht University, Netherlands
Existing research in legacy system modernization has traditionally focused on technical challenges, and takes the standpoint that legacy systems are obsolete, yet crucial for an organization's operation. Nonetheless, it remains unclear whether practitioners in the industry also share this perception. This paper describes the outcome of an exploratory study in which 26 industrial practitioners were interviewed on what makes a software system a legacy system, what the main drivers are that lead to the modernization of such systems, and what challenges are faced during the modernization process. The findings of the interviews have been validated by means of a survey with 198 respondents. The results show that practitioners value their legacy systems highly, and that the challenges they face are not just technical but also include business and organizational aspects.
Preprint Available
How to Make Best Use of Cross-Company Data in Software Effort Estimation?
Leandro L. Minku and Xin Yao
University of Birmingham, UK
Previous works using Cross-Company (CC) data for making Within-Company (WC) Software Effort Estimation (SEE) try to use CC data or models directly to provide predictions in the WC context. So, these data or models are only helpful when they match the WC context well. When they do not, a fair amount of WC training data, which are usually expensive to acquire, are still necessary to achieve good performance. We investigate how to make best use of CC data, so that we can reduce the amount of WC data while maintaining or improving performance in comparison to WC SEE models. This is done by proposing a new framework to learn the relationship between CC and WC projects explicitly, allowing CC models to be mapped to the WC context. Such mapped models can be useful even when the CC models themselves do not match the WC context directly. Our study shows that a new approach instantiating this framework is able not only to use substantially less WC data than a corresponding WC model, but also to achieve similar/better performance. This approach can also be used to provide insight into the behaviour of a company in comparison to others.
Preprint Available
Improving Automated Source Code Summarization via an Eye-Tracking Study of Programmers
Paige Rodeghero, Collin McMillan, Paul W. McBurney, Nigel Bosch, and Sidney D'Mello
University of Notre Dame, USA
ACM Distinguished Paper
Source Code Summarization is an emerging technology for automatically generating brief descriptions of code. Current summarization techniques work by selecting a subset of the statements and keywords from the code, and then including information from those statements and keywords in the summary. The quality of the summary depends heavily on the process of selecting the subset: a high-quality selection would contain the same statements and keywords that a programmer would choose. Unfortunately, little evidence exists about the statements and keywords that programmers view as important when they summarize source code. In this paper, we present an eye-tracking study of 10 professional Java programmers in which the programmers read Java methods and wrote English summaries of those methods. We apply the findings to build a novel summarization tool. Then, we evaluate this tool and provide evidence to support the development of source code summarization systems.
Inductive Verification of Data Model Invariants for Web Applications
Ivan Bocić and Tevfik Bultan
University of California at Santa Barbara, USA
Modern software applications store their data in remote cloud servers. Users interact with these applications using web browsers or thin clients running on mobile devices. A key issue in dependability of these applications is the correctness of the actions that update the data store, which are triggered by user requests. In this paper, we present techniques for automatically checking if the actions of an application preserve the data model invariants. Our approach first automatically extracts a data model specification, which we call an abstract data store, from a given application using instrumented execution. The abstract data store identifies the sets of objects and relations (associations) used by the application, and the actions that update the data store by deleting or creating objects or by changing the relations among the objects. We show that checking invariants of an abstract data store corresponds to inductive invariant verification, and can be done using a mapping to First Order Logic (FOL) and using a FOL theorem prover. We implemented this approach for the Rails framework and applied it to three open source applications. We found four previously unknown bugs and reported them to the developers, who confirmed and immediately fixed two of them.
Preprint Available
Inferring Models of Concurrent Systems from Logs of Their Behavior with CSight
Ivan Beschastnikh, Yuriy Brun, Michael D. Ernst, and Arvind Krishnamurthy
University of British Columbia, Canada; University of Massachusetts, USA; University of Washington, USA
Concurrent systems are notoriously difficult to debug and understand. A common way of gaining insight into system behavior is to inspect execution logs and documentation. Unfortunately, manual inspection of logs is an arduous process, and documentation is often incomplete and out of sync with the implementation. To provide developers with more insight into concurrent systems, we developed CSight. CSight mines logs of a system's executions to infer a concise and accurate model of that system's behavior, in the form of a communicating finite state machine (CFSM). Engineers can use the inferred CFSM model to understand complex behavior, detect anomalies, debug, and increase confidence in the correctness of their implementations. CSight's only requirement is that the logged events have vector timestamps. We provide a tool that automatically adds vector timestamps to system logs. Our tool prototypes are available at http://synoptic.googlecode.com/. This paper presents algorithms for inferring CFSM models from traces of concurrent systems, proves them correct, provides an implementation, and evaluates the implementation in two ways: by running it on logs from three different networked systems and via a user study that focused on bug finding. Our evaluation finds that CSight infers accurate models that can help developers find bugs.
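Because CSight only requires vector timestamps in the logs, the following sketch shows the standard vector-clock rules that produce such timestamps. It is our own illustration rather than CSight code, and the two-node setup is hypothetical.

// Standard vector-clock bookkeeping that yields the vector timestamps CSight
// expects in its input logs (illustrative only; not part of CSight).
import java.util.Arrays;

final class VectorClock {
    private final int[] clock;
    private final int me;                       // index of the local node

    VectorClock(int nodes, int me) { this.clock = new int[nodes]; this.me = me; }

    void localEvent() { clock[me]++; }          // tick before logging a local event
    int[] send()      { clock[me]++; return clock.clone(); }  // tick, then attach a copy to the message
    void receive(int[] other) {                 // element-wise max with the sender's clock, then tick
        for (int i = 0; i < clock.length; i++) clock[i] = Math.max(clock[i], other[i]);
        clock[me]++;
    }
    @Override public String toString() { return Arrays.toString(clock); }
}

public class VectorClockDemo {
    public static void main(String[] args) {
        VectorClock client = new VectorClock(2, 0), server = new VectorClock(2, 1);
        int[] msg = client.send();              // client logs "send" with timestamp [1, 0]
        server.receive(msg);                    // server logs "recv" with timestamp [1, 1]
        System.out.println("client " + client + ", server " + server);
    }
}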
Preprint Available
Additional Information
Influence of Social and Technical Factors for Evaluating Contribution in GitHub
Jason Tsay, Laura Dabbish, and James Herbsleb
Carnegie Mellon University, USA
Open source software is commonly portrayed as a meritocracy, where decisions are based solely on their technical merit. However, literature on open source suggests a complex social structure underlying the meritocracy. Social work environments such as GitHub make the relationships between users and between users and work artifacts transparent. This transparency enables developers to better use information such as technical value and social connections when making work decisions. We present a study on open source software contribution in GitHub that focuses on the task of evaluating pull requests, which are one of the primary methods for contributing code in GitHub. We analyzed the association of various technical and social measures with the likelihood of contribution acceptance. We found that project managers made use of information signaling both good technical contribution practices for a pull request and the strength of the social connection between the submitter and project manager when evaluating pull requests. Pull requests with many comments were much less likely to be accepted, moderated by the submitter's prior interaction in the project. Well-established projects were more conservative in accepting pull requests. These findings provide evidence that developers use both technical and social information when evaluating potential contributions to open source software projects.
Integrating Adaptive User Interface Capabilities in Enterprise Applications
Pierre A. Akiki, Arosha K. Bandara, and Yijun Yu
Open University, UK
Many existing enterprise applications are at a mature stage in their development and are unable to easily benefit from the usability gains offered by adaptive user interfaces (UIs). Therefore, a method is needed for integrating adaptive UI capabilities into these systems without incurring a high cost or significantly disrupting the way they function. This paper presents a method for integrating adaptive UI behavior in enterprise applications based on CEDAR, a model-driven, service-oriented, and tool-supported architecture for devising adaptive enterprise application UIs. The proposed integration method is evaluated with a case study, which includes establishing and applying technical metrics to measure several of the method’s properties using the open-source enterprise application OFBiz as a test-case. The generality and flexibility of the integration method are also evaluated based on an interview and discussions with practitioners about their real-life projects.
Preprint Available
Additional Information
Interpolated N-Grams for Model Based Testing
Paolo Tonella, Roberto Tiella, and Cu Duy Nguyen
Fondazione Bruno Kessler, Italy; University of Luxembourg, Luxembourg
Models - in particular finite state machine models - provide an invaluable source of information for the derivation of effective test cases. However, models usually approximate part of the program semantics and capture only some of the relevant dependencies and constraints. As a consequence, some of the test cases that are derived from models are infeasible. In this paper, we propose a method, based on the computation of the N-gram statistics, to increase the likelihood of deriving feasible test cases from a model. Correspondingly, the level of model coverage is also expected to increase, because infeasible test cases do not contribute to coverage. While N-grams do improve existing test case derivation methods, they show limitations when the N-gram statistics are incomplete, which necessarily occurs as N increases. Interpolated N-grams overcome this limitation and show the highest performance of all test case derivation methods compared in this work.
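As a hedged illustration of the interpolation idea (written in the standard linear-interpolation style; the paper's exact smoothing scheme may differ), an interpolated N-gram estimate of the next state s_i combines estimates of all orders up to N, so lower-order statistics compensate when higher-order statistics are incomplete:

P_{\mathrm{interp}}(s_i \mid s_{i-N+1}, \dots, s_{i-1}) = \sum_{k=1}^{N} \lambda_k \, P_k(s_i \mid s_{i-k+1}, \dots, s_{i-1}), \qquad \sum_{k=1}^{N} \lambda_k = 1.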
Preprint Available
Is Spreadsheet Ambiguity Harmful? Detecting and Repairing Spreadsheet Smells due to Ambiguous Computation
Wensheng Dou, Shing-Chi Cheung, and Jun Wei
Institute of Software at Chinese Academy of Sciences, China; Hong Kong University of Science and Technology, China
Spreadsheets are widely used by end users for numerical computation in their business. Spreadsheet cells whose computation is subject to the same semantics are often clustered in a row or column. When a spreadsheet evolves, these cell clusters can degenerate due to ad hoc modifications or undisciplined copy-and-pastes. Such degenerated clusters no longer keep cells prescribing the same computational semantics, and are said to exhibit ambiguous computation smells. Our empirical study finds that such smells are common and likely harmful. We propose AmCheck, a novel technique that automatically detects and repairs ambiguous computation smells by recovering their intended computational semantics. A case study using AmCheck suggests that it is useful for discovering and repairing real spreadsheet problems.
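The toy example below is our own Java sketch, not AmCheck's algorithm: cells in a cluster are expected to share one formula pattern once formulas are normalized to R1C1 style, and a cell that deviates, for instance because a value was hard-coded during an ad hoc edit, is flagged as a smell.

// Hypothetical illustration of an "ambiguous computation" smell in one column
// of cells (not AmCheck's actual detection or repair algorithm).
import java.util.List;

public class AmbiguityCheckSketch {
    // Formulas are assumed normalized to R1C1 style so copies of one pattern look identical.
    static void reportSmell(List<String> columnFormulas) {
        String expected = columnFormulas.get(0);
        for (int row = 1; row < columnFormulas.size(); row++) {
            if (!expected.equals(columnFormulas.get(row))) {
                System.out.println("Smell at cell " + (row + 1) + ": found \""
                        + columnFormulas.get(row) + "\", cluster pattern is \"" + expected + "\"");
            }
        }
    }

    public static void main(String[] args) {
        // The first two cells share the pattern "=RC[-2]*RC[-1]"; the third was hard-coded.
        reportSmell(List.of("=RC[-2]*RC[-1]", "=RC[-2]*RC[-1]", "=42"));
    }
}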
Lifting Model Transformations to Product Lines
Rick Salay, Michalis Famelis, Julia Rubin, Alessio Di Sandro, and Marsha Chechik
University of Toronto, Canada
Software product lines and model transformations are two techniques used in industry for managing the development of highly complex software. Product line approaches simplify the handling of software variants while model transformations automate software manipulations such as refactoring, optimization, code generation, etc. While these techniques are well understood independently, combining them to get the benefit of both poses a challenge because most model transformations apply to individual models while model-level product lines represent sets of models. In this paper, we address this challenge by providing an approach for automatically "lifting" model transformations so that they can be applied to product lines. We illustrate our approach using a case study and evaluate it through a set of experiments.
Live API Documentation
Siddharth Subramanian, Laura Inozemtseva, and Reid Holmes
University of Waterloo, Canada
Application Programming Interfaces (APIs) provide powerful abstraction mechanisms that enable complex functionality to be used by client programs. However, this abstraction does not come for free: understanding how to use an API can be difficult. While API documentation can help, it is often insufficient on its own. Online sites like Stack Overflow and Github Gists have grown to fill the gap between traditional API documentation and more example-based resources. Unfortunately, these two important classes of documentation are independent. In this paper we describe an iterative, deductive method of linking source code examples to API documentation. We also present an implementation of this method, called Baker, that is highly precise (0.97) and supports both Java and JavaScript. Baker can be used to enhance traditional API documentation with up-to-date source code examples; it can also be used to incorporate links to the API documentation into the code snippets that use the API.
Preprint Available
Additional Information
Making Web Applications More Energy Efficient for OLED Smartphones
Ding Li, Angelica Huyen Tran, and William G. J. Halfond
University of Southern California, USA
A smartphone’s display is one of its most energy consuming components. Modern smartphones use OLED displays that consume more energy when displaying light colors as opposed to dark colors. This is problematic as many popular mobile web applications use large light colored backgrounds. To address this problem we developed an approach for automatically rewriting web applications so that they generate more energy efficient web pages. Our approach is based on program analysis of the structure of the web application implementation. In the evaluation of our approach we show that it can achieve a 40% reduction in display power consumption. A user study indicates that the transformed web pages are acceptable to users with over 60% choosing to use the transformed pages for normal usage.
Manual Refactoring Changes with Automated Refactoring Validation
Xi Ge and Emerson Murphy-Hill
North Carolina State University, USA
Refactoring, the practice of applying behavior-preserving changes to existing code, can enhance the quality of software systems. Refactoring tools can automatically perform and check the correctness of refactorings. However, even when developers have these tools, they still perform about 90% of refactorings manually, which is error-prone. To address this problem, we propose a technique called GhostFactor that separates transformation from correctness checking: the developer transforms code manually, while the correctness of her transformation is checked automatically. We implemented our technique as a Visual Studio plugin, then evaluated it with a human study of eight software developers; GhostFactor improved the correctness of manual refactorings by 67%.
Preprint Available
Micro Execution
Patrice Godefroid
Microsoft Research, USA
Micro execution is the ability to execute any code fragment without a user-provided test driver or input data. The user simply identifies a function or code location in an exe or dll. A runtime Virtual Machine (VM) customized for testing purposes then starts executing the code at that location, catches all memory operations before they occur, allocates memory on-the-fly in order to perform those read/write memory operations, and provides input values according to a customizable memory policy, which defines what read memory accesses should be treated as inputs. MicroX is a first prototype VM allowing micro execution of x86 binary code. No test driver, no input data, no source code, no debug symbols are required: MicroX automatically discovers dynamically the Input/Output interface of the code being run. Input values are provided as needed along the execution and can be generated in various ways, e.g., randomly or using some other test-generation tool. To our knowledge, MicroX is the first VM designed for test isolation and generation purposes. This paper introduces micro execution and discusses how to implement it, strengths and limitations, applications, related work and long-term goals.
Preprint Available
Mind the Gap: Assessing the Conformance of Software Traceability to Relevant Guidelines
Patrick Rempel, Patrick Mäder, Tobias Kuschke, and Jane Cleland-Huang
TU Ilmenau, Germany; DePaul University, USA
Many guidelines for safety-critical industries such as aeronautics, medical devices, and railway communications, specify that traceability must be used to demonstrate that a rigorous process has been followed and to provide evidence that the system is safe for use. In practice, there is a gap between what is prescribed by guidelines and what is implemented in practice, making it difficult for organizations and certifiers to fully evaluate the safety of the software system. In this paper we present an approach, which parses a guideline to extract a Traceability Model depicting software artifact types and their prescribed traces. It then analyzes the traceability data within a project to identify areas of traceability failure. Missing traceability paths, redundant and/or inconsistent data, and other problems are highlighted. We used our approach to evaluate the traceability of seven safety-critical software systems and found that none of the evaluated projects contained traceability that fully conformed to its relevant guidelines.
Preprint Available
Mining Behavior Models from User-Intensive Web Applications
Carlo Ghezzi, Mauro Pezzè, Michele Sama, and Giordano Tamburrelli
Politecnico di Milano, Italy; University of Lugano, Switzerland; Touchtype, UK
Many modern user-intensive applications, such as Web applications, must satisfy the interaction requirements of thousands if not millions of users, which can hardly be fully understood at design time. Designing applications that meet user behaviors, by efficiently supporting the prevalent navigation patterns, and evolving with them requires new approaches that go beyond classic software engineering solutions. We present a novel approach that automates the acquisition of user-interaction requirements in an incremental and reflective way. Our solution builds upon inferring a set of probabilistic Markov models of the users' navigational behaviors, dynamically extracted from the interaction history given in the form of a log file. We annotate and analyze the inferred models to verify quantitative properties by means of probabilistic model checking. The paper investigates the advantages of the approach referring to a Web application currently in use.
Preprint Available
Mining Billions of AST Nodes to Study Actual and Potential Usage of Java Language Features
Robert Dyer, Hridesh Rajan, Hoan Anh Nguyen, and Tien N. Nguyen
Iowa State University, USA
Programming languages evolve over time, adding additional language features to simplify common tasks and make the language easier to use. For example, the Java Language Specification has four editions and is currently drafting a fifth. While the addition of language features is driven by an assumed need by the community (often with direct requests for such features), there is little empirical evidence demonstrating how these new features are adopted by developers once released. In this paper, we analyze over 31k open-source Java projects representing over 9 million Java files, which when parsed contain over 18 billion AST nodes. We analyze this corpus to find uses of new Java language features over time. Our study gives interesting insights, such as: there are millions of places features could potentially be used but weren't; developers convert existing code to use new features; and we found thousands of instances of potential resource handling bugs.
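One concrete instance of the adoption question is try-with-resources, added in Java 7: it closes a resource automatically and removes the class of resource-handling mistakes the authors flag as potential bugs. The snippet below is our own minimal example, not code from the studied corpus; the temporary file is only there to make it runnable.

// Try-with-resources (Java 7): the reader is closed automatically, even if
// readLine() throws, avoiding a potential resource-handling bug.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TryWithResourcesExample {
    static String firstLine(String path) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            return in.readLine();               // 'in' is closed when the block exits
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("notes", ".txt");
        Files.writeString(tmp, "hello\n");      // requires Java 11+ for writeString
        System.out.println(firstLine(tmp.toString()));
    }
}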
Preprint Available
Additional Information
Mining Configuration Constraints: Static Analyses and Empirical Results
Sarah Nadi, Thorsten Berger, Christian Kästner, and Krzysztof Czarnecki
University of Waterloo, Canada; IT University of Copenhagen, Denmark; Carnegie Mellon University, USA
Highly-configurable systems allow users to tailor the software to their specific needs. Not all combinations of configuration options are valid though, and constraints arise for technical or non-technical reasons. Explicitly describing these constraints in a variability model allows reasoning about the supported configurations. To automate creating variability models, we need to identify the origin of such configuration constraints. We propose an approach which uses build-time errors and a novel feature-effect heuristic to automatically extract configuration constraints from C code. We conduct an empirical study on four highly-configurable open-source systems with existing variability models having three objectives in mind: evaluate the accuracy of our approach, determine the recoverability of existing variability-model constraints using our analysis, and classify the sources of variability-model constraints. We find that both our extraction heuristics are highly accurate (93% and 77% respectively), and that we can recover 19% of the existing variability-model constraints using our approach. However, we find that many of the remaining constraints require expert knowledge or more expensive analyses. We argue that our approach, tooling, and experimental results support researchers and practitioners working on variability model re-engineering, evolution, and consistency-checking techniques.
Preprint Available
Additional Information
Mining Fine-Grained Code Changes to Detect Unknown Change Patterns
Stas Negara, Mihai Codoban, Danny Dig, and Ralph E. Johnson
University of Illinois at Urbana-Champaign, USA; Oregon State University, USA
Identifying repetitive code changes benefits developers, tool builders, and researchers. Tool builders can automate the popular code changes, thus improving the productivity of developers. Researchers can better understand the practice of code evolution, advancing existing code assistance tools and benefiting developers even further. Unfortunately, existing research either predominantly uses coarse-grained Version Control System (VCS) snapshots as the primary source of code evolution data or considers only a small subset of program transformations of a single kind - refactorings. We present the first approach that identifies previously unknown frequent code change patterns from a fine-grained sequence of code changes. Our novel algorithm effectively handles challenges that distinguish continuous code change pattern mining from the existing data mining techniques. We evaluated our algorithm on 1,520 hours of code development collected from 23 developers, and showed that it is effective, useful, and scales to large amounts of data. We analyzed some of the mined code change patterns and discovered ten popular kinds of high-level program transformations. More than half of our 420 survey participants acknowledged that eight out of ten transformations are relevant to their programming activities.
Preprint Available
Mining Interprocedural, Data-Oriented Usage Patterns in JavaScript Web Applications
Hung Viet Nguyen, Hoan Anh Nguyen, Anh Tuan Nguyen, and Tien N. Nguyen
Iowa State University, USA
A frequently occurring usage of program elements in a programming language and software libraries is called a usage pattern. In JavaScript (JS) Web applications, JS usage patterns in their source code have special characteristics that pose challenges in pattern mining. They involve nested data objects with no corresponding names or types. JS functions can also be used as data objects. JS usages are often cross-language, inter-procedural, and involve control and data flow dependencies among JS program entities and data objects whose data types are revealed only at run time due to dynamic typing in JS. This paper presents JSModel, a novel graph-based representation for JS usages, and JSMiner, a scalable approach to mine inter-procedural, data-oriented JS usage patterns. Our empirical evaluation on several Web programs shows that JSMiner efficiently detects more JS patterns with higher accuracy than a state-of-the-art approach. We conducted experiments to show JSModel's usefulness in two applications: detecting anti-patterns (buggy patterns) and documenting JS APIs via pattern skeletons. Our controlled experiment shows that the mined patterns are useful as JS documentation and code templates.
MintHint: Automated Synthesis of Repair Hints
Shalini Kaleeswaran, Varun Tulsian, Aditya Kanade, and Alessandro Orso
Indian Institute of Science, India; Georgia Tech, USA
Being able to automatically repair programs is at the same time a very compelling vision and an extremely challenging task. In this paper, we present MintHint, a novel technique for program repair that is a departure from most of today’s approaches. Instead of trying to fully automate program repair, which is often an unachievable goal, MintHint performs statistical correlation analysis to identify expressions that are likely to occur in the repaired code and generates, using pattern-matching based synthesis, repair hints from these expressions. Intuitively, these hints suggest how to rectify a faulty statement and help developers find a complete, actual repair. We also present an empirical evaluation of MintHint in two parts. The first part is a user study that shows that, when debugging, developers’ productivity improved manyfold with the use of repair hints—instead of traditional fault localization information alone. The second part consists of applying MintHint to several faults in Unix utilities to further assess the effectiveness of the approach. Our results show that MintHint performs well even in common situations where (1) the repair space searched does not contain the exact repair, and (2) the operational specification obtained from the test cases for repair is incomplete or even imprecise, which can be challenging for approaches aiming at fully automated repair.
Preprint Available
Patch Verification via Multiversion Interprocedural Control Flow Graphs
Wei Le and Shannon D. Pattison
Rochester Institute of Technology, USA
Software development is inherently incremental; however, it is challenging to correctly introduce changes on top of existing code. Recent studies show that 15%-24% of the bug fixes are incorrect, and the most important yet hard-to-acquire information for programming changes is whether this change breaks any code elsewhere. This paper presents a framework, called Hydrogen, for patch verification. Hydrogen aims to automatically determine whether a patch correctly fixes a bug, a new bug is introduced in the change, a bug can impact multiple software releases, and the patch is applicable for all the impacted releases. Hydrogen consists of a novel program representation, namely multiversion interprocedural control flow graph (MVICFG), that integrates and compares control flow of multiple versions of programs, and a demand-driven, path-sensitive symbolic analysis that traverses the MVICFG for detecting bugs related to software changes and versions. In this paper, we present the definition, construction and applications of MVICFGs. Our experimental results show that Hydrogen correctly builds desired MVICFGs and is scalable to real-life programs such as libpng, tightvnc and putty. We experimentally demonstrate that MVICFGs can enable efficient patch verification. Using the results generated by Hydrogen, we have found a few documentation errors related to patches for a set of open-source programs.
Preprint Available
Performance Regression Testing Target Prioritization via Performance Risk Analysis
Peng Huang, Xiao Ma, Dongcai Shen, and Yuanyuan Zhou
University of California at San Diego, USA; University of Illinois at Urbana-Champaign, USA
As software evolves, problematic changes can significantly degrade software performance, i.e., introduce performance regressions. Performance regression testing is an effective way to reveal such issues in early stages. Yet because of its high overhead, this activity is usually performed infrequently. Consequently, when a performance regression issue is spotted at a certain point, multiple commits might have been merged since the last testing run. Developers then have to spend extra time and effort narrowing down which commit caused the problem. Existing efforts try to improve performance regression testing efficiency through test case reduction or prioritization. In this paper, we propose a new lightweight and white-box approach, performance risk analysis (PRA), to improve performance regression testing efficiency via testing target prioritization. The analysis statically evaluates a given source code commit's risk of introducing a performance regression. Performance regression testing can leverage the analysis result to test commits with high risks first while delaying or skipping testing on low-risk commits. To validate this idea's feasibility, we conduct a study on 100 real-world performance regression issues from three widely used open-source software projects. Guided by insights from the study, we design PRA and build a tool, PerfScope. Evaluation on the examined problematic commits shows our tool can successfully alarm 91% of them. Moreover, on 600 randomly picked new commits from six large-scale software projects, with our tool, developers need to test only 14-22% of the 600 commits and will still be able to alert 87-95% of the commits with performance regression.
Preprint Available
Additional Information
Perturbation Analysis of Stochastic Systems with Empirical Distribution Parameters
Guoxin Su and David S. Rosenblum
National University of Singapore, Singapore
Probabilistic model checking is a quantitative verification technology for computer systems and has been the focus of intense research for over a decade. While in many circumstances of probabilistic model checking it is reasonable to anticipate a possible discrepancy between a stochastic model and a real-world system it represents, the state-of-the-art provides little account for the effects of this discrepancy on verification results. To address this problem, we present a perturbation approach in which quantities such as transition probabilities in the stochastic model are allowed to be perturbed from their measured values. We present a rigorous mathematical characterization for variations that can occur to verification results in the presence of model perturbations. The formal treatment is based on the analysis of a parametric variant of discrete-time Markov chains, called parametric Markov chains (PMCs), which are equipped with a metric to measure their perturbed vector variables. We employ an asymptotic method from perturbation theory to compute two forms of perturbation bounds, namely condition numbers and quadratic bounds, for automata-based verification of PMCs. We also evaluate our approach with case studies on variant models for three widely studied systems, the Zeroconf protocol, the Leader Election Protocol and the NAND Multiplexer.
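As a rough sketch of the bounds described above (the notation is ours, not the paper's): if the verification result is viewed as a function f of the vector p of perturbable transition probabilities, a condition number \kappa bounds the first-order effect of a perturbation \delta, and a quadratic bound refines the remainder term:

|f(p + \delta) - f(p)| \le \kappa \, \lVert \delta \rVert + O(\lVert \delta \rVert^{2}).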
Programmers' Build Errors: A Case Study (at Google)
Hyunmin Seo, Caitlin Sadowski, Sebastian Elbaum, Edward Aftandilian, and Robert Bowdidge
Hong Kong University of Science and Technology, China; Google, USA; University of Nebraska-Lincoln, USA
Building is an integral part of the software development process. However, little is known about the compiler errors that occur in this process. In this paper, we present an empirical study of 26.6 million builds produced during a period of nine months by thousands of developers. We describe the workflow through which those builds are generated, and we analyze failure frequency, compiler error types, and resolution efforts to fix those compiler errors. The results provide insights on how a large organization build process works, and pinpoints errors for which further developer support would be most effective.
Preprint Available
Property Differencing for Incremental Checking
Guowei Yang, Sarfraz Khurshid, Suzette Person, and Neha Rungta
Texas State University, USA; University of Texas at Austin, USA; NASA Langley Research Center, USA; NASA Ames Research Center, USA
This paper introduces iProperty, a novel approach that facilitates incremental checking of programs based on a property differencing technique. Specifically, iProperty aims to reduce the cost of checking properties as they are initially developed and as they co-evolve with the program. The key novelty of iProperty is to compute the differences between the new and old versions of expected properties to reduce the number and size of the properties that need to be checked during the initial development of the properties. Furthermore, property differencing is used in synergy with program behavior differencing techniques to optimize common regression scenarios, such as detecting regression errors or checking feature additions for conformance to new expected properties. Experimental results in the context of symbolic execution of Java programs annotated with properties written as assertions show the effectiveness of iProperty in utilizing change information to enable more efficient checking.
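A minimal sketch of the differencing idea follows; it is a set-level simplification of our own, not iProperty's logical differencing: given the properties of the old and new program versions, only the properties that are new or changed need to be rechecked.

// Set-level simplification of property differencing (not iProperty's algorithm):
// recheck only properties added or modified since the previous version.
import java.util.LinkedHashSet;
import java.util.Set;

public class PropertyDiffSketch {
    static Set<String> toRecheck(Set<String> oldProps, Set<String> newProps) {
        Set<String> delta = new LinkedHashSet<>(newProps);
        delta.removeAll(oldProps);              // properties present only in the new version
        return delta;
    }

    public static void main(String[] args) {
        Set<String> v1 = Set.of("x >= 0", "y != null");
        Set<String> v2 = Set.of("x >= 0", "y != null", "x < buf.length");
        System.out.println(toRecheck(v1, v2));  // only the new bound needs checking
    }
}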
Preprint Available
Requirements Fixation
Rahul Mohanani, Paul Ralph, and Ben Shreeve
Lancaster University, UK
There is a broad consensus that understanding system desiderata (requirements) and design creativity are both important for software engineering success. However, little research has addressed the relationship between design creativity and the way requirements are framed or presented. This paper therefore aims to investigate the possibility that the way desiderata are framed or presented can affect design creativity. Forty-two participants took part in a randomized control trial where one group received desiderata framed as “requirements” while the other received desiderata framed as “ideas”. Participants produced design concepts which were judged for originality. Participants who received the requirements framing produced significantly less original designs than participants who received the ideas framing (Mann-Whitney U=116.5, p=0.004). We conclude that framing desiderata as “requirements” may cause requirements fixation, where designers’ preoccupation with satisfying explicit requirements inhibits their creativity.
Preprint Available
Reuse-Oriented Reverse Engineering of Functional Components from X86 Binaries
Dohyeong Kim, William N. Sumner, Xiangyu Zhang, Dongyan Xu, and Hira Agrawal
Purdue University, USA; Simon Fraser University, Canada; Applied Communications Sciences, USA
Locating, extracting, and reusing the implementation of a feature within an existing binary program is challenging. This paper proposes a novel algorithm to identify modular functions corresponding to such features and to provide usable interfaces for the extracted functions. We provide a way to represent a desired feature with two executions that both execute the feature but with different inputs. Instead of reverse engineering the interface of a function, we wrap the existing interface and provide a simpler and more intuitive interface for the function through concretization and redirection. Experiments show that our technique can be applied to extract varied features from several real world applications including a malicious application.
Reviser: Efficiently Updating IDE-/IFDS-Based Data-Flow Analyses in Response to Incremental Program Changes
Steven Arzt and Eric Bodden
TU Darmstadt, Germany; Fraunhofer SIT, Germany
Most application code evolves incrementally, and especially so when being maintained after the applications have been deployed. Yet, most data-flow analyses do not take advantage of this fact. Instead they require clients to recompute the entire analysis even if little code has changed—a time consuming undertaking, especially with large libraries or when running static analyses often, e.g., on a continuous-integration server. In this work, we present Reviser, a novel approach for automatically and efficiently updating inter-procedural dataflow analysis results in response to incremental program changes. Reviser follows a clear-and-propagate philosophy, aiming at clearing and recomputing analysis information only where required, thereby greatly reducing the required computational effort. The Reviser algorithm is formulated as an extension to the IDE framework for Inter-procedural Finite Distributed Environment problems and automatically updates arbitrary IDE-based analyses. We have implemented Reviser as an open-source extension to the Heros IFDS/IDE solver and the Soot program-analysis framework. An evaluation of Reviser on various client analyses and target programs shows performance gains of up to 80% in comparison to a full recomputation. The experiments also show Reviser to compute the same results as a full recomputation on all instances tested.
Preprint Available
Additional Information
SEEDS: A Software Engineer's Energy-Optimization Decision Support Framework
Irene Manotas, Lori Pollock, and James Clause
University of Delaware, USA
Reducing the energy usage of software is becoming more important in many environments, in particular, battery-powered mobile devices, embedded systems and data centers. Recent empirical studies indicate that software engineers can support the goal of reducing energy usage by making design and implementation decisions in ways that take into consideration how such decisions impact the energy usage of an application. However, the large number of possible choices and the lack of feedback and information available to software engineers necessitates some form of automated decision-making support. This paper describes the first known automated support for systematically optimizing the energy usage of applications by making code-level changes. It is effective at reducing energy usage while freeing developers from needing to deal with the low-level, tedious tasks of applying changes and monitoring the resulting impacts to the energy usage of their application. We present a general framework, SEEDS, as well as an instantiation of the framework that automatically optimizes Java applications by selecting the most energy-efficient library implementations for Java's Collections API. Our empirical evaluation of the framework and instantiation show that it is possible to improve the energy usage of an application in a fully automated manner for a reasonable cost.
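The sketch below shows the kind of code-level change SEEDS explores for Java's Collections API: the program depends only on the List interface, so the concrete implementation can be swapped. Which implementation is most energy-efficient is decided by SEEDS's measurements; the hand-written choice here is purely illustrative.

// Illustrative only: the same code runs with java.util.ArrayList or
// java.util.LinkedList, and SEEDS would pick whichever measures as more
// energy-efficient for the application's workload.
import java.util.ArrayList;
import java.util.List;

public class CollectionsChoiceSketch {
    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();  // candidate alternative: new java.util.LinkedList<>()
        for (int i = 0; i < 100_000; i++) ids.add(i);
        long sum = 0;
        for (int id : ids) sum += id;           // iterator-based traversal works for either choice
        System.out.println(sum);
    }
}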
Preprint Available
Self-Adaptation through Incremental Generative Model Transformations at Runtime
Bihuan Chen, Xin Peng, Yijun Yu, Bashar Nuseibeh, and Wenyun Zhao
Fudan University, China; Open University, UK; University of Limerick, Ireland
A self-adaptive system uses runtime models to adapt its architecture to the changing requirements and contexts. However, there is no one-to-one mapping between the requirements in the problem space and the architectural elements in the solution space. Instead, one refined requirement may crosscut multiple architectural elements, and its realization involves complex behavioral or structural interactions manifested as architectural design decisions. In this paper we propose to combine two kinds of self-adaptations: requirements-driven self-adaptation, which captures requirements as goal models to reason about the best plan within the problem space, and architecture-based self-adaptation, which captures architectural design decisions as decision trees to search for the best design for the desired requirements within the contextualized solution space. Following these adaptations, component-based architecture models are reconfigured using incremental and generative model transformations. Compared with requirements-driven or architecture-based approaches, the case study using an online shopping benchmark shows promise that our approach can further improve the effectiveness of adaptation (e.g. system throughput in this case study) and offer more adaptation flexibility.
Preprint Available
SimRT: An Automated Framework to Support Regression Testing for Data Races
Tingting Yu, Witawas Srisa-an, and Gregg Rothermel
University of Nebraska-Lincoln, USA
Concurrent programs are prone to various classes of difficult-to-detect faults, of which data races are particularly prevalent. Prior work has attempted to increase the cost-effectiveness of approaches for testing for data races by employing race detection techniques, but to date, no work has considered cost-effective approaches for re-testing for races as programs evolve. In this paper we present SimRT, an automated regression testing framework for use in detecting races introduced by code modifications. SimRT employs a regression test selection technique, focused on sets of program elements related to race detection, to reduce the number of test cases that must be run on a changed program to detect races that occur due to code modifications, and it employs a test case prioritization technique to improve the rate at which such races are detected. Our empirical study of SimRT reveals that it is more efficient and effective for revealing races than other approaches, and that its constituent test selection and prioritization components each contribute to its performance.
Preprint Available
Software Engineering at the Speed of Light: How Developers Stay Current using Twitter
Leif Singer, Fernando Figueira Filho, and Margaret-Anne Storey
University of Victoria, Canada; Federal University of Rio Grande do Norte, Brazil
The microblogging service Twitter has over 500 million users posting over 500 million tweets daily. Research has established that software developers use Twitter in their work, but this has not yet been examined in detail. Twitter is an important medium in some software engineering circles—understanding its use could lead to improved support, and learning more about the reasons for non-adoption could inform the design of improved tools. In a qualitative study, we surveyed 271 and interviewed 27 developers active on GitHub. We find that Twitter helps them keep up with the fast-paced development landscape. They use it to stay aware of industry changes, for learning, and for building relationships. We discover the challenges they experience and extract their coping strategies. Some developers do not want to or cannot embrace Twitter for their work—we show their reasons and alternative channels. We validate our findings in a follow-up survey with more than 1,200 respondents.
Preprint Available
Additional Information
Spotting Working Code Examples
Iman Keivanloo, Juergen Rilling, and Ying Zou
Queen's University, Canada; Concordia University, Canada
Working code examples are useful resources for pragmatic reuse in software development. A working code example provides a solution to a specific programming problem. Earlier studies have shown that existing code search engines are not successful in finding working code examples. They fail in ranking high quality code examples at the top of the result set. To address this shortcoming, a variety of pattern-based solutions are proposed in the literature. However, these solutions cannot be integrated seamlessly in Internet-scale source code engines due to their high time complexity or query language restrictions. In this paper, we propose an approach for spotting working code examples which can be adopted by Internet-scale source code search engines. The time complexity of our approach is as low as the complexity of existing code search engines on the Internet and considerably lower than the pattern-based approaches supporting free-form queries. We study the performance of our approach using a representative corpus of 25,000 open source Java projects. Our findings support the feasibility of our approach for Internet-scale code search. We also found that our approach outperforms the Ohloh Code search engine, previously known as Koders, in spotting working code examples.
Preprint Available
Symbolic Assume-Guarantee Reasoning through BDD Learning
Fei He, Bow-Yaw Wang, Liangze Yin, and Lei Zhu
Tsinghua University, China; Academia Sinica, Taiwan
Both symbolic model checking and assume-guarantee reasoning aim to circumvent the state explosion problem. Symbolic model checking explores many states simultaneously and reports numerous erroneous traces. Automated assume-guarantee reasoning, on the other hand, infers contextual assumptions by inspecting spurious erroneous traces. One would expect that their integration could further improve the capacity of model checking. Yet examining numerous erroneous traces to deduce contextual assumptions can be very time-consuming. The integration of symbolic model checking and assume-guarantee reasoning is thus far from clear. In this paper, we present a progressive witness analysis algorithm for automated assume-guarantee reasoning to exploit a multitude of traces from BDD-based symbolic model checkers. Our technique successfully integrates symbolic model checking with automated assume-guarantee reasoning by directly inferring BDDs as implicit assumptions. In our experiments, it outperforms monolithic symbolic model checking on four benchmark problems and an industrial case study.
Preprint Available
The Dimensions of Software Engineering Success
Paul Ralph and Paul Kelly
Lancaster University, UK
Software engineering research and practice are hampered by the lack of a well-understood, top-level dependent variable. Recent initiatives on General Theory of Software Engineering suggest a multifaceted variable – Software Engineering Success. However, its exact dimensions are unknown. This paper investigates the dimensions (not causes) of software engineering success. An interdisciplinary sample of 191 design professionals (68 in the software industry) were interviewed concerning their perceptions of success. Non-software designers (e.g. architects) were included to increase the breadth of ideas and facilitate comparative analysis. Transcripts were subjected to supervised, semi-automated semantic content analysis, including a software developer vs. other professionals comparison. Findings suggest that participants view their work as time-constrained projects with explicit clients and other stakeholders. Success depends on stakeholder impacts – financial, social, physical and emotional – and is understood through feedback. Concern with meeting explicit requirements is peculiar to software engineering and design is not equated with aesthetics in many other fields. Software engineering success is a complex multifaceted variable, which cannot sufficiently be explained by traditional dimensions including user satisfaction, profitability or meeting requirements, budgets and schedules. A proto-theory of success is proposed, which models success as the net impact on a particular stakeholder at a particular time. Stakeholder impacts are driven by project efficiency, artifact quality and market performance. Success is not additive, e.g., ‘low’ success for clients does not average with ‘high’ success for developers to make ‘moderate’ success overall; rather, a project may be simultaneously successful and unsuccessful from different perspectives.
Preprint Available
The Strength of Random Search on Automated Program Repair
Yuhua Qi, Xiaoguang Mao, Yan Lei, Ziying Dai, and Chengsong Wang
National University of Defense Technology, China
Automated program repair has recently received considerable attention, and many techniques in this research area have been proposed. Among them, two genetic-programming-based techniques, GenProg and Par, have shown promising results. In particular, GenProg has been used as the baseline technique to check the repair effectiveness of new techniques in much of the literature. Although GenProg and Par have shown their strong ability of fixing real-life bugs in nontrivial programs, to what extent GenProg and Par benefit from genetic programming, which they use to guide the patch search process, is still unknown. To address this question, we present a new automated repair technique using random search, which is commonly considered much simpler than genetic programming, and implement a prototype tool called RSRepair. Experiments on 7 programs with 24 versions shipping with real-life bugs suggest that RSRepair, in most cases (23/24), outperforms GenProg in terms of both repair effectiveness (requiring fewer patch trials) and efficiency (requiring fewer test case executions), demonstrating the strength of random search relative to genetic programming. Based on these experimental results, we suggest that every proposed technique built on an optimization algorithm should check its effectiveness by comparing it with random search.
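A heavily simplified sketch of repair by random search follows, in the spirit of RSRepair but not its implementation: candidate patches are sampled uniformly at random and validated against the test suite until one passes or the trial budget runs out. The patch descriptions and the test oracle below are hypothetical stand-ins.

// Repair by random search, simplified: sample a candidate patch at random and
// keep the first one that passes all tests (hypothetical patches and oracle).
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

public class RandomSearchRepairSketch {
    static String repair(List<String> candidatePatches, Predicate<String> passesAllTests,
                         int maxTrials, long seed) {
        Random rng = new Random(seed);
        for (int trial = 0; trial < maxTrials; trial++) {
            String patch = candidatePatches.get(rng.nextInt(candidatePatches.size()));
            if (passesAllTests.test(patch)) return patch;   // validated repair found
        }
        return null;                                        // trial budget exhausted
    }

    public static void main(String[] args) {
        List<String> patches = List.of("delete stmt 17", "insert guard at 42", "replace call at 9");
        Predicate<String> oracle = p -> p.contains("guard"); // stand-in for running the test suite
        System.out.println(repair(patches, oracle, 100, 1L));
    }
}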
Preprint Available
Time Pressure: A Controlled Experiment of Test Case Development and Requirements Review
Mika V. Mäntylä, Kai Petersen, Timo O. A. Lehtinen, and Casper Lassenius
Aalto University, Finland; Blekinge Institute of Technology, Sweden
Time pressure is prevalent in the software industry, where tight deadlines and high customer demands are the norm. However, the effects of time pressure have received little attention in software engineering research. We performed a controlled experiment on time pressure with 97 observations from 54 subjects. Using a two-by-two crossover design, our subjects performed requirements review and test case development tasks. We found statistically significant evidence that time pressure increases efficiency in test case development (high effect size, Cohen’s d=1.279) and in requirements review (medium effect size, Cohen’s d=0.650). However, we found no statistically significant evidence that time pressure decreases effectiveness or causes adverse effects on motivation, frustration or perceived performance. We also investigated the role of knowledge but found no evidence of the mediating role of knowledge in time pressure suggested by prior work, possibly due to our subjects. We conclude that applying moderate time pressure for limited periods could be used to increase efficiency in software engineering tasks that are well structured and straightforward.
Preprint Available
Additional Information
Towards Efficient Optimization in Package Management Systems
Alexey Ignatiev, Mikoláš Janota, and Joao Marques-Silva
INESC-ID, Portugal; University College Dublin, Ireland
Package management as a means of reuse of software artifacts has become extremely popular, most notably in Linux distributions. At the same time, successful package management brings about a number of computational challenges. Whenever a user requires a new package to be installed, a package manager not only installs the new package but it might also install other packages or uninstall some old ones in order to respect dependencies and conflicts of the packages. Coming up with a new configuration of packages is computationally challenging. It is particularly complex when we also wish to optimize for user preferences, such as requiring that the resulting package configuration should not differ too much from the original one. A number of exact approaches for solving this problem have been proposed in recent years. These approaches, however, do not have guaranteed runtime due to the high computational complexity of the problem. This paper addresses this issue by devising a hybrid approach that integrates exact solving with approximate solving, invoking the approximate part whenever the solver is running out of time. Experimental evaluation shows that this approach enables returning high-quality package configurations with rapid response time.
Preprint Available
TradeMaker: Automated Dynamic Analysis of Synthesized Tradespaces
Hamid Bagheri, Chong Tang, and Kevin Sullivan
George Mason University, USA; University of Virginia, USA
System designers today are focusing less on point solutions for complex systems and more on design spaces, often with a focus on understanding tradeoffs among non-functional properties across such spaces. This shift places a premium on the efficient comparative evaluation of non-functional properties of designs in such spaces. While static analysis of designs will sometimes suffice, often one must run designs dynamically, under comparable loads, to determine properties and tradeoffs. Yet variant designs often present variant interfaces, requiring that common loads be specialized to many interfaces. The main contributions of this paper are a mathematical framework, architecture, and tool for specification-driven synthesis of design spaces and common loads specialized to individual designs for dynamic tradeoff analysis of non-functional properties in large design spaces. To test our approach we used it to run an experiment to test the validity of static metrics for object-relational database mappings, requiring design space and load synthesis for, and dynamic analysis of, hundreds of database designs.
Preprint Available
Trading Robustness for Maintainability: An Empirical Study of Evolving C# Programs
Nélio Cacho, Thiago César, Thomas Filipe, Eliezio Soares, Arthur Cassio, Rafael Souza, Israel Garcia, Eiji Adachi Barbosa, and Alessandro Garcia
Federal University of Rio Grande do Norte, Brazil; PUC-Rio, Brazil
ACM Distinguished Paper
Mainstream programming languages provide built-in exception handling mechanisms to support robust and maintainable implementation of exception handling in software systems. Many of these modern languages, such as C#, Ruby, and Python, are often claimed to have more appropriate exception handling mechanisms: they reduce programming constraints on exception handling to favor agile changes in the source code. These languages provide what we call maintenance-driven exception handling mechanisms. It is expected that the adoption of these mechanisms improves software maintainability without hindering software robustness. However, there is still little empirical knowledge about the impact that adopting these mechanisms has on software robustness. This paper addresses this gap with an empirical study aimed at understanding the relationship between changes in C# programs and their robustness. In particular, we evaluated how changes in normal and exceptional code relate to exception handling faults. We applied change impact analysis and control flow analysis to 119 versions of 16 C# programs. The results showed that: (i) most of the problems hindering software robustness in those programs are caused by changes in the normal code, (ii) many potential faults were introduced even when improving exception handling in C# code, and (iii) faults are often facilitated by the maintenance-driven flexibility of the exception handling mechanism. Moreover, we present a series of change scenarios that decrease program robustness.
Preprint Available
Transition from Centralized to Decentralized Version Control Systems: A Case Study on Reasons, Barriers, and Outcomes
Kıvanç Muşlu, Christian Bird, Nachiappan Nagappan, and Jacek Czerwonka
University of Washington, USA; Microsoft Research, USA; Microsoft, USA
In recent years, software development has started to transition from centralized version control systems (CVCSs) to decentralized version control systems (DVCSs). Although CVCSs and DVCSs have been studied extensively, there has been little research on the transition between these systems. This paper investigates the transition process, from the developer’s view, in a large company. The paper captures the transition reasons, barriers, and outcomes through 10 developer interviews, and investigates these findings through a survey of 70 developers. The paper identifies that the majority of the developers need to work incrementally and offline, and to manage multiple contexts efficiently. DVCSs fulfill these developer needs; however, the transition comes with a cost that depends on the previous development workflow. The paper discusses the transition reasons, barriers, and outcomes, and provides recommendations for teams planning such a transition. The paper shows that lightweight branches, and local and incremental commits, were the main reasons developers wanted to move to a DVCS. Further, the paper identifies the main problems with the transition process as a steep DVCS learning curve, incomplete DVCS integration with the rest of the development workflow, and DVCS scaling issues.
Preprint Available
Two's Company, Three's a Crowd: A Case Study of Crowdsourcing Software Development
Klaas-Jan Stol and Brian Fitzgerald
Lero, Ireland; University of Limerick, Ireland
Crowdsourcing is an emerging and promising approach that involves delegating a variety of tasks to an unknown workforce - the crowd. Crowdsourcing has been applied quite successfully in various contexts, from basic tasks on Amazon Mechanical Turk to solving complex industry problems, e.g. on InnoCentive. Companies are increasingly using crowdsourcing to accomplish specific software development tasks, yet very little research exists on this topic. This paper presents an in-depth industry case study of crowdsourcing software development at a multinational corporation. Our case study highlights a number of challenges that arise when crowdsourcing software development. For example, the crowdsourcing development process is essentially a waterfall model, which must eventually be integrated with the agile approach used by the company. Crowdsourcing works better for specific software development tasks that are less complex and stand-alone, without interdependencies. The development cost and the overhead of preparing specifications and answering crowdsourcing community queries were much greater than originally expected, and the time needed to complete contests, review submissions, and resolve quality issues was significant. Finally, quality issues were pushed later in the lifecycle because of the lengthy process needed to identify and resolve them. Given the emphasis in software engineering on identifying bugs as early as possible, this is quite problematic.
Preprint Available
Additional Information
Uncertainty, Risk, and Information Value in Software Requirements and Architecture
Emmanuel Letier, David Stefan, and Earl T. Barr
University College London, UK
Uncertainty complicates early requirements and architecture decisions and may expose a software project to significant risk. Yet software architects lack support for evaluating uncertainty, its impact on risk, and the value of reducing uncertainty before making critical decisions. We propose to apply decision analysis and multi-objective optimisation techniques to provide such support. We present a systematic method allowing software architects to describe uncertainty about the impact of alternatives on stakeholders' goals; to calculate the consequences of uncertainty through Monte-Carlo simulation; to shortlist candidate architectures based on expected costs, benefits and risks; and to assess the value of obtaining additional information before deciding. We demonstrate our method on the design of a system for coordinating emergency response teams. Our approach highlights the need for requirements engineering and software cost estimation methods to disclose uncertainty instead of hiding it.
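A minimal sketch of the Monte-Carlo step, assuming invented cost distributions, a hypothetical budget, and made-up alternative names (this is not the authors' model, only an illustration of simulation-based risk estimation):

    import java.util.Random;

    // Minimal sketch: Monte-Carlo estimation of expected cost and budget-overrun
    // risk for two hypothetical architecture alternatives. The distributions and
    // the budget are illustrative assumptions, not taken from the paper.
    public class MonteCarloSketch {
        static final Random RNG = new Random(42);

        // Uncertain cost modelled as a Gaussian around a base estimate.
        static double sampleCost(double base, double spread) {
            return base + spread * RNG.nextGaussian();
        }

        public static void main(String[] args) {
            String[] alternatives = {"centralised-dispatch", "peer-to-peer"};
            double[] baseCost = {100.0, 90.0};   // expected cost guesses
            double[] spread   = {10.0, 25.0};    // uncertainty (std. dev.)
            double budget = 120.0;
            int runs = 100_000;

            for (int a = 0; a < alternatives.length; a++) {
                double total = 0.0;
                int overruns = 0;
                for (int i = 0; i < runs; i++) {
                    double cost = sampleCost(baseCost[a], spread[a]);
                    total += cost;
                    if (cost > budget) overruns++;
                }
                System.out.printf("%s: expected cost %.1f, P(cost > budget) = %.3f%n",
                        alternatives[a], total / runs, (double) overruns / runs);
            }
        }
    }

Shortlisting candidate architectures then amounts to comparing such expected-value and overrun-risk estimates, and the value of additional information is the expected improvement in the decision if an uncertain quantity were known before choosing.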
Preprint Available
Understanding JavaScript Event-Based Interactions
Saba Alimadadi, Sheldon Sequeira, Ali Mesbah, and Karthik Pattabiraman
University of British Columbia, Canada
ACM Distinguished Paper
Web applications have become one of the fastest growing types of software systems today. Despite their popularity, understanding the behaviour of modern web applications is still a challenging endeavour for developers during development and maintenance tasks. The challenges mainly stem from the dynamic, event-driven, and asynchronous nature of the JavaScript language. We propose a generic technique for capturing low-level event-based interactions in a web application and mapping those to a higher-level behavioural model. This model is then transformed into an interactive visualization, representing episodes of triggered causal and temporal events, related JavaScript code executions, and their impact on the dynamic DOM state. Our approach, implemented in a tool called Clematis, allows developers to easily understand the complex dynamic behaviour of their application at three different semantic levels of granularity. The results of our industrial controlled experiment show that Clematis is capable of improving the task accuracy by 61%, while reducing the task completion time by 47%.
Preprint Available
Understanding Understanding Source Code with Functional Magnetic Resonance Imaging
Janet Siegmund, Christian Kästner, Sven Apel, Chris Parnin, Anja Bethmann, Thomas Leich, Gunter Saake, and André Brechmann
University of Passau, Germany; Carnegie Mellon University, USA; Georgia Tech, USA; Leibniz Institute for Neurobiology, Germany; Metop Research Institute, Germany; University of Magdeburg, Germany
Program comprehension is an important cognitive process that inherently eludes direct measurement. Thus, researchers are struggling to provide suitable programming languages, tools, or coding conventions to support developers in their everyday work. In this paper, we explore whether functional magnetic resonance imaging (fMRI), which is well established in cognitive neuroscience, can be used to soundly measure program comprehension. In a controlled experiment, we observed 17 participants inside an fMRI scanner while they were comprehending short source-code snippets, which we contrasted with locating syntax errors. We found a clear, distinct activation pattern of five brain regions, which are related to working memory, attention, and language processing---all processes that fit well with our understanding of program comprehension. Our results encourage us and, hopefully, other researchers to use fMRI in future studies to measure program comprehension and, in the long run, answer questions such as: Can we predict whether someone will be an excellent programmer? How effective are new languages and tools for program understanding? How should we train programmers?
Preprint Available
Understanding and Improving Software Build Teams
Shaun Phillips, Thomas Zimmermann, and Christian Bird
University of Calgary, Canada; Microsoft Research, USA
Building, the creation of software from source code, is a fundamental activity in software development. Build teams manage this process and ensure builds are produced reliably and efficiently. This paper presents an exploration of the nature of build teams--how they form, work, and relate to other teams--through three multi-method studies conducted at Microsoft. We also consider build team effectiveness and find that many challenges are social, not technical: role ambiguity, knowledge sharing, communication, trust, and conflict. Our findings validate theories from group dynamics and organization science, and, using a cross-discipline approach, we apply learnings from these fields to inform the design of engineering tools and practices to improve build team effectiveness.
Preprint Available
Unit Test Virtualization with VMVM
Jonathan Bell and Gail Kaiser
Columbia University, USA
ACM Distinguished Paper
Testing large software packages can become very time intensive. To address this problem, researchers have investigated techniques such as Test Suite Minimization. Test Suite Minimization reduces the number of tests in a suite by removing tests that appear redundant, at the risk of a reduction in fault-finding ability since it can be difficult to identify which tests are truly redundant. We take a completely different approach to solving the same problem of long running test suites by instead reducing the time needed to execute each test, an approach that we call Unit Test Virtualization. With Unit Test Virtualization, we reduce the overhead of isolating each unit test with a lightweight virtualization container. We describe the empirical analysis that grounds our approach and provide an implementation of Unit Test Virtualization targeting Java applications. We evaluated our implementation, VMVM, using 20 real-world Java applications and found that it reduces test suite execution time by up to 97% (on average, 62%) when compared to traditional unit test execution. We also compared VMVM to a well known Test Suite Minimization technique, finding the reduction provided by VMVM to be four times greater, while still executing every test with no loss of fault-finding ability.
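To see why isolation matters in the first place, consider a hypothetical JUnit example of in-memory state leaking between tests (invented for illustration; it does not show VMVM's internal mechanism, only the kind of shared static state that per-test isolation guards against):

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // Hypothetical example of test pollution through static state: whichever test
    // runs second observes whatever the first test left behind, unless each test
    // is isolated (e.g., run in a fresh JVM, or with static state reset between tests).
    public class CacheTest {
        // Shared mutable state standing in for the class under test (simplified here).
        static final java.util.Map<String, String> CACHE = new java.util.HashMap<>();

        @Test
        public void storesValue() {
            CACHE.put("key", "value");
            assertEquals(1, CACHE.size());
        }

        @Test
        public void startsEmpty() {
            // Passes in a fresh JVM, but fails if storesValue ran first
            // in the same JVM without isolation.
            assertEquals(0, CACHE.size());
        }
    }

Running every test in its own process avoids this at a high cost; Unit Test Virtualization aims to provide the same isolation far more cheaply.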
Preprint Available
Additional Information
Unleashing Concurrency for Irregular Data Structures
Peng Liu and Charles Zhang
Wuhan University, China; Hong Kong University of Science and Technology, China
To implement atomicity when accessing an irregular data structure, developers often use coarse-grained locking, because the hierarchical nature of the data structure makes reasoning about fine-grained locking difficult and error-prone: an update to an ancestor field may affect its descendants. Coarse-grained locking disallows concurrent accesses to the entire data structure and leads to a low degree of concurrency. We propose an approach, built upon the Multiple Granularity Lock (MGL), that replaces coarse-grained locks to unleash more concurrency for irregular data structures. Our approach is widely applicable and does not require the data structures to have special shapes. We produce the MGL locks by reasoning about the hierarchy of the data structure and the accesses to it. On widely used applications, our optimization brings significant speedups, ranging from 7%-20% up to 2X.
Us and Them: A Study of Privacy Requirements Across North America, Asia, and Europe
Swapneel Sheth, Gail Kaiser, and Walid Maalej
Columbia University, USA; University of Hamburg, Germany
Data privacy when using online systems like Facebook and Amazon has become an increasingly popular topic in the last few years. However, little is known about how users and developers perceive privacy and which concrete measures would mitigate their privacy concerns. To investigate privacy requirements, we conducted an online survey with closed and open questions and collected 408 valid responses. Our results show that users often reduce privacy to security, with data sharing and data breaches being their biggest concerns. Users are more concerned about the content of their documents and their personal data such as location than about their interaction data. Unlike users, developers clearly prefer technical measures like data anonymization and think that privacy laws and policies are less effective. We also observed interesting differences between people from different geographies. For example, people from Europe are more concerned about data breaches than people from North America. People from Asia/Pacific and Europe believe that content and metadata are more critical for privacy than people from North America. Our results contribute to developing a user-driven privacy framework that is based on empirical evidence in addition to the legal, technical, and commercial perspectives.
Preprint Available
Using Dynamic Analysis to Generate Disjunctive Invariants
ThanhVu Nguyen, Deepak Kapur, Westley Weimer, and Stephanie Forrest
University of New Mexico, USA; University of Virginia, USA
Program invariants are important for defect detection, program verification, and program repair. However, existing techniques have limited support for important classes of invariants such as disjunctions, which express the semantics of conditional statements. We propose a method for generating disjunctive invariants over numerical domains, which are inexpressible using classical convex polyhedra. Using dynamic analysis and reformulating the problem in non-standard "max-plus" and "min-plus" algebras, our method constructs hulls over program trace points. Critically, we introduce and infer a weak class of such invariants that balances expressive power against the computational cost of generating nonconvex shapes in high dimensions. Existing dynamic inference techniques often generate spurious invariants that fit some program traces but do not generalize. With the insight that generating dynamic invariants is easy, we propose to verify these invariants statically using k-inductive SMT theorem proving, which allows us to validate invariants that are not classically inductive. Results on difficult kernels involving nonlinear arithmetic and abstract arrays suggest that this hybrid approach efficiently generates and proves correct program invariants.
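A small illustration of how max-plus terms express disjunctions (the variables and constants here are invented for the example, not drawn from the paper): an inequality with a max on the larger side decomposes into a disjunction of linear inequalities,

\[ \max(x,\; y - 3) \ge z \;\Longleftrightarrow\; (x \ge z) \lor (y - 3 \ge z), \]

whereas \( \max(x,\; y - 3) \le z \) is the conjunction \( (x \le z) \land (y - 3 \le z) \). This is how such invariants can capture branch-dependent behaviour of conditional statements while remaining mechanically checkable.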
Preprint Available
Using Psycho-Physiological Measures to Assess Task Difficulty in Software Development
Thomas Fritz, Andrew Begel, Sebastian C. Müller, Serap Yigit-Elliott, and Manuela Züger
University of Zurich, Switzerland; Microsoft Research, USA; Exponent, USA
Software developers make programming mistakes that cause serious bugs for their customers. Existing work to detect problematic software focuses mainly on post hoc identification of correlations between bug fixes and code. We propose a new approach to address this problem --- detect when software developers are experiencing difficulty while they work on their programming tasks, and stop them before they can introduce bugs into the code. In this paper, we investigate a novel approach to classify the difficulty of code comprehension tasks using data from psycho-physiological sensors. We present the results of a study we conducted with 15 professional programmers to see how well an eye-tracker, an electrodermal activity sensor, and an electroencephalography sensor could be used to predict whether developers would find a task to be difficult. We can predict nominal task difficulty (easy/difficult) for a new developer with 64.99% precision and 64.58% recall, and for a new task with 84.38% precision and 69.79% recall. We can improve the Naive Bayes classifier's performance by training it on just the eye-tracking data over the entire dataset, or by using a sliding-window data collection scheme with a 55-second time window. Our work brings the community closer to a viable and reliable measure of task difficulty that could power the next generation of programming support tools.
Preprint Available
Vejovis: Suggesting Fixes for JavaScript Faults
Frolin S. Ocariza, Jr., Karthik Pattabiraman, and Ali Mesbah
University of British Columbia, Canada
JavaScript is used in web applications for achieving rich user interfaces and implementing core functionality. Unfortunately, JavaScript code is known to be prone to faults. In an earlier study, we found that over 65% of such faults are caused by the interaction of JavaScript code with the DOM at runtime (DOM-related faults). In this paper, we first perform an analysis of 190 bug reports to understand fixes commonly applied by programmers to these DOM-related faults; we observe that parameter replacements and DOM element validations are common fix categories. Based on these findings, we propose an automated technique and tool, called Vejovis, for suggesting repairs for DOM-based JavaScript faults. To evaluate Vejovis, we conduct a case study in which we subject Vejovis to 22 real-world bugs across 11 applications. We find that Vejovis accurately suggests repairs for 20 out of the 22 bugs, and in 13 of the 20 cases, the correct fix was the top ranked one.
Preprint Available
Verifying Component and Connector Models against Crosscutting Structural Views
Shahar Maoz, Jan Oliver Ringert, and Bernhard Rumpe
Tel Aviv University, Israel; RWTH Aachen University, Germany
The structure of component and connector (C&C) models, which are used in many application domains of software engineering, consists of components at different containment levels, their typed input and output ports, and the connectors between them. C&C views, which we have presented at FSE'13, can be used to specify structural properties of C&C models in an expressive and intuitive way. In this work we address the verification of a C&C model against a C&C view and present efficient (polynomial) algorithms to decide satisfaction. A unique feature of our work, not present in existing approaches to checking structural properties of C&C models, is the generation of witnesses for satisfaction/non-satisfaction and of short natural-language texts, which serve to explain and formally justify the verification results and point the engineer to their causes. A prototype tool and an evaluation over four example systems with multiple views, performance and scalability experiments, as well as a user study of the usefulness of the witnesses for engineers, demonstrate the contribution of our work to the state of the art in component and connector modeling and analysis.
Which Configuration Option Should I Change?
Sai Zhang and Michael D. Ernst
University of Washington, USA
Modern software often exposes configuration options that enable users to customize its behavior. During software evolution, developers may change how the configuration options behave. When upgrading to a new software version, users may need to re-configure the software by changing the values of certain configuration options. This paper addresses the following question during the evolution of a configurable software system: which configuration options should a user change to maintain the software's desired behavior? This paper presents a technique (and its tool implementation, called ConfSuggester) to troubleshoot configuration errors caused by software evolution. ConfSuggester uses dynamic profiling, execution trace comparison, and static analysis to link the undesired behavior to its root cause - a configuration option whose value can be changed to produce the desired behavior from the new software version. We evaluated ConfSuggester on 8 configuration errors from 6 configurable software systems written in Java. For 6 errors, the root-cause configuration option was ConfSuggester's first suggestion. For 1 error, the root cause was ConfSuggester's third suggestion. The root cause of the remaining error was ConfSuggester's sixth suggestion. Overall, ConfSuggester produced significantly better results than two existing techniques. ConfSuggester runs in just a few minutes, making it an attractive alternative to manual debugging.
Preprint Available
A Candid Industrial Evaluation of Formal Software Verification using Model Checking
Matthew Bennion and Ibrahim Habli
Rolls Royce, UK; University of York, UK
Model checking is a powerful formal analytical approach to verifying software and hardware systems. However, general industrial adoption is far from widespread. Some difficulties include the inaccessibility of techniques and tools and the need for further empirical evaluation in industrial contexts. This study considers the use of Simulink Design Verifier, a model checker that forms part of a modelling system already widely used in the safety-critical industry. Model checking is applied to a number of real-world problem reports, associated with aero-engine monitoring functions, to determine whether it can provide a practical route into effective verification, particularly for non-specialists. The study also considers the extent to which model checking can satisfy the requirements of the extensive DO-178C guidance on formal methods. The study shows that the benefits of model checking can be realised in an industrial setting without specialist skills, particularly when it is targeted at parts of the software that are error-prone, difficult to verify conventionally or critical. Importantly, it shows that model checking can find errors earlier in the design cycle than testing, which potentially saves money, due to reduced scrap and rework.
Preprint Available
A Case Study on Testing, Commissioning, and Operation of Very-Large-Scale Software Systems
Michael Vierhauser, Rick Rabiser, and Paul Grünbacher
JKU Linz, Austria
An increasing number of software systems today are very-large-scale software systems (VLSS) with system-of-systems (SoS) architectures. Due to their heterogeneity and complexity VLSS are difficult to understand and analyze, which results in various challenges for development and evolution. Existing software engineering processes, methods, and tools do not sufficiently address the characteristics of VLSS. Also, there are only a few empirical studies on software engineering for VLSS. We report on results of an exploratory case study involving engineers and technical project managers of an industrial automation VLSS for metallurgical plants. The paper provides empirical evidence on how VLSS are tested, commissioned, and operated in practice. The paper discusses practical challenges and reports industrial requirements regarding process and tool support. In particular, software processes and tools need to provide general guidance at the VLSS level as well as specific methods and tools for systems that are part of the VLSS. Processes and tools need to support multi-disciplinary engineering across system boundaries. Furthermore, managing variability and evolution is success-critical in VLSS verification and validation.
A Systematic Approach to Transforming System Requirements into Model Checking Specifications
Daniel Aceituna, Hyunsook Do, and Sudarshan Srinivasan
North Dakota State University, USA
We propose a method that addresses the following dilemma: model checking can formally expose off-nominal behaviors and unintended scenarios in the requirements of concurrent reactive systems, and requirements engineers and non-technical stakeholders who are the system domain experts can greatly benefit from jointly using model checking during the elicitation, analysis, and verification of system requirements; however, model checking is a formal verification technique, and many requirements engineers and domain experts lack the knowledge and training needed to apply it to requirements. To take full advantage of model checking and of the domain experts’ knowledge in verifying the system, we propose a front-end framework for model checking and evaluate our approach on a real-world application.
Active Files as a Measure of Software Maintainability
Lukas Schulte, Hitesh Sajnani, and Jacek Czerwonka
Northeastern University, USA; University of California at Irvine, USA; Microsoft, USA
In this paper, we explore the set of source files that are changed unusually often. We define these files as active files. Although the discovery of active files relies only on version history and defect classification, the simple concept of active files can deliver key insights into software development activities. Active files can help focus code reviews, implement targeted testing, show areas of potential merge conflicts, and identify areas that are central to program comprehension. In an empirical study of six large software systems within Microsoft, ranging from products to services, we found that active files constitute only 2-8% of the total system size, contribute 20-40% of system file changes, and are responsible for 60-90% of all defects. Moreover, we establish that the majority (65-95%) of the active files are architectural hub files, which change due to feature addition rather than defect fixing.
An Empirical Study of Structural Defects in Industrial Use-Cases
Deepti Parachuri, A. S. M. Sajeev, and Rakesh Shukla
Infosys Labs, India; University of New England, Australia
Use-cases play an important role in capturing and analysing software requirements in the IT industry. A number of guidelines have been proposed in the literature on how to write use-cases. Structural defects can occur when use-cases are written without following such guidelines. We develop a taxonomy of structural defects and analyse a sample of 360 industrial use-cases to understand the nature of defects in them. Our sample comes from both client-based projects and in-house projects. The results show that, compared to a sample of theoretical use-cases that follow Cockburn's guidelines, industrial use-cases on average exhibit defects such as complex structures, lack of customer focus, and missing actors. Given the shortage of analyses of real industry samples, our results make a significant contribution towards the understanding of the strengths and weaknesses of industrial use-cases in terms of structural defects. The results will be useful for industry practitioners in adopting use-case modelling standards to reduce defects, as well as for software engineering researchers exploring the reasons for such differences between theory and practice in use-case modelling.
Analyzing Software Data: After the Gold Rush (A Goldfish-Bowl Panel)
Tim Menzies, Christian Bird, and Thomas Zimmermann
West Virginia University, USA; Microsoft Research, USA
Over the past few years, the volume and types of data related to software engineering have grown at an unprecedented rate and show no sign of slowing. This turn of events has led to a veritable gold rush, as researchers attempt to mine raw data and extract nuggets of insight. A very real danger is that the landscape may become a Wild West where inexperienced software "cowboys" sell hastily generated models to unsophisticated business users, without any concern for best or safe practices. Given the current enthusiasm for data analysis in software engineering, it is time to review how we are using these techniques and how we can use them better. While there may be no single "right" way to analyze software data, there are many wrong ways. As data techniques mature, we need to move to a new era in which data scientists understand and share the strengths and drawbacks of the many methods that might be deployed in industry. In this highly interactive panel, skilled practitioners and academics can (a) broadcast their insights and (b) hear the issues of newcomers to the field.
Architectural Dependency Analysis to Understand Rework Costs for Safety-Critical Systems
Robert L. Nord, Ipek Ozkaya, Raghvinder S. Sangwan, and Ronald J. Koontz
SEI, USA; Pennsylvania State University, USA; Boeing, USA
To minimize testing and technology upgrade costs for safety-critical systems, a thorough understanding and analysis of architectural dependencies is essential. Unmanaged dependencies create cost overruns and degraded qualities in systems. Architecture dependency analysis in practice, however, is typically performed in retrospect using code structures, the runtime image of a system, or both. Retrospective analysis can miss important dependencies that surface earlier in the life cycle. Development artifacts such as the software architecture description and the software requirements specification can augment the analysis process; however, the quality, consistency, and content of these artifacts vary widely. In this paper, we apply a commonly used dependency analysis metric, stability, and a visualization technique, the dependency structure matrix, to an architecture common to safety-critical systems that was re-engineered to reduce safety testing and upgrade cost. We describe the gaps observed when running the analysis and discuss the need for early life-cycle dependency analysis for managing rework costs in industrial software development environments.
Assessing Model-Based Testing: An Empirical Study Conducted in Industry
Christoph Schulze, Dharmalingam Ganesan, Mikael Lindvall, Rance Cleaveland, and Daniel Goldman
Fraunhofer CESE, USA; Global Net Services, USA
We compare manual testing without any automation, performed by a tester at a software company, with model-based testing (MBT), performed by a tester at a research center. The system under test (SUT), of which two different versions were tested by each of the two testers, is a professionally developed web-based data collection system that is now in use. The two testers tested the same versions, had identical testing goals (to detect defects), and had access to the same resources, but used different processes (i.e. manual without any automation vs. model-based with automatic test case generation and execution). The testers did not interact with each other. We compare the effectiveness (issues found) and efficiency (effort spent) of the two approaches. The results show, for example, that manual testing required less preparation time and that its test coverage was somewhat uneven. In contrast, MBT required more preparation time, was more systematic, and detected more issues. While the manual approach detected more inconsistencies between specified and actual text labels, MBT detected more functional issues. This is reflected in the severity score summary, which was about 60% higher for MBT than for manual testing.
Automated Software Integration Flows in Industry: A Multiple-Case Study
Daniel Ståhl and Jan Bosch
Ericsson, Sweden; Chalmers, Sweden
There is a steadily increasing interest in the agile practice of continuous integration. Consequently, there is great diversity in how it is interpreted and implemented, and a need to study, document and analyze how automated software integration flows are implemented in the industry today. In this paper we study five separate cases, using a descriptive model developed to address the variation points in continuous integration practice discovered in literature. Each case is discussed and evaluated individually, whereupon six guidelines for the design and implementation of automated software integration are presented. Furthermore, the descriptive model used to document the cases is evaluated and evolved.
Characterization of Operational Failures from a Business Data Processing SaaS Platform
Catello Di Martino, Zbigniew Kalbarczyk, Ravishankar K. Iyer, Geetika Goel, Santonu Sarkar, and Rajeshwari Ganesan
University of Illinois at Urbana-Champaign, USA; Infosys Labs, India
This paper characterizes operational failures of a production Consumer Packaged Goods business data processing Software-as-a-Service (SaaS) platform. Event logs collected over 283 days of in-field operation are used to characterize platform failures. The characterization is performed by estimating (i) common failure types of the platform, (ii) key factors impacting platform failures, (iii) the failure rate, and (iv) how user workload (files submitted for processing) impacts the failure rate. The major findings are: (i) 34.1% of failures are caused by unexpected values in customers' data, (ii) nearly 33% of the failures are due to timeouts, and (iii) the failure rate increases as the workload intensity (transactions/second) increases, while there is no statistical evidence that it is influenced by the workload volume (size of users' data). Finally, the paper presents the lessons learned and how the findings and the implemented analysis tool allow platform developers to improve platform code, system settings, and customer management.
Preprint Available
Collecting and Leveraging a Benchmark of Build System Clones to Aid in Quality Assessments
Shane McIntosh, Martin Poehlmann, Elmar Juergens, Audris Mockus, Bram Adams, Ahmed E. Hassan, Brigitte Haupt, and Christian Wagner
Queen's University, Canada; CQSE, Germany; Avaya Labs Research, USA; Polytechnique Montréal, Canada; Munich Re, Germany
Build systems specify how sources are transformed into deliverables, and hence must be carefully maintained to ensure that deliverables are assembled correctly. Similar to source code, build systems tend to grow in complexity unless specifications are refactored. This paper describes how clone detection can aid in quality assessments that determine if and where build refactoring effort should be applied. We gauge cloning rates in build systems by collecting and analyzing a benchmark comprising 3,872 build systems. Analysis of the benchmark reveals that: (1) build systems tend to have higher cloning rates than other software artifacts, (2) recent build technologies tend to be more prone to cloning, especially of configuration details like API dependencies, than older technologies, and (3) build systems that have fewer clones achieve higher levels of reuse via mechanisms not offered by build technologies. Our findings aided in refactoring a large industrial build system containing 1.1 million lines.
Preprint Available
Configurations Everywhere: Implications for Testing and Debugging in Practice
Dongpu Jin, Xiao Qu, Myra B. Cohen, and Brian Robinson
University of Nebraska-Lincoln, USA; ABB Research, USA; ABB, USA
Best SEIP Paper
Many industrial systems are highly-configurable, complicating the testing and debugging process. While researchers have developed techniques to statically extract, quantify and manipulate the valid system configurations, we conjecture that many of these techniques will fail in practice. In this paper we analyze a highly-configurable industrial application and two open source applications in order to quantify the true challenges that configurability creates for software testing and debugging. We find that (1) all three applications consist of multiple programming languages, hence static analyses need to cross programming language barriers to work, (2) there are many access points and methods to modify configurations, implying that practitioners need configuration traceability and should gather and merge metadata from more than one source and (3) the configuration state of an application on failure cannot be reliably determined by reading persistent data; a runtime memory dump or other heuristics must be used for accurate debugging. We conclude with a roadmap and lessons learned to help practitioners better handle configurability now, and that may lead to new configuration-aware testing and debugging techniques in the future.
Preprint Available
Deriving Requirements Model from Textual Use Cases
Kiran Prakash Sawant, Suman Roy, Srivibha Sripathi, François Plesse, and A. S. M. Sajeev
Infosys Labs, India; MINES ParisTech, France; University of New England, Australia
In this paper, we present an approach to derive structured requirements models from textual use case requirements, in the form of process diagrams and an ontology, using methods based on computational linguistics. The proposed requirements models are capable of modeling both structural and behavioral entities present in a use case. We consider a corpus containing 123 actual requirements use cases created by Infosys Ltd. and translate them into process diagrams and an ontology. To evaluate the performance of the conversion, we propose a few metrics and show that, on average, our linguistic engine mis-identified 2% of the actions and missed only 3% of the actions described in the input text.
Distributed-Pair Programming Can Work Well and Is Not Just Distributed Pair-Programming
Julia Schenk, Lutz Prechelt, and Stephan Salinger
Freie Universität Berlin, Germany
Background: Distributed Pair Programming can be performed via screen sharing or via a distributed IDE. The latter offers the freedom of concurrent editing (which may be helpful or damaging) and has even more awareness deficits than screen sharing. Objective: Characterize how competent distributed pair programmers may handle this additional freedom and these additional awareness deficits, and characterize the impacts on the pair programming process. Method: A revelatory case study, based on direct observation of a single, highly competent distributed pair of industrial software developers during a 3-day collaboration. We use recordings of these sessions and conceptualize the phenomena seen. Results: 1. Skilled pairs may bridge the awareness deficits without visible obstruction of the overall process. 2. Skilled pairs may use the additional editing freedom in a useful, limited fashion, resulting in potentially better fluency of the process than in local pair programming. Conclusion: When applied skillfully in an appropriate context, distributed-pair programming can (not will!) work at least as well as local pair programming.
Empirical Insights into the Perceived Benefits of Agile Software Engineering Practices: A Case Study from SAP
Christoph Tobias Schmidt, Srinivasa Ganesha Venkatesha, and Juergen Heymann
University of Mannheim, Germany; SAP Labs, India; SAP, Germany
SAP AG has taught the Agile Software Engineering methodology to more than 4,000 developers since 2010. As such, the company offers a unique setting to study its impact on developers' work. In this paper, we discuss how developers perceive the impact of pair programming and test automation on software quality, delivered feature scope, and various teamwork aspects. We draw our findings from a company-wide survey with answers from 174 developers working in 74 teams and 15 product owners from five locations worldwide. We complement our findings with insights from two in-depth case studies of two development teams. As expected, our findings confirm that the studied practices help developers develop better software. Deviating from existing preconceptions, however, the responding developers do not report a significant drop in their development speed. In addition, high adopters are more proud of their own contributions to the team, report better team learning, and feel more motivated.
Preprint Available
Evidence-Based Decision Making in Lean Software Project Management
Brian Fitzgerald, Mariusz Musiał, and Klaas-Jan Stol
Lero, Ireland; University of Limerick, Ireland; Ericpol, Poland
Many professions evolve from their origins as a creative craft process to a more product-centered industrial process. Software development is on such an evolutionary trajectory. A major step in this evolution is the progression from ad hoc to more rigorous evidence-based decision-making in software development project management. This paper extends theory and practice in relation to lean software development using such an evidence-based approach. Based on a comprehensive dataset of software development metrics, gathered in a longitudinal case study over a 22-month period, the Erlang-C model is used to analyze different software development parameters and to guide management decision-making in relation to development resources. The analysis reveals how 'gut feel' and intuition can be replaced by evidence-based decision-making, and how incorrect assumptions can underpin decisions that consequently fail to achieve the desired outcome.
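For reference, the standard Erlang-C formula from queueing theory (quoted here as general background, not from the paper) gives the probability that an arriving work item must wait when N servers face an offered load of A = λ/μ Erlangs:

\[ P_{\text{wait}} = \frac{\dfrac{A^N}{N!}\,\dfrac{N}{N-A}}{\displaystyle\sum_{k=0}^{N-1} \dfrac{A^k}{k!} \;+\; \dfrac{A^N}{N!}\,\dfrac{N}{N-A}} \]

In the project-management setting sketched above, the servers correspond roughly to available development capacity and the offered load to incoming work, which is the kind of quantitative resourcing argument the case study builds.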
Preprint Available
Experiences Gamifying Developer Adoption of Practices and Tools
Will Snipes, Anil R. Nair, and Emerson Murphy-Hill
ABB Research, USA; ABB Research, India; North Carolina State University, USA
As software development practices evolve, toolsmiths face the continuous challenge of getting developers to adopt new practices and tools. We tested with industrial software developers the idea that adding game-like feedback to the development environment would improve the adoption of tools and practices for code navigation. We present results from a pre-study survey of 130 developers' opinions on gamification and motivation, usage data from a study in which an intact team of six developers used a game built around code navigation practices, and feedback collected in post-study interviews. Our pre-study survey showed that most developers were interested in gamification, though some held strong negative opinions. The study results show that two of the six developers adjusted their practices when presented with competitive game elements.
Extrinsic Influence Factors in Software Reliability: A Study of 200,000 Windows Machines
Christian Bird, Venkatesh-Prasad Ranganath, Thomas Zimmermann, Nachiappan Nagappan, and Andreas Zeller
Microsoft Research, USA; Kansas State University, USA; Microsoft, USA; Saarland University, Germany
Reliability of software depends not only on intrinsic factors such as its code properties, but also on extrinsic factors—that is, the properties of the environment it operates in. In an empirical study of more than 200,000 Windows users, we found that the reliability of individual applications is related to whether and which other applications are installed: While games and file-sharing applications tend to decrease the reliability of other applications, security applications tend to increase it. Furthermore, application reliability is related to the usage profiles of these applications; generally, the more an application is used, the more likely it is to have negative impact on reliability of others. As a consequence, software testers must be careful to investigate and control these factors.
Preprint Available
How to Build a Good Practice Software Project Portfolio?
Hennie Huijgens, Rini van Solingen, and Arie van Deursen
Delft University of Technology, Netherlands; Goverdson, Netherlands; Prowareness, Netherlands
What can we learn from historical data collected in three software companies that on a daily basis had to cope with highly complex project portfolios? In this paper we analyze a large dataset containing 352 finalized software engineering projects, with the goal of discovering which factors affect software project performance and which actions can be taken to increase project performance when building a software project portfolio. The software projects were classified into four quadrants of a Cost/Duration matrix; the analysis focused on factors that were strongly related to two of those quadrants, Good Practices and Bad Practices. The factors were ranked by statistical significance. The paper results in an inventory of ‘what factors should be embraced when building a project portfolio?’ (Success Factors) and ‘what factors should be avoided when doing so?’ (Failure Factors). The major contribution of this paper is its analysis of the characteristics of the best and worst performers in the dataset of software projects, resulting in 7 Success Factors (among others: a steady heartbeat; a fixed, experienced team; agile (Scrum); and release-based delivery) and 9 Failure Factors (among others: once-only projects, dependencies on other systems, technology-driven development, and rules- and regulations-driven development).
Improving Software through Automatic Untangling of Cyclic Dependencies
Maayan Goldstein and Dany Moshkovich
IBM Research, Israel
Cyclic dependencies among software components are considered an architectural problem that increases the development time and prevents proper reuse. One cause for the existence of such dependencies is the improper organization of elements into components. Optimal reorganization of the components that resolves the cyclic dependencies in large and complex software systems is extremely difficult to perform manually and is not computationally feasible to perform automatically. We present an approach for automatic untangling of cyclic dependencies among components for cycles of any size, having direct or transitive dependencies on one another. Our approach aims at minimizing the modifications to the original structure of the system, while taking into account various architectural properties. We evaluate our solution on twelve open source and three industrial applications. We demonstrate its applicability and value through architectural metrics and feedback from system architects.
Nondeterminism in MapReduce Considered Harmful? An Empirical Study on Non-commutative Aggregators in MapReduce Programs
Tian Xiao, Jiaxing Zhang, Hucheng Zhou, Zhenyu Guo, Sean McDirmid, Wei Lin, Wenguang Chen, and Lidong Zhou
Tsinghua University, China; Microsoft Research, China; Microsoft Bing, USA
The simplicity of MapReduce introduces unique subtleties that cause hard-to-detect bugs; in particular, the unfixed order of reduce function input is a source of nondeterminism that is harmful if the reduce function is not commutative and sensitive to input order. Our extensive study of production MapReduce programs reveals interesting findings on commutativity, nondeterminism, and correctness. Although non-commutative reduce functions lead to five bugs in our sample of well-tested production programs, we surprisingly have found that many non-commutative reduce functions are mostly harmless due to, for example, implicit data properties. These findings are instrumental in advancing our understanding of MapReduce program correctness.
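A hypothetical, deliberately simplified reducer (plain Java, not the studied production code or a real MapReduce API) shows the kind of order sensitivity at issue:

    import java.util.List;

    // Hypothetical, simplified reducer: treats the first value as a baseline and
    // subtracts the rest. Because subtraction is not commutative, the result
    // depends on the (unspecified) order in which the framework delivers values,
    // so the job is nondeterministic across runs.
    public class BaselineDeltaReducer {
        static long reduce(List<Long> values) {
            long result = values.get(0);            // assumed "baseline"
            for (int i = 1; i < values.size(); i++) {
                result -= values.get(i);
            }
            return result;
        }

        public static void main(String[] args) {
            System.out.println(reduce(List.of(10L, 2L, 3L)));  // 5
            System.out.println(reduce(List.of(2L, 10L, 3L)));  // -11: same multiset, different order
        }
    }

A commutative aggregation of the same multiset, such as a sum or a maximum, would be unaffected by the framework's reordering.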
Objective Safety Compliance Checks for Source Code
Alois Mayr, Reinhold Plösch, and Matthias Saft
JKU Linz, Austria; Siemens, Germany
Safety standards such as IEC 61508 are an important instrument for developing safety-critical systems. They provide requirements and recommendations to assist engineers in system and software development. Nevertheless, applying this standard in practice is difficult due to unclear requirements and unclear or missing acceptance criteria. In earlier work [20], we systematically developed a quality model, including proper measurement support, that covers the code-related parts of IEC 61508. In this paper, we present an assessment approach for automatic compliance checks of the code-related parts of the standard. A validation study shows that the assessment results obtained by applying this approach to real-world projects are in line with their externally granted certification. The results are valid for the vast majority of the modeled elements of the standard. Moreover, by drilling down into the assessment results, we are able to detect deficiencies in the certified real-world projects.
Ready-Set-Transfer: Exploring the Technology Transfer Readiness of Academic Research Projects (Panel)
Jane Cleland-Huang, Daniela Damian, and Smita Ghaisas
DePaul University, USA; University of Victoria, Canada; Tata Consultancy Services, India
Software engineering research is undertaken to propose innovative solutions, to develop concepts, algorithms, processes, and technologies, to validate effective solutions for important software engineering problems, and ultimately to support the transition of important findings to practice. However prior studies have shown that successful projects often take from 20-25 years to reach the stage of full industry adoption, while many other projects fizzle out and never advance beyond the initial research phase. This panel provides the opportunity for practitioners and academics to engage in a meaningful discussion around the topic of technology transfer. In this fourth offering of the Ready-Set-Transfer panel, three research groups will present products that they believe to be industry-ready to a panel of industrial practitioners. Each team will receive feedback from the panelists. The long-term goal of the panel is to increase technology transfer in the software engineering domain.
Software Engineering for the Web: The State of the Practice
Alex Nederlof, Ali Mesbah, and Arie van Deursen
Delft University of Technology, Netherlands; University of British Columbia, Canada
Today’s web applications increasingly rely on client-side code execution. HTML is not just created on the server, but manipulated extensively within the browser through JavaScript code. In this paper, we seek to understand the software engineering implications of this. We look at deviations from many known best practices in such areas as performance, accessibility, and correct structuring of HTML documents. Furthermore, we assess to what extent such deviations manifest themselves through client-side code manipulation only. To answer these questions, we conducted a large scale experiment, involving automated client-enabled crawling of over 4000 web applications, resulting in over 100,000,000 pages analyzed, and close to 1,000,000 unique client-side user interface states. Our findings show that the majority of sites contain a substantial number of problems, making sites unnecessarily slow, inaccessible for the visually impaired, and with layout that is unpredictable due to errors in the dynamically modified DOM trees.
Software Feature Location in Practice: Debugging Aircraft Simulation Systems
Salman Hoseini, Abdelwahab Hamou-Lhadj, Patrick Desrosiers, and Martin Tapp
Concordia University, Canada; CAE, Canada
In this paper, we report on a study that we conducted at CAE, one of the largest civil aircraft simulation companies in the world, in which we developed a feature location approach to help software engineers debug simulation scenarios. A simulation scenario consists of a set of software components, configured in a certain way. A simulation fails when it does not behave as intended. This is typically a sign of a configuration problem. To detect configuration errors, we propose FELODE (Feature Location for Debugging), an approach that uses a single trace combined with user queries. When applied to CAE systems, FELODE achieves on average a precision of 50% and a recall of up to 100%.
Tracking Requirements Evolution by Using Issue Tickets: A Case Study of a Document Management and Approval System
Shinobu Saito, Yukako Iimura, Kenji Takahashi, Aaron K. Massey, and Annie I. Antón
NTT DATA, Japan; NTT, Japan; NTT, USA; Georgia Tech, USA
Requirements evolve throughout the software life-cycle. When requirements change, requirements engineers must determine what software artifacts could be affected. The history of and rationale for requirements evolution provides engineers some information about artifact dependencies for impact analysis. In this paper, we discuss a case study of requirements evolution for a large-scale system governed by Japanese laws and regulations. We track requirements evolution using issue tickets created in response to stakeholder requests. We provide rules to identify requirements evolution events (e.g. refine, decompose, and replace) from combinations of operations (e.g. add, change, and delete) specified in the issue tickets. We propose a Requirements Evolution Chart (REC) to visually represent requirements evolution as a series of events over time, and implement tool support to generate a REC from a series of issue tickets using our rules to identify requirements evolution events. We found that the REC supports impact analysis and compliance efforts.
Where Do Developers Log? An Empirical Study on Logging Practices in Industry
Qiang Fu, Jieming Zhu, Wenlu Hu, Jian-Guang Lou, Rui Ding, Qingwei Lin, Dongmei Zhang, and Tao Xie
Microsoft Research, China; Chinese University of Hong Kong, China; Carnegie Mellon University, USA; University of Illinois at Urbana-Champaign, USA
System logs are widely used in various tasks of software system management. It is crucial to avoid logging too little or too much. To do so, developers need to make informed decisions on where to log and what to log during development. However, there exists no work on studying such logging practices in industry or helping developers make informed decisions. To fill this significant gap, in this paper we systematically study the logging practices of developers in industry, with a focus on where developers log. We obtain six valuable findings by conducting source code analysis on two large industrial systems (2.5M and 10.4M LOC, respectively) at Microsoft. We further validate these findings via a questionnaire survey with 54 experienced developers at Microsoft. In addition, our study demonstrates a high accuracy of up to 90% F-score in predicting where to log.
A Compiler Project with Learning Progressions
Derek Rayside
University of Waterloo, Canada
We describe the design of an undergraduate compilers course that is explicitly intended to teach software engineering concepts and skills in addition to compiler concepts. This objective is accomplished by structuring the course around two parallel learning progressions rather than around the logical structure of a compiler. The nominal purpose of the project is to develop a simulator and synthesizer for simple circuits written in a subset of VHDL. This subset of VHDL is translated to a simple LL(1) boolean formula language. The circuit simulator reads and writes binary waveforms according to a regular grammar. The students start working with the simple waveform language and work their way up to the subset of VHDL. As the complexity of the input language and transformations increases, new software engineering concepts are introduced to help manage that complexity. At the end of the project the students can simulate and synthesize simple circuits such as a ripple-carry adder or a multiplexer.
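As a rough sketch of what a student-facing LL(1) boolean formula language could look like (the grammar, syntax, and code below are invented for illustration; they are not the course's actual language), a recursive-descent evaluator needs only one token of lookahead:

    // Minimal recursive-descent evaluator for a tiny LL(1) boolean formula grammar:
    //   expr   -> term ('|' term)*
    //   term   -> factor ('&' factor)*
    //   factor -> '!' factor | '(' expr ')' | '0' | '1'
    // Grammar and syntax are illustrative only.
    public class BoolParser {
        private final String src;
        private int pos;

        BoolParser(String src) { this.src = src.replaceAll("\\s+", ""); }

        boolean parse() {
            boolean v = expr();
            if (pos != src.length()) throw new IllegalArgumentException("trailing input");
            return v;
        }

        private boolean expr() {
            boolean v = term();
            while (peek() == '|') { pos++; v |= term(); }
            return v;
        }

        private boolean term() {
            boolean v = factor();
            while (peek() == '&') { pos++; v &= factor(); }
            return v;
        }

        private boolean factor() {
            char c = peek();
            if (c == '!') { pos++; return !factor(); }
            if (c == '(') { pos++; boolean v = expr(); expect(')'); return v; }
            if (c == '0' || c == '1') { pos++; return c == '1'; }
            throw new IllegalArgumentException("unexpected character: " + c);
        }

        private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

        private void expect(char c) {
            if (peek() != c) throw new IllegalArgumentException("expected " + c);
            pos++;
        }

        public static void main(String[] args) {
            System.out.println(new BoolParser("!(0 & 1) | 0").parse()); // true
        }
    }

Each nonterminal maps to one method, and the single-character lookahead in peek() is what keeps the grammar LL(1).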
An Inverted Classroom Experience: Engaging Students in Architectural Thinking for Agile Projects
Jane Cleland-Huang, Muhammad Ali Babar, and Mehdi Mirakhorli
DePaul University, USA; University of Adelaide, Australia
This case study presents our experiences using architecturally savvy personas in the classroom. The personas were used to help students analyze and prioritize architecturally significant requirements, and then to drive and evaluate architectural design. The activity was designed to equip students with a technique for integrating architectural thinking into the agile development process. We describe our learning goals and the activity, discuss student learning outcomes and lessons learned from running the activity, and propose an improved structure. All materials, including training videos, handouts, and instructions, are available online at http://re.cs.depaul.edu/pedagogy/ASP.
Comparing Test Quality Measures for Assessing Student-Written Tests
Stephen H. Edwards and Zalia Shams
Virginia Tech, USA
Many educators now include software testing activities in programming assignments, so there is a growing demand for appropriate methods of assessing the quality of student-written software tests. While tests can be hand-graded, some educators also use objective performance metrics to assess software tests. The most common measures used at present are code coverage measures—tracking how much of the student’s code (in terms of statements, branches, or some combination) is exercised by the corresponding software tests. Code coverage has limitations, however, and sometimes it overestimates the true quality of the tests. Some researchers have suggested that mutation analysis may provide a better indication of test quality, while some educators have experimented with simply running every student’s test suite against every other student’s program—an “all-pairs” strategy that gives a bit more insight into the quality of the tests. However, it is still unknown which one of these measures is more accurate, in terms of most closely predicting the true bug revealing capability of a given test suite. This paper directly compares all three methods of measuring test quality in terms of how well they predict the observed bug revealing capabilities of student-written tests when run against a naturally occurring collection of student-produced defects. Experimental results show that all-pairs testing—running each student’s tests against every other student’s solution—is the most effective predictor of the underlying bug revealing capability of a test suite. Further, no strong correlation was found between bug revealing capability and either code coverage or mutation analysis scores.
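A toy sketch of the all-pairs idea on a made-up "implement abs" assignment (the names, solutions, and tests are invented; a real grader would sandbox and time-limit student code):

    import java.util.List;
    import java.util.Map;
    import java.util.function.IntUnaryOperator;

    // Toy all-pairs scoring: each student's test suite is run against every other
    // student's solution, and a suite's score is the number of peer solutions it
    // reveals as buggy. All students, solutions, and tests are invented here.
    public class AllPairsSketch {

        record Student(String name, IntUnaryOperator solution, Map<Integer, Integer> tests) {}

        public static void main(String[] args) {
            List<Student> students = List.of(
                new Student("ada",   x -> x < 0 ? -x : x, Map.of(-3, 3, 0, 0, 5, 5)),
                new Student("bob",   x -> x,              Map.of(4, 4, 7, 7)),        // buggy solution, weak tests
                new Student("carol", x -> x < 0 ? -x : x, Map.of(-1, 1, 2, 2))
            );

            for (Student tester : students) {
                int revealed = 0;
                for (Student subject : students) {
                    if (subject == tester) continue;   // do not test your own solution
                    boolean fails = tester.tests().entrySet().stream()
                            .anyMatch(t -> subject.solution().applyAsInt(t.getKey()) != t.getValue());
                    if (fails) revealed++;
                }
                System.out.printf("%s's tests reveal bugs in %d of %d peer solutions%n",
                        tester.name(), revealed, students.size() - 1);
            }
        }
    }

A suite's all-pairs score is simply how many peer solutions it distinguishes from correct behaviour, which is what makes it a direct proxy for bug revealing capability.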
Deploying an Online Software Engineering Education Program in a Globally Distributed Organization
John Hudepohl, Alpana Dubey, Sylvie Moisy, Jessica Thompson, and Hans-Martin Niederer
ABB, Switzerland; ABB, India; ABB, France; TimelyText, USA; SynSpace, Switzerland
A well-trained software engineering workforce is a key to success in a highly competitive environment. Changing tools and technologies, along with a rapidly changing development environment, make it incumbent on organizations to invest in training. In this paper, we describe our experience in deploying an online training program in a globally distributed organization. We write about the reasons behind ABB’s Software Development Improvement Program (SDIP), the requirements we established upfront, the people, processes and technologies we used, the promotion of SDIP, and metrics for measuring success. Finally, we share and describe results and lessons learned that could be applied to many organizations with similar issues. The goal of this paper is to provide a set of replicable best practices for initiating a software training program in a multi-national organization. The first SDIP online course was offered in June 2012. Since then, we have had more than 10,000 enrollments from employees in 54 countries. Today, our training library contains 89 e-learning, 17 webinar, video and virtual lab courses, and we have delivered more than 180 hosted webinars. Following each class, we ask students to evaluate the class. Ninety-eight percent are satisfied with the classes.
Functional Programming For All! Scaling a MOOC for Students and Professionals Alike
Heather Miller, Philipp Haller, Lukas Rytz, and Martin Odersky
EPFL, Switzerland; Typesafe, Switzerland
Massive open online courses (MOOCs) have launched a scale shift in higher education, with several individual MOOCs now boasting tens or hundreds of thousands of participants worldwide. Our MOOC on the principles of functional programming has more than 100,000 registered students to date, and boasts one of the highest rates of completion (19.2%) for its size. In this paper, we describe our experience organizing this popular MOOC, and demonstrate how providing innovative supporting tools (IDE plugins, testing frameworks, interactive build tools, automated cloud-based graders, style checkers) and considering key human-computer interaction factors potentially contributed to this markedly high completion rate. We collect an unprecedented volume of course statistics and survey results and have made them available, along with scripts for generating interactive web-based visualizations, as an open-source project.
Introduction of Continuous Delivery in Multi-Customer Project Courses
Stephan Krusche and Lukas Alperowitz
TU München, Germany
Continuous delivery is a set of practices and principles to release software faster and more frequently. While it helps to bridge the gap between developers and operations for software in production, it can also improve the communication between developers and customers in the development phase, i.e. before software is in production. It shortens the feedback cycle and developers ideally use it right from the beginning of a software development project. In this paper we describe the implementation of a customized continuous delivery workflow and its benefits in a multi-customer project course in summer 2013. Our workflow focuses on the ability to deliver software with only a few clicks to the customer in order to obtain feedback as early as possible. This helps developers to validate their understanding about requirements, which is especially helpful in agile projects where requirements might change often. We describe how we integrated this workflow and the role of the release manager into our project-based organization and how we introduced it using different teaching methods. Within three months 90 students worked in 10 different projects with real customers from industry and delivered 490 releases. After the project course we evaluated our approach in an online questionnaire and in personal interviews. Our findings and observations show that participating students understood and applied the concepts and are convinced about the benefits of continuous delivery.
Investigating the Skill Gap between Graduating Students and Industry Expectations
Alex Radermacher, Gursimran Walia, and Dean Knudson
North Dakota State University, USA
Graduating computer science and software engineering students do not always possess the necessary skills, abilities, or knowledge when beginning their careers in the software industry. The lack of these skills and abilities can limit the productivity of newly hired, recent graduates, or even prevent them from gaining employment. This paper presents the results of an empirical study where twenty-three managers and hiring personnel from various software companies in the United States and Europe were interviewed. Participants were asked about areas where recent graduates frequently struggled when beginning employment at their companies and which skill deficiencies might prevent a recent graduate from being hired. The results of this study indicate that recent graduates struggle with using configuration management systems (and other software tools), effectively communicating with co-workers and customers, producing unit tests for their code, and other skills or abilities. The results also indicate that a lack of project experience and problem-solving abilities are the most commonly cited issues preventing students from gaining employment. This research is intended to assist educators in identifying areas where students may not measure up to the expectations of industry companies and in improving the curriculum at their universities to better prepare them for their future careers.
Knowledge Transfer in Collaborative Teams: Experiences from a Two-Week Code Camp
Terhi Kilamo, Antti Nieminen, Janne Lautamäki, Timo Aho, Johannes Koskinen, Jarmo Palviainen, and Tommi Mikkonen
Tampere University of Technology, Finland
Software engineering has both technological and social dimensions. As development teams spanning across the globe are increasingly the norm and while the web enables massive online collaboration, there is a growing need for effective collaboration tools. In this paper, we describe experiences on collaborative programming as a tool for learning software development. To investigate the nature of collaboration in software engineering education, we arranged a two-week-long course experiment where students used a collaborative online integrated development environment to create different kinds of web services. We present lessons learned from the experiment and discuss how collaboration can act as a tool for knowledge transfer among learners.
Lessons Learned Managing Distributed Software Engineering Courses
Reid Holmes, Michelle Craig, Karen Reid, and Eleni Stroulia
University of Waterloo, Canada; University of Toronto, Canada; University of Alberta, Canada
We have run the Undergraduate Capstone Open Source Projects (UCOSP) program for ten terms over the past six years, providing over 400 Canadian students from more than 30 schools the opportunity to be members of distributed software teams. UCOSP aims to give students real development experience, enabling them to integrate lessons learned in the classroom with practical work while developing their technical communication skills. The UCOSP program has evolved over time as we have learned how to effectively manage a diverse set of students working on a large number of different projects. The goal of this paper is to provide an overview of the roles of the various stakeholders in distributed software engineering projects and the various lessons we have learned to make UCOSP an effective and positive learning experience.
Preprint Available
Process Mining Software Repositories from Student Projects in an Undergraduate Software Engineering Course
Megha Mittal and Ashish Sureka
IIIT Delhi, India
An undergraduate-level Software Engineering course generally consists of a team-based, semester-long project and emphasizes both technical and managerial skills. Software Engineering is a practice-oriented and applied discipline, and hence there is an emphasis on hands-on development, process, and the usage of tools in addition to theory and basic concepts. We present an approach for mining process data (process mining) from software repositories that archive the data generated as student teams construct software in an educational setting. We present an application of mining three software repositories: a team wiki (used during requirements engineering), a version control system (development and maintenance), and an issue tracking system (corrective and adaptive maintenance) in the context of an undergraduate Software Engineering course. We propose visualizations, metrics, and algorithms to provide insight into the practices and procedures followed during the various phases of the software development life-cycle. The proposed visualizations and metrics (learning analytics) provide a multi-faceted view to the instructor, serving as a feedback tool on students' development process and quality. We mine the event logs produced by the software repositories and derive insights such as the degree of individual contribution in a team, the quality of commit messages, the intensity and consistency of commit activities, bug-fixing process trend and quality, component and developer entropy, and process compliance and verification. We present our empirical analysis on a software repository dataset consisting of 19 teams of 5 members each and discuss challenges, limitations, and recommendations.
Quantitative Assessment with Using Ticket Driven Development for Teaching Scrum Framework
Hiroshi Igaki, Naoki Fukuyasu, Sachio Saiki, Shinsuke Matsumoto, and Shinji Kusumoto
Osaka University, Japan; Wakayama University, Japan; Kobe University, Japan
In industry, it is natural for teams to develop software using an agile methodology such as Scrum. On the other hand, graduate students of information science who have experience with agile team software development are rare. Initial education on Scrum faces several challenges. The first is the concept of self-organization: in a Scrum project, team members determine for themselves how best to accomplish their tasks, but it is difficult for students with little experience in team software development to cultivate a self-organizing team on their own. The second is inequality in task assignment: in a Scrum project, each member pulls tasks to perform rather than waiting for a project manager to assign them, which can lead to unequal task assignment and, as a result, unequal learning opportunities to acquire skills and knowledge about the process and the product. In this paper, we propose quantitative assessment methods for Scrum projects using TiDD (Ticket-Driven Development) for initial education on the Scrum framework and web application development. We report on our basic PBL (Project-Based Learning) course, which involved 49 students. The use of quantitative criteria enabled students and teachers to assess the Scrum projects from the viewpoints of quality, assignment, and delivery.
Quasi-Crowdsourcing Testing for Educational Projects
Zhenyu Chen and Bin Luo
Nanjing University, China
The idea of crowdsourcing tasks in software engineering, especially software testing, has gained popularity in recent years. Crowdsourced testing and educational projects are naturally complementary. One of the challenges of crowdsourced testing is to find enough qualified workers at low cost, and software engineering students are suitable candidates. On the other hand, practical projects play a key role in software engineering education: to enhance educational project outcomes and achieve industrial-strength training, we need to give students the opportunity to be exposed to commercial software development. In this paper, we report a preliminary study on crowdsourced testing for educational projects. We introduce three commercial software products as educational testing projects, which are crowdsourced through our teaching support system. We call this "Quasi-Crowdsourcing Test" (QCT) because the candidate workers are students, who have existing social relations. The results are encouraging and suggest that QCT projects benefit both students and industry.
Preprint Available
Scenario-Based Programming: Reducing the Cognitive Load, Fostering Abstract Thinking
Giora Alexandron, Michal Armoni, Michal Gordon, and David Harel
Weizmann Institute of Science, Israel
We examine how students work in scenario-based and object-oriented programming (OOP) languages, and qualitatively analyze the use of abstraction through the prism of the differences between the paradigms. The findings indicate that when working in a scenario-based language, programmers think on a higher level of abstraction than when working with OOP languages. This is explained by other findings, which suggest how the declarative, incremental nature of scenario-based programming facilitates separation of concerns, and how it supports a kind of programming that allows programmers to work with a less detailed mental model of the system they develop. The findings shed light on how declarative approaches can reduce the cognitive load involved in programming, and how scenario-based programming might solve some of the difficulties involved in the use of declarative languages. This is applicable to the design of learning materials, and to the design of programming languages and tools.
State-Based Monitoring and Goal-Driven Project Steering: Field Study of the SEMAT Essence Framework
Cécile Péraire and Todd Sedano
Carnegie Mellon University, USA
At Carnegie Mellon University in Silicon Valley, the graduate master program ends with a practicum project during which students serve as software engineering consultants for an industry client. In this context, students are challenged to demonstrate their ability to work on self-managing and self-organizing teams. This paper presents a field study of the Software Engineering Method and Theory (SEMAT) Essence framework. The objective is to evaluate the effectiveness of the Essence’s novel state-based monitoring and goal-driven steering approach provided by the Essence kernel alphas and their states. The researchers conducted the study on seven graduate master student teams applying the approach throughout their practicum projects. The research methodology involves weekly observation and recording of each team’s state progression and collecting students’ reflection on the application of the approach. The main result validates that the approach provides student teams with a holistic, lightweight, non-prescriptive and method-agnostic way to monitor progress and steer projects, as well as an effective structure for team reflection and risk management. The paper also validates that the Essence kernel provides an effective mechanism for monitoring and steering work common to most student software projects. This includes the work done during project initiation as well as the work done at the project or release level. Support for technical work should come from additional practices added on top of the kernel, or by extending or altering the kernel definition. The conclusion is that the approach enables students to learn to steer projects effectively by addressing the various dimensions of software engineering. Hence the approach could be leveraged in software engineering education.
Teaching Reuse-Driven Software Engineering through Innovative Role Playing
Gerald Kotonya and Jaejoon Lee
Lancaster University, UK
Reuse-Driven Software Engineering (RDSE) is a development paradigm that promises to shorten development cycles and cut the costs associated with custom development by assembling systems from pre-existing software components and services. However, like most approaches that hold the promise of improving software engineering, the success of RDSE depends on skilled staff. This means that software engineering education remains the most effective vehicle available for transferring reuse-driven technology to the community. However, teaching RDSE poses many challenges to software engineering educators, including how to make the benefits of RDSE visible to students and how to strike an acceptable balance between engineering principles and the software practice embodied in RDSE. This paper describes a novel approach to teaching RDSE at Lancaster University, UK.
Teaching Students Scrum using LEGO Blocks
Maria Paasivaara, Ville Heikkilä, Casper Lassenius, and Towo Toivola
Aalto University, Finland; F-Secure, Finland
In this paper, we present a LEGO-based Scrum simulation game that we used twice with Master’s level students at Aalto University. The game was initially developed as an internal training tool in F-Secure Corporation, a Finnish security software company, to support their agile adoption. In the game, student teams learn the Scrum roles, events and concepts in practice by simulating several development Sprints, while incrementally planning and building a product of LEGO blocks. Student satisfaction was measured by a survey at the end of the course, and student learning evaluated by learning diaries. Our results show that the students were highly satisfied with the game, and that students with various degrees of experience with Scrum all learned a lot. In particular, students reported gaining insights about requirements management and customer collaboration, effective teamwork, and the Scrum roles.
Teaching Students to Understand Large Programs by Understanding Historical Context
Collin McMillan and Richard Oosterhoff
University of Notre Dame, USA
Program comprehension is one of the most important challenges that new software developers face. Educators have sought to prepare students for this challenge through hands-on software development projects. These projects teach students effective software engineering principles. But, students often struggle to see the value of these principles in class projects, and therefore struggle to recognize them outside the classroom. The inevitable result is that these students have difficulty comprehending large programs after graduation. In this paper, we argue that a remedy to this problem is to teach the history of how software development principles were created. In this collaborative work with the Notre Dame Department of History, we present a course that blends a discussion of this history with a hands-on software project. We present a summary of the history covered in our course, and reflect on our teaching experience.
Towards a Supercollaborative Software Engineering MOOC
William Billingsley and Jim R. H. Steel
NICTA, Australia; University of Queensland, Australia
Recently there has been rapid growth in the number of online courses and venues through which students can learn introductory computer programming. As software engineering education becomes more prevalent online, online education will need to address how to give students the skills and experience at programming collaboratively on realistic projects. In this paper, we analyse factors affecting how a supercollaborative on-campus software studio course could be adapted as a project-led supercollaborative MOOC.
Preprint Available
Additional Information
Using MOOCs to Reinvigorate Software Engineering Education (Keynote)
Armando Fox
University of California at Berkeley, USA
The spectacular failure of the Affordable Care Act website ("Obamacare") has focused public attention on software engineering. Yet experienced practitioners mostly sighed and shrugged, because the historical record shows that only 10% of large (>10M) software projects using conventional methodologies such as Waterfall are successful. In contrast, Amazon and others successfully build comparably large and complex sites with hundreds of integrated subsystems by using modern agile methods and service-oriented architecture. This contrast is one reason industry has complained that academia ignores vital software topics, leaving students unprepared upon graduation. In too many courses, well-meaning instructors teach traditional approaches to software development that are neither supported by tools that students can readily use, nor appropriate for projects whose scope matches a college course. Students respond by continuing to build software more or less the way they always have, which is boring for students, frustrating for instructors, and disappointing for industry. This talk explains how the confluence of cloud computing, software as a service (SaaS), and Massive Open Online Courses (MOOCs) has not only revolutionized the future of software, but changed it in a way that makes it easier and more rewarding to teach. UC Berkeley's revised Software Engineering course allows students both to enhance a legacy application and to develop a new app that matches the requirements of non-technical customers, using the same tools and techniques that professionals use. By experiencing the whole software lifecycle repeatedly within a single college course, students learn to use and appreciate the skills that industry has long encouraged. The course is now popular with students, rewarding for faculty, and praised by industry.
A Framework to Advise Tests using Tests
Yurong Wang, Suzette Person, Sebastian Elbaum, and Matthew B. Dwyer
University of Nebraska-Lincoln, USA; NASA Langley Research Center, USA
Tests generated by different approaches can form a rich body of information about the system under test (SUT), which can then be used to amplify the power of test suites. Diversity in test representations, however, creates an obstacle to extracting and using this information. In this work, we introduce a test advice framework which enables extraction and application of information contained in existing tests to help improve other tests or test generation techniques. Our framework aims to 1) define a simple, yet expressive test case language so that different types of tests can be represented using a unified language, and 2) define an advice extraction function that enables the elicitation and application of the information encoded in a set of test cases. Preliminary results show how test advice can be used to generate amplified test suites with higher code coverage and improved mutant-kill scores compared to the original test suite.
A Novel Quantitative Evaluation Approach for Software Project Schedules using Statistical Model Checking
Dehui Du, Mingsong Chen, Xiao Liu, and Yun Yang
East China Normal University, China; Swinburne University of Technology, Australia
Project schedules are essential for successfully carrying out software projects. To support managers' decision making, many project scheduling algorithms have been developed in recent years for generating candidate project schedules. However, these schedules often cannot be used directly because of the uncertainty and complexity of real-world software development environments, which are overlooked or simplified by the scheduling algorithms. Therefore, significant human effort is still required to evaluate and compare candidate schedules. To address this problem, we propose a quantitative analysis approach based on statistical model checking that serves as a novel evaluation method for project schedules. Using UPPAAL-SMC, we can systematically evaluate the performance of a project schedule and answer complex questions that are vital for managers' decision making but cannot be efficiently addressed by existing tools. Preliminary results show that our approach can efficiently filter out unsatisfactory candidates by answering simple “yes or no” questions first, and then help effectively compare the rest by answering complicated user-specified questions. As a result, the human effort in planning project schedules can be significantly reduced.
A Runtime Cloud Efficiency Software Quality Metric
Mark Shtern, Michael Smit, Bradley Simmons, and Marin Litoiu
York University, Canada; Dalhousie University, Canada
This paper introduces the Cloud Efficiency (CE) metric, a novel runtime metric which assesses how effectively an application uses software-defined infrastructure. The CE metric is computed as the ratio of two functions: i) a benefit function which captures the current set of benefits derived from the application, and ii) a cost function which describes the current charges incurred by the application's resources. We motivate the need for the CE metric, describe in further detail how to compute it, and present experimental results demonstrating its calculation.
Preprint Available
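A minimal illustrative rendering of the ratio described above, assuming hypothetical benefit() and cost() callbacks supplied by the application operator; the paper's actual benefit and cost functions may differ.

# Illustrative only: Cloud Efficiency as a ratio of benefit to cost at time t.
def cloud_efficiency(benefit, cost, t):
    """CE(t) = benefit derived from the application at time t
               / charges incurred by the application's resources at time t."""
    b, c = benefit(t), cost(t)
    return b / c if c > 0 else float("inf")

# Example: 1200 successful requests served at time t, at 0.75 currency units of rented resources.
print(cloud_efficiency(lambda t: 1200.0, lambda t: 0.75, t=0))  # 1600.0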
A World Full of Surprises: Bayesian Theory of Surprise to Quantify Degrees of Uncertainty
Nelly Bencomo and Amel Belaggoun
Aston University, UK; CEA, France
In the specific area of Software Engineering (SE) for self-adaptive systems (SASs) there is a growing research awareness about the synergy between SE and Artificial Intelligence (AI). However, just a few significant results have been published so far. In this paper, we propose a novel and formal Bayesian definition of surprise as the basis for quantitative analysis to measure degrees of uncertainty and deviations of self-adaptive systems from normal behavior. A surprise measures how observed data affects the models or assumptions of the world during runtime. The key idea is that a "surprising" event can be defined as one that causes a large divergence between the belief distributions prior to and posterior to the event occurring. In such a case the system may decide either to adapt accordingly or to flag that an abnormal situation is happening. In this paper, we discuss possible applications of the Bayesian theory of surprise for the case of self-adaptive systems using Bayesian dynamic decision networks.
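As an illustration of the idea (not necessarily the paper's exact formulation), surprise can be computed as the Kullback-Leibler divergence between the posterior and prior belief distributions over a discrete hypothesis space; the distributions below are invented.

# Illustrative only: "surprise" as divergence between beliefs before and after an observation.
import math

def kl_divergence(posterior, prior):
    return sum(p * math.log(p / q)
               for p, q in zip(posterior, prior) if p > 0)

prior     = [0.70, 0.20, 0.10]   # belief over system modes before the event
posterior = [0.10, 0.30, 0.60]   # belief after observing runtime data
surprise = kl_divergence(posterior, prior)
print(surprise)  # a large value indicates a surprising event, a candidate trigger for adaptation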
API as a Social Glue
Rohan Padhye, Debdoot Mukherjee, and Vibha Singhal Sinha
IBM Research, India
The rapid growth of social platforms such as Facebook, Twitter and LinkedIn underscores the need for people to connect to existing and new contacts for recreational and professional purposes. A parallel of this phenomenon exists in the software development arena as well. Open-source code sharing platforms such as GitHub provide the ability to follow people and projects of interest. However, users are required to manually identify projects or other users whom they might be interested in following. We observe that most software projects use third-party libraries and that developers who contribute to multiple projects often use the same library APIs across projects. Thus, the library APIs seem to be a good fingerprint of their skill set. Hence, we argue that library APIs can form the social glue to connect people and projects having similar interests. We propose APINet, a system that mines API usage profiles from source code version management systems and creates a social network of people, projects and libraries. We describe our initial implementation that uses data from 568 open-source projects hosted on GitHub. Our system recommends to a user new projects and people that they may be interested in, suggests communities of people who use related libraries and finds experts for a given topic who are closest in a user's social graph.
An Automated Approach to Detect Violations with High Confidence in Incremental Code using a Learning System
Radhika D. Venkatasubramanyam and Shrinath Gupta
Siemens, India
Static analysis (SA) tools are often used to analyze a software system to identify violations of good programming practices (such as not validating arguments to public methods or using magic numbers) and potential defects (such as misused APIs, race conditions, and deadlocks). Most widely used SA tools perform shallow data-flow analysis, and their results contain a considerable number of False Positives (FPs) and False Negatives (FNs). Moreover, it is difficult to run these tools only on newly added or modified pieces of code; to determine which violations are new, we need to perform a tedious process of post-processing the SA tool results. The proposed system takes these issues into consideration and provides a lightweight approach to detecting coding violations statically and proactively, with a high degree of confidence, using a learning system. It also views the violations from a quality perspective using a predefined mapping of violations to quality attributes. We successfully implemented a prototype of the system and studied its use across some of the projects in Siemens, Corporate Technology, Development Center, Asia Australia (CT DC AA). Experimental results showed a significant reduction in the time required for result analysis as well as in the FPs and FNs reported.
Automatic Search Term Identification for Change Tasks
Katja Kevic and Thomas Fritz
University of Zurich, Switzerland
At the beginning of a change task, software developers search the source code to locate the places relevant to the task. As previous research and a small exploratory study that we conducted show, developers perform poorly in identifying good search terms and therefore waste a lot of time querying and exploring irrelevant code. To support developers in this step, we present an approach to automatically identify good search terms. Based on existing work and an analysis of change tasks, we derived heuristics, determined their relevancy and used the results to develop our approach. For a preliminary evaluation, we conducted a study with ten developers working on open source change tasks. Our approach was able to identify good search terms for all tasks and outperformed the searches of the participants, illustrating the potential of our approach. In addition, since the used heuristics are solely based on textual features of change tasks, our approach is easy and generally applicable and can leverage much of the existing work on feature location.
Brainware: Synergizing Software Systems and Neural Inputs
Shihong Huang and Emmanuelle Tognoli
Florida Atlantic University, USA
The rapid advances in the field of Brain Computer Interfaces (BCI) are expected to enrich the quality of people’s lives. BCI connects computer actions with neural inputs—signals indicating the user’s intentions, desired actions, attention, thoughts, memories, and emotions. BCI applications present significant challenges for computer science and software engineering research: an avalanche of neural signals will make their way as direct input into software systems. Given the differences between neural inputs and behavioral ones, the integration of neural inputs will require special approaches, and not simply adding yet more user interface channels to pre-existing software systems. This paper explores the challenges of designing and implementing self-adaptive software systems that could synergize brain states. After framing the problem, its rationale and possible solutions, in this paper we argue that the software engineering community ought to investigate how to incorporate neural inputs into software systems. The days are now upon us when software systems can “feel” and “anticipate” the users’ intentions and therefore react self-adaptively and synergistically to their needs.
Preprint Available
Bugarium: 3D Interaction for Supporting Large-Scale Bug Repositories Analysis
Papon Yongpisanpop, Hideaki Hata, and Kenichi Matsumoto
NAIST, Japan
Big data poses problems not just of analysis and visualization but also of interaction. In software analysis and maintenance, bug tracking systems receive feedback from software project users every day, which means the data grows every day. A large-scale bug tracking system containing a large amount of information does not give end users an easy way to analyze bug data because it lacks a good interaction system. We present Bugarium, which integrates a 3D motion controller and data-driven documents to ease both interaction and visualization for a large-scale bug repository. Bugarium advances the use of a 3D motion controller for operating on big data in software visualization. A user study shows that users were satisfied when using Bugarium to interact with and visualize a large-scale bug tracking system.
Preprint Available
Additional Information
Characterizing Defect Trends in Software Support
Tung Thanh Nguyen, Evelyn Duesterwald, Tim Klinger, P. Santhanam, and Tien N. Nguyen
Utah State University, USA; IBM Research, USA; Iowa State University, USA
We present an empirical analysis of defect arrival data in the operational phase of multiple software products. We find that the shape of the defect curves is sufficiently determined by three external and readily available release cycle attributes: the product type, the license model, and the cycle time between releases. This finding provides new insights into the driving forces affecting the specifics of defect curves and opens up new opportunities for software support organizations to reduce the cost of maintaining defect arrival models for individual products. In addition, it allows the possibility of predicting the defect arrival rate of one product from another with similar known attributes.
Cloudlet-Based Cyber-Foraging for Mobile Systems in Resource-Constrained Edge Environments
Grace A. Lewis, Sebastian Echeverría, Soumya Simanta, Ben Bradshaw, and James Root
SEI, USA
First responders and others operating in crisis environments increasingly make use of handheld devices to help with tasks such as face recognition, language translation, decision-making and mission planning. These resource-constrained edge environments are characterized by dynamic context, limited computing resources, high levels of stress, and intermittent network connectivity. Cyber-foraging is the leverage of external resource-rich surrogates to augment the capabilities of resource-limited devices. In cloudlet-based cyber-foraging, resource-intensive computation is offloaded to cloudlets – discoverable, generic servers located in single-hop proximity of mobile devices. This paper presents several strategies for cloudlet-based cyber-foraging and encourages research in this area to consider a tradeoff space beyond energy, performance and fidelity of results.
Collaborative Infrastructure for Test-Driven Scientific Model Validation
Cyrus Omar, Jonathan Aldrich, and Richard C. Gerkin
Carnegie Mellon University, USA; Arizona State University, USA
One of the pillars of the modern scientific method is model validation: comparing a scientific model's predictions against empirical observations. Today, a scientist demonstrates the validity of a model by making an argument in a paper and submitting it for peer review, a process comparable to code review in software engineering. While human review helps to ensure that contributions meet high-level goals, software engineers typically supplement it with unit testing to get a more complete picture of the status of a project. We argue that a similar test-driven methodology would be valuable to scientific communities as they seek to validate increasingly complex models against growing repositories of empirical data. Scientific communities differ from software communities in several key ways, however. In this paper, we introduce SciUnit, a framework for test-driven scientific model validation, and outline how, supported by new and existing collaborative infrastructure, it could integrate into the modern scientific process.
Compiler Error Notifications Revisited: An Interaction-First Approach for Helping Developers More Effectively Comprehend and Resolve Error Notifications
Titus Barik, Jim Witschey, Brittany Johnson, and Emerson Murphy-Hill
North Carolina State University, USA
Error notifications and their resolutions, as presented by modern IDEs, are still cryptic and confusing to developers. We propose an interaction-first approach to help developers more effectively comprehend and resolve compiler error notifications through a conceptual interaction framework. We propose novel taxonomies that can serve as controlled vocabularies for compiler notifications and their resolutions. We use preliminary taxonomies to demonstrate, through a prototype IDE, how the taxonomies make notifications and their resolutions more consistent and unified.
Preprint Available
Development Context Driven Change Awareness and Analysis Framework
Anita Sarma, Josh Branchaud, Matthew B. Dwyer, Suzette Person, and Neha Rungta
University of Nebraska-Lincoln, USA; NASA Langley Research Center, USA; NASA Ames Research Center, USA
Recent work on workspace monitoring allows conflict prediction early in the development process; however, these approaches mostly use syntactic differencing techniques to compare different program versions. In contrast, traditional change-impact analysis techniques analyze related versions of the program only after the code has been checked into the master repository. We propose a novel approach, DeCAF (Development Context Analysis Framework), that leverages the development context to scope a change impact analysis technique. The goal is to characterize the impact of each developer on other developers in the team. Various client applications, such as task prioritization, early conflict detection, and providing advice on testing, can benefit from such a characterization. The DeCAF framework leverages information from the development context to bound the iDiSE change impact analysis technique to analyze only the parts of the code base that are of interest. Bounding the analysis enables DeCAF to efficiently compute the impact of changes using a combination of program dependence and symbolic execution based approaches.
Do the Fix Ingredients Already Exist? An Empirical Inquiry into the Redundancy Assumptions of Program Repair Approaches
Matias Martinez, Westley Weimer, and Martin Monperrus
University of Lille, France; INRIA, France; University of Virginia, USA
Much initial research on automatic program repair has focused on experimental results to probe the potential of such approaches to find patches and reduce development effort. Relatively less effort has been put into understanding the hows and whys of such approaches. For example, a critical assumption of the GenProg technique is that certain bugs can be fixed by copying and re-arranging existing code. In other words, GenProg assumes that the fix ingredients already exist elsewhere in the code. In this paper, we formalize these assumptions around the concept of "temporal redundancy". A temporally redundant commit is only composed of what has already existed in previous commits. Our experiments show that a large proportion of commits that add existing code are temporally redundant. This validates the fundamental redundancy assumption of GenProg.
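A minimal sketch of the line-level intuition behind temporal redundancy, using hypothetical data; the paper's analysis granularity and tooling may differ.

# Illustrative only: a commit is temporally redundant if everything it adds
# already existed somewhere in previous commits.
def is_temporally_redundant(added_lines, previous_commits_added_lines):
    seen_before = set()
    for commit in previous_commits_added_lines:   # each entry: list of lines added by an earlier commit
        seen_before.update(line.strip() for line in commit)
    return all(line.strip() in seen_before for line in added_lines)

history = [["return x;", "if (x > 0) {"], ["log.warn(msg);"]]
print(is_temporally_redundant(["if (x > 0) {", "return x;"], history))  # True
print(is_temporally_redundant(["return y * 2;"], history))              # False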
Flexible Product Line Engineering with a Virtual Platform
Michał Antkiewicz, Wenbin Ji, Thorsten Berger, Krzysztof Czarnecki, Thomas Schmorleiz, Ralf Lämmel, Ștefan Stănciulescu, Andrzej Wąsowski, and Ina Schaefer
University of Waterloo, Canada; University of Koblenz-Landau, Germany; IT University of Copenhagen, Denmark; TU Braunschweig, Germany
Cloning is widely used for creating new product variants. While it has low adoption costs, it often leads to maintenance problems. Long term reliance on cloning is discouraged in favor of systematic reuse offered by product line engineering (PLE) with a central platform integrating all reusable assets. Unfortunately, adopting an integrated platform requires a risky and costly migration. However, industrial experience shows that some benefits of an integrated platform can be achieved by properly managing a set of cloned variants. In this paper, we propose an incremental and minimally invasive PLE adoption strategy called virtual platform. Virtual platform covers a spectrum of strategies between ad-hoc clone and own and PLE with a fully-integrated platform divided into six governance levels. Transitioning to a governance level requires some effort and it provides some incremental benefits. We discuss tradeoffs among the levels and illustrate the strategy on an example implementation.
Preprint Available
Additional Information
Integrating Software Project Resources Using Source Code Identifiers
Laura Inozemtseva, Siddharth Subramanian, and Reid Holmes
University of Waterloo, Canada
Source code identifiers such as classes, methods, and fields appear in many different contexts. For instance, a developer performing a task using the android.app.Activity class could consult various project resources including the class's source file, API documentation, issue tracker, mailing list discussions, code reviews, or questions on Stack Overflow. These information sources are logically connected by the source code elements they describe, but are generally decoupled from each other. This has historically been tolerated by developers, since there was no obvious way to easily navigate between the data sources. However, it is now common for these sources to have web-based front ends that provide a standard mechanism (the browser) for viewing and interacting with the data they contain. Augmenting these front ends with hyperlinks and search would make development easier by allowing developers to quickly navigate between disparate sources of information about the same code element. In this paper, we propose a method of automatically linking disparate information repositories with an emphasis on high precision. We also propose a method of augmenting web-based front ends with these links to make it easier for developers to quickly gain a comprehensive view of the source code elements they are investigating. Research challenges include identifying source code tokens in the midst of natural language text and incomplete code fragments, dynamically augmenting the web views of the data repositories, and supporting novel composition of the link data to provide comprehensive views for specific source code elements.
Preprint Available
Additional Information
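As a rough illustration of the token-identification challenge mentioned above, a first-cut heuristic might spot fully qualified identifiers inside natural-language text with a regular expression; this sketch is hypothetical and far simpler than what the proposed linking approach would need for incomplete code fragments.

# Illustrative only: detect qualified names such as android.app.Activity in prose.
import re

QUALIFIED_NAME = re.compile(r"\b(?:[a-z_]\w*\.)+[A-Z]\w*(?:\.\w+)*\b")

text = ("A developer performing a task using the android.app.Activity class "
        "could consult the issue tracker or Stack Overflow.")
print(QUALIFIED_NAME.findall(text))  # ['android.app.Activity']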
Lab-Based Action Design Research
Paul Ralph
Lancaster University, UK
This paper proposes a research methodology, Lab-based Action Design Research, which combines organizational intervention (action research), building innovative artifacts (engineering research) and studies of software development practice (behavioral research) within a laboratory environment. Seven principles for successful Lab-based Action Design Research are proposed – attract funding with a win-win scenario; select inspiring projects; conduct simultaneous studies; mix methods; use longitudinal, quasi-experimental designs; use enterprise-level technical infrastructure; use established project management infrastructure. Initial evaluation indicates that the proposed approach is practical and may produce improvements in internal validity and theoretical generalizability.
Preprint Available
Leveraging P2P Networks to Address the Test Scenario Explosion Problem
Mark Micallef, Conrad Attard, Andrea Mangion, and Sebastian Attard
University of Malta, Malta
The behaviour of software is influenced by whatever environment it happens to be deployed in. Achieving a sufficient level of coverage for all deployment scenarios during lab testing is difficult for even the most resource-rich organisation. We refer to this as the Test Scenario Explosion Problem and propose the construction of a peer-to-peer network which facilitates the quick creation of large-scale virtual test labs that are representative of a company's customer base. Following an outline of our initial ideas in this regard, a number of open research challenges are discussed.
Metamorphic Fault Tolerance: An Automated and Systematic Methodology for Fault Tolerance in the Absence of Test Oracle
Huai Liu, Iman I. Yusuf, Heinz W. Schmidt, and Tsong Yueh Chen
RMIT University, Australia; Swinburne University of Technology, Australia
A system may fail due to an internal bug or a fault in its execution environment. Incorporating fault tolerance strategies enables such a system to complete its function despite the failure of some of its parts. Before some fault tolerance strategies can be executed, failure detection is needed. Detecting incorrect output, for instance, assumes the existence of an oracle to check the correctness of program outputs for a given input. However, in many practical situations, an oracle does not exist or is extremely difficult to apply. This oracle problem is a major challenge in the context of software testing. In this paper, we propose to apply metamorphic testing, a software testing method that alleviates the oracle problem, to fault tolerance. The proposed technique supports failure detection without the need for oracles.
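For illustration, a metamorphic relation allows failure detection without a reference output: sin(x) and sin(pi - x) must agree, so a violation flags a failure that a fault tolerance strategy could then react to. This sketch shows generic metamorphic testing, not the paper's specific technique.

# Illustrative only: oracle-free failure detection via a metamorphic relation.
import math

def violates_metamorphic_relation(f, x, tol=1e-9):
    return abs(f(x) - f(math.pi - x)) > tol

print(violates_metamorphic_relation(math.sin, 1.2345))           # False: relation holds
print(violates_metamorphic_relation(lambda x: x * 0.99, 1.2345)) # True: "buggy" implementation detected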
Mining Precise Performance-Aware Behavioral Models from Existing Instrumentation
Tony Ohmann, Kevin Thai, Ivan Beschastnikh, and Yuriy Brun
University of Massachusetts, USA; Facebook, USA; University of British Columbia, Canada
Software bugs often arise from differences between what developers envision their system does and what that system actually does. When faced with such conceptual inconsistencies, debugging can be very difficult. Inferring and presenting developers with accurate behavioral models of the system implementation can help developers reconcile their view of the system with reality and improve system quality. We present Perfume, a model-inference algorithm that improves on the state of the art by using performance information to differentiate otherwise similar-appearing executions and to remove false positives from the inferred models. Perfume uses a system's runtime execution logs to infer a concise, precise, and predictive finite state machine model that describes both observed executions and executions that have not been observed but that the system can likely generate. Perfume guides the model inference process by mining temporal performance-constrained properties from the logs, ensuring precision of the model's predictions. We describe the model inference process and demonstrate how it improves precision over the state of the art.
Preprint Available
Additional Information
Modeling Self-Adaptive Software Systems with Learning Petri Nets
Zuohua Ding, Yuan Zhou, and MengChu Zhou
Zhejiang Sci-Tech University, China; New Jersey Institute of Technology, USA
Traditional models are limited in modeling adaptive software systems since they are built only for fixed requirements and cannot capture behaviors that change at run-time in response to environmental changes. In this paper, an adaptive Petri net is proposed to model a self-adaptive software system. It extends hybrid Petri nets by embedding a neural network algorithm at certain special transitions. The proposed net has the following advantages: 1) it can model a runtime environment; 2) the components in the model can collaborate to make adaptation decisions; and 3) computation is done locally, while adaptation applies to the whole system. We illustrate the proposed adaptive Petri net by modeling a manufacturing system.
New Opportunities for Extracting Insights from Cloud Based IDEs
Yi Wang, Patrick Wagstrom, Evelyn Duesterwald, and David Redmiles
University of California at Irvine, USA; IBM Research, USA
Traditional integrated development environments (IDEs) provide developers with robust environments for writing, testing, debugging, and deploying code. As the world becomes increasingly networked and more services are delivered via the cloud, it is only natural that the functionality of IDEs be delivered via the cloud. In addition to simplifying the provisioning and deployment of new IDE features, and making it easier to integrate with other web native tools, cloud based IDEs provide some fundamental advantages when it comes to understanding the behavior of a wide community of software developers. One of these advantages for the IDE provider is the ability to transparently monitor and analyze the real-time fine-grained actions of a large number of developers. In this paper, we explore how to leverage these transparent monitoring capabilities of cloud based IDEs to develop advanced analytics to understand developers' behavior and infer their characteristics. We demonstrate the feasibility of this research direction with a preliminary study focusing on the way that source code files grow for different developers, development tasks, and skill levels. We then analyze the trends of source code file growth and find growth is more similar within subjects than within tasks.
Preprint Available
On Failure Classification: The Impact of "Getting It Wrong"
Davide Falessi, Bill Kidwell, Jane Huffman Hayes, and Forrest Shull
Fraunhofer CESE, USA; University of Kentucky, USA; SEI, USA
Bug classification is a well-established practice which supports important activities such as enhancing verification and validation (V&V) efficiency and effectiveness. The state of the practice is manual and hence classification errors occur. This paper investigates the sensitivity of the value of bug classification (specifically, failure type classification) to its error rate; i.e., the degree to which misclassified historic bugs decrease the V&V effectiveness (i.e., the ability to find bugs of a failure type of interest). Results from the analysis of an industrial database of more than 3,000 bugs show that the impact of classification error rate on V&V effectiveness significantly varies with failure type. Specifically, there are failure types for which a 5% classification error can decrease the ability to find them by 66%. Conversely, there are failure types for which the V&V effectiveness is robust to very high error rates. These results show the utility of future research aimed at: 1) providing better tool support for decreasing human errors in classifying the failure type of bugs, 2) providing more robust approaches for the selection of V&V techniques, and 3) including robustness as an important criterion when evaluating technologies.
Quantifying Programmers' Mental Workload during Program Comprehension Based on Cerebral Blood Flow Measurement: A Controlled Experiment
Takao Nakagawa, Yasutaka Kamei, Hidetake Uwano, Akito Monden, Kenichi Matsumoto, and Daniel M. German
NAIST, Japan; Kyushu University, Japan; Nara National College of Technology, Japan; University of Victoria, Canada
Program comprehension is a fundamental activity in software development that cannot be easily measured, as it is performed inside the human brain. Using a wearable Near Infra-red Spectroscopy (NIRS) device to measure cerebral blood flow, this paper tries to answer the question: Can the measurement of brain blood-flow quantify programmers' mental workload during program comprehension activities? We performed a controlled experiment with 10 subjects; 8 of them showed high cerebral blood flow while understanding strongly obfuscated programs (requiring high mental workload). This suggests the possibility of using NIRS to measure the mental workload of a person during software development activities.
RegViz: Visual Debugging of Regular Expressions
Fabian Beck, Stefan Gulan, Benjamin Biegel, Sebastian Baltes, and Daniel Weiskopf
University of Stuttgart, Germany; University of Trier, Germany
Regular expressions are a widely used programming technique, but seem to be neglected by software engineering research. Because they encode complex string parsing in a very compact notation, their complexity and compactness introduce particular challenges for program comprehension. In this paper, we present RegViz, an approach to visually augment regular expressions without changing their original textual notation. The visual encoding clarifies the structure of a regular expression and clearly distinguishes the included tokens by function. The approach also provides advanced visual highlighting of matches in a sample text and support for defining test cases therein. We implemented RegViz as a Web-based tool for JavaScript regular expressions. Expert feedback suggests that the approach is intuitive to apply and increases the readability of regular expressions.
Preprint Available
Additional Information
Reproducing Software Failures by Exploiting the Action History of Undo Features
Tobias Roehm and Bernd Bruegge
TU München, Germany
Bug reports seldom contain information about the steps to reproduce a failure. Therefore, failure reproduction is a time consuming, difficult, and sometimes impossible task for software developers. Users are either unaware of the importance of steps to reproduce, are unable to describe them, or do not have time to report them. Similarly, automated crash reporting tools usually do not capture this information. In order to tackle this problem, we propose to exploit the action history of undo features, i.e. the history of user actions captured by many applications in order to allow users to undo previous actions. As it is captured anyway, our approach does not introduce additional monitoring overhead. We propose to extract the action history upon occurrence of a failure and present it to developers during bug fixing. Our hypothesis is that information about user actions contained in the action history of undo features enables developers to reproduce failures. We support this hypothesis with anecdotal evidence from a small empirical study of bug reports. A thorough evaluation is necessary to investigate the applicability and impact of our approach and to compare it to existing capture/replay approaches.
Reusable Execution Replay: Execution Record and Replay for Source Code Reuse
Ameer Armaly, Casey Ferris, and Collin McMillan
University of Notre Dame, USA
A key problem during source code reuse is that, to reuse even a small section of code from a program, a programmer must include a huge amount of dependency source code from elsewhere in the same program. These dependencies are notoriously large and complex, and many can only be known at runtime. In this paper, we propose execution record/replay as a solution to this problem. We describe a novel reuse technique that allows programmers to reuse functions from a C or C++ program, by recording the execution of the program and selectively modifying how its functions are replayed. We have implemented our technique and evaluated it in a preliminary study in which two programmers used our tool to complete four tasks over four hours.
Shadow Symbolic Execution for Better Testing of Evolving Software
Cristian Cadar and Hristina Palikareva
Imperial College London, UK
In this idea paper, we propose a novel way for improving the testing of program changes via symbolic execution. At a high-level, our technique runs two different program versions in the same symbolic execution instance, with the old version effectively shadowing the new one. In this way, the technique can exploit precise dynamic value information to effectively drive execution toward the behaviour that has changed from one version to the next. We discuss the main challenges and opportunities of this approach in terms of pruning and prioritising path exploration, mapping elements across versions, and sharing common symbolic state between versions.
Preprint Available
Software Bug Localization with Markov Logic
Sai Zhang and Congle Zhang
University of Washington, USA
Software bug localization is the problem of determining buggy statements in a software system. It is a crucial and expensive step in the software debugging process. Interest in it has grown rapidly in recent years, and many approaches have been proposed. However, existing approaches tend to use isolated information to address the problem, and are often ad hoc. In particular, most existing approaches predict the likelihood of a statement being buggy sequentially and separately. This paper proposes a well-founded, integrated solution to the software bug localization problem based on Markov logic. Markov logic combines first-order logic and probabilistic graphical models by attaching weights to first-order formulas, and views them as templates for features of Markov networks. We show how a number of salient program features can be seamlessly combined in Markov logic, and how the resulting joint inference can be solved. We implemented our approach in a debugging system, called MLNDebugger, and evaluated it on 4 small programs. Our initial results demonstrated that our approach achieved higher accuracy than a previous approach.
Preprint Available
Software Engineering for 'Social Good': Integrating Action Research, Participatory Design, and Agile Development
Maria Angela Ferrario, Will Simm, Peter Newman, Stephen Forshaw, and Jon Whittle
Lancaster University, UK
Software engineering for ‘social good’ is an area receiving growing interest in recent years. Software is increasingly seen as a way to promote positive social change: this includes initiatives such as Code for America and events such as hackathons, which strive to build innovative software solutions with a social conscience. From a software engineering perspective, existing software processes do not always match the needs of these social software projects, which are primarily aimed at social change and often involve vulnerable communities. In this paper, we argue for new software processes that combine elements of agile, iterative development with principles drawn from action research and participatory design. The former allow social software projects to be built quickly with limited resources; the latter allow for a proper understanding of the social context and vulnerable user groups. The paper describes Speedplay, a software development management framework integrating these approaches, and illustrates its use in a real social innovation case study.
Steering Model-Based Oracles to Admit Real Program Behaviors
Gregory Gay, Sanjai Rayadurgam, and Mats P. E. Heimdahl
University of Minnesota, USA
The oracle—an arbiter of correctness of the system under test (SUT)—is a major component of the testing process. Specifying oracles is particularly challenging for real-time embedded systems, where small changes in time or sensor inputs may cause large differences in behavior. Behavioral models of such systems, often built for analysis and simulation purposes, are naturally appealing for reuse as oracles. However, these models typically provide an idealized view of the system. Even when given the same inputs, the model’s behavior can frequently be at variance with some acceptable behavior of the SUT executing on a real platform. We therefore propose steering the model when used as an oracle, to admit an expanded set of behaviors when judging the SUT’s adherence to its requirements. On detecting a behavioral difference, the model is backtracked and then searched for a new state that satisfies certain constraints and minimizes a dissimilarity metric. The goal is to allow non-deterministic, but bounded, behavior differences while preventing future mismatches, by guiding the oracle—within limits—to match the execution of the SUT. Early experimental results show that steering significantly increases SUT-oracle conformance with minimal masking of real faults and, thus, has significant potential for reducing development costs.
Preprint Available
Who Asked What: Integrating Crowdsourced FAQs into API Documentation
Cong Chen and Kang Zhang
University of Texas at Dallas, USA
Documentation is important for learning Application Programming Interfaces (APIs). In addition to official documents, much crowdsourced API knowledge is available on the Web. Crowdsourced API documentation is fragmented, scattered around the Web, and disconnected from official documentation. Developers often rely on Web search to retrieve additional programming help. We propose to connect these two types of documentation by capturing developers' Web browsing behavior in the context of document reading and integrating crowdsourced frequently asked questions (FAQs) into API documents. Such an integration not only provides relevant API help more conveniently, but also opens a new approach to promoting knowledge collaboration and studying API users' information needs.
Preprint Available
Who is the Expert? Combining Intention and Knowledge of Online Discussants in Collaborative RE Tasks
Itzel Morales-Ramirez, Matthieu Vergne, Mirko Morandini, Alberto Siena, Anna Perini, and Angelo Susi
Fondazione Bruno Kessler, Italy; University of Trento, Italy
Large, distributed software development projects rely on the collaboration of culturally heterogeneous and geographically distributed stakeholders. Software requirements, as well as solution ideas, are elicited in distributed processes, which increasingly use online forums and mailing lists, in which stakeholders mainly use free or semi-structured natural language text. The identification of contributors of key information about a given topic (called experts, in both the software domain and code), and in particular automated support for retrieving information from available online resources, are becoming of crucial importance. In this paper, we address the problem of expert finding in mailing-list discussions, and propose an approach which combines content- and intent-based information extraction for ranking online discussants with respect to their expertise in the discussed topics. We illustrate its application on an example.
Writing Bidirectional Model Transformations as Intentional Updates
Tao Zan, Hugo Pacheco, and Zhenjiang Hu
Graduate University for Advanced Studies, Japan; National Institute of Informatics, Japan
Model synchronization plays an important role in model-driven software development. Bidirectional model transformation approaches provide techniques for developers to specify the bidirectional relationship between source and target models, while keeping related models synchronized for free. Since models of interest are usually not in a one-to-one correspondence, this synchronization process is inherently ambiguous. Nevertheless, existing bidirectional model transformation tools focus mainly on enforcing consistency and provide developers only limited control over how models are synchronized, solving the latent ambiguity via default strategies whose behavior is unclear to developers. In this paper, we propose a novel approach in which developers write update programs that succinctly describe how a target model can be used to update a source model, such that the bidirectional behavior is fully determined. The new approach mitigates the unpredictability of existing solutions, by enabling a finer and more transparent control of what a bidirectional transformation does, and suggests a research direction for building more robust bidirectional model transformation tools.
Atlas: A New Way to Explore Software, Build Analysis Tools
Tom Deering, Suresh Kothari, Jeremias Sauceda, and Jon Mathews
Iowa State University, USA; EnSoft, USA
Atlas is a new software analysis platform from EnSoft Corp. Atlas decouples the domain-specific analysis goal from its underlying mechanism by splitting analysis into two distinct phases. In the first phase, polynomial-time static analyzers index the software AST, building a rich graph database. In the second phase, users can explore the graph directly or run custom analysis scripts written using a convenient API. These features make Atlas ideal for both interaction and automation. In this paper, we describe the motivation, design, and use of Atlas. We present validation case studies, including the verification of safe synchronization of the Linux kernel, and the detection of malware in Android applications. Our ICSE 2014 demo explores the comprehension and malware detection use cases. Video: http://youtu.be/cZOWlJ-IO0k
Additional Information
BOAT: An Experimental Platform for Researchers to Comparatively and Reproducibly Evaluate Bug Localization Techniques
Xinyu Wang, David Lo, Xin Xia, Xingen Wang, Pavneet Singh Kochhar, Yuan Tian, Xiaohu Yang, Shanping Li, Jianling Sun, and Bo Zhou
Zhejiang University, China; Singapore Management University, Singapore
Bug localization refers to the process of identifying source code files that contain defects from descriptions of these defects which are typically contained in bug reports. There have been many bug localization techniques proposed in the literature. However, often it is hard to compare these techniques since different evaluation datasets are used. At times the datasets are not made publicly available and thus it is difficult to reproduce reported results. Furthermore, some techniques are only evaluated on small datasets and thus it is not clear whether the results are generalizable. Thus, there is a need for a platform that allows various techniques to be compared with one another on a common pool containing a large number of bug reports with known defective source code files. In this paper, we address this need by proposing our Bug lOcalization experimental plATform (BOAT). BOAT is an extensible web application that contains thousands of bug reports with known defective source code files. Researchers can create accounts in BOAT, upload executables of their bug localization techniques, and see how these techniques perform in comparison with techniques uploaded by other researchers, with respect to some standard evaluation measures. BOAT is already preloaded with several bug localization techniques and thus researchers can directly compare their newly proposed techniques against these existing techniques. BOAT has been made available online since October 2013, and researchers could access the platform at: http://www.vlis.zju.edu.cn/blp.
Cookbook: In Situ Code Completion using Edit Recipes Learned from Examples
John Jacobellis, Na Meng, and Miryung Kim
University of Texas at Austin, USA
Existing code completion engines leverage only pre-defined templates or match a set of user-defined APIs to complete the rest of the changes. We propose a new code completion technique, called Cookbook, where developers can define custom edit recipes—a reusable template of complex edit operations—by specifying change examples. It generates an abstract edit recipe that describes the most specific generalization of the demonstrated example program transformations. Given a library of edit recipes, it matches a developer’s edit stream to recommend a suitable recipe that is capable of filling out the rest of the change, customized to the target. We evaluate Cookbook using 68 systematically changed methods drawn from the version history of Eclipse SWT. Cookbook is able to narrow down to the most suitable recipe in 75% of the cases. It takes 120 milliseconds to find the correct suitable recipe on average, and the edits produced by the selected recipe are on average 82% similar to the developer’s hand edit. This shows Cookbook’s potential to speed up manual editing and to minimize developers’ errors. Our demo video is available at https://www.youtube.com/watch?v=y4BNc8FT4RU.
DASHboards: Enhancing Developer Situational Awareness
Oleksii Kononenko, Olga Baysal, Reid Holmes, and Michael W. Godfrey
University of Waterloo, Canada
Issue trackers monitor the progress of software development "issues", such as bug fixes and discussions about features. Typically, developers subscribe to issues they are interested in through the tracker, and are informed of changes and new developments via automated email. In practice, however, this approach does not scale well, as developers may receive large volumes of messages that they must sort through using their mail client; over time, it becomes increasingly challenging for them to maintain awareness of the issues that are relevant to their activities and tasks. To address this problem, we present a tool called DASH that is implemented in the form of personalized views of issues; developers indicate issues of interest and DASH presents customized views of their progress and informs them of changes as they occur. Video: http://youtu.be/Jka_MsZet20
Preprint Available
ImpactMiner: A Tool for Change Impact Analysis
Bogdan Dit, Michael Wagner, Shasha Wen, Weilin Wang, Mario Linares-Vásquez, Denys Poshyvanyk, and Huzefa Kagdi
College of William and Mary, USA; Wichita State University, USA
Developers are often faced with a natural language change request (such as a bug report) and tasked with identifying all code elements that must be modified in order to fulfill the request (e.g., fix a bug or implement a new feature). In order to accomplish this task, developers frequently and routinely perform change impact analysis. This formal demonstration paper presents ImpactMiner, a tool that implements an integrated approach to software change impact analysis. The proposed approach estimates an impact set using an adaptive combination of static textual analysis, dynamic execution tracing, and mining software repositories techniques. ImpactMiner is available from our online appendix http://www.cs.wm.edu/semeru/ImpactMiner/
Additional Information
LTSA-PCA: Tool Support for Compositional Reliability Analysis
Pedro Rodrigues, Emil Lupu, and Jeff Kramer
Imperial College London, UK
Software systems are often constructed by combining new and existing services and components. Models of such systems should therefore be compositional in order to reflect the architectural structure. We present herein an extension of the LTSA model checker. It supports the specification, visualisation and failure analysis of composable, probabilistic behaviour of component-based systems, modelled as Probabilistic Component Automata (PCA). To evaluate aspects such as the probability of system failure, a DTMC model can be automatically constructed from the composition of the PCA representations of each component and analysed in tools such as PRISM. Before composition, we reduce each PCA to its interface behaviour in order to mitigate state explosion associated with composite representations. Moreover, existing behavioural analysis techniques in LTSA can be applied to PCA representations to verify the compatibility of interface behaviour between components with matching provided-required interfaces. A video highlighting the main features of the tool can be found at: http://youtu.be/moIkx8JHE7o.
Preprint Available
Migrating Code with Statistical Machine Translation
Anh Tuan Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen
Iowa State University, USA; Utah State University, USA
In the era of mobile computing, developers often need to migrate code written for one platform in a programming language to another language for a different platform, e.g., from Java for Android to C# for Windows Phone. The migration process is often performed manually or semi-automatically, in which developers are required to manually define translation rules and API mappings. This paper presents semSMT, an automatic tool to migrate code written in Java to C#. semSMT utilizes statistical machine translation to automatically infer translation rules from existing migrated code, thus, requires no manual defining of rules. The video demonstration on semSMT can be found on YouTube at http://www.youtube.com/watch?v=aRSnl5-7vNo.
Product Assignment Recommender
Jialiang Xie, Qimu Zheng, Minghui Zhou, and Audris Mockus
Peking University, China; Avaya Labs Research, USA
The effectiveness of a software development process depends on the accuracy of data in supporting tools. In particular, a customer issue assigned to a wrong product team takes much longer to resolve (negatively affecting user-perceived quality) and wastes developer effort. In Open Source Software (OSS) and in commercial projects, values in issue-tracking systems (ITS) or Customer Relationship Management (CRM) systems are often assigned by non-developers for whom the assignment task is difficult. We propose PAR (Product Assignment Recommender) to estimate the odds that a value in the ITS is incorrect. PAR learns from the past activities in ITS and performs prediction using a logistic regression model. Our demonstrations show how PAR helps developers to focus on fixing real problems, and how it can be used to improve data accuracy in ITS by crowd-sourcing non-developers to verify and correct low-accuracy data. http://youtu.be/IuykbzSTj8s
SEWordSim: Software-Specific Word Similarity Database
Yuan Tian, David Lo, and Julia Lawall
Singapore Management University, Singapore; INRIA, France; LIP6, France
Measuring the similarity of words is important in accurately representing and comparing documents, and thus improves the results of many natural language processing (NLP) tasks. The NLP community has proposed various measurements based on WordNet, a lexical database that contains relationships between many pairs of words. Recently, a number of techniques have been proposed to address software engineering issues such as code search and fault localization that require understanding natural language documents, and a measure of word similarity could improve their results. However, WordNet only contains information about word senses in general-purpose conversation, which often differ from word senses in a software-engineering context, and the software-specific word similarity resources that have been developed rely on data sources containing only a limited range of words and word uses. In recent work, we have proposed a word similarity resource based on information collected automatically from StackOverflow. We have found that the results of this resource are given scores on a 3-point Likert scale that are over 50% higher than the results of a resource based on WordNet. In this demo paper, we review our data collection methodology and propose a Java API to make the resulting word similarity resource useful in practice. The SEWordSim database and related information can be found at http://goo.gl/BVEAs8. Demo video is available at http://goo.gl/dyNwyb.
Teamscale: Software Quality Control in Real-Time
Lars Heinemann, Benjamin Hummel, and Daniela Steidl
CQSE, Germany
When large software systems evolve, the quality of source code is essential for successful maintenance. Controlling code quality continuously requires adequate tool support. Current quality analysis tools operate in batch-mode and run up to several hours for large systems, which hampers the integration of quality control into daily development. In this paper, we present the incremental quality analysis tool Teamscale, providing feedback to developers within seconds after a commit and thus enabling real-time software quality control. We evaluated the tool within a development team of a German insurance company. A video demonstrates our tool: http://www.youtube.com/watch?v=nnuqplu75Cg.
VMVM: Unit Test Virtualization for Java
Jonathan Bell and Gail Kaiser
Columbia University, USA
As software evolves and grows, its regression test suites tend to grow as well. When these test suites become too large, they can eventually reach a point where they take too long to execute regularly. Previous work in Test Suite Minimization has reduced the number of tests in such suites by attempting to identify those that are redundant (e.g. by a coverage metric). Our approach to ameliorating the runtime of these large test suites is complementary, instead focusing on reducing the overhead of running each test, an approach that we call Unit Test Virtualization. This Tool Demonstration presents our implementation of Unit Test Virtualization, VMVM (pronounced "vroom-vroom") and summarizes an evaluation of our implementation on 20 real-world Java applications, showing that it reduces test suite execution time by up to 97% (on average, 62%). A companion video to this demonstration is available online, at https://www.youtube.com/watch?v=sRpqF3rJERI.
Preprint Available
Additional Information
VeriWS: A Tool for Verification of Combined Functional and Non-functional Requirements of Web Service Composition
Manman Chen, Tian Huat Tan, Jun Sun, Yang Liu, and Jin Song Dong
National University of Singapore, Singapore; Singapore University of Technology and Design, Singapore; Nanyang Technological University, Singapore
Web service composition is an emerging technique to develop Web applications by composing existing Web services. Web service composition is subject to two important classes of requirements, i.e., functional and non-functional requirements. Both are crucial to Web service composition. Therefore, it is desirable to verify combined functional and non-functional requirements for Web service composition. We present VeriWS, a tool to verify combined functional and non-functional requirements of Web service composition. VeriWS captures the semantics of Web service composition and verifies it directly based on the semantics. We also show how to describe Web service composition and properties using VeriWS. The YouTube video for demonstration of VeriWS is available at https://sites.google.com/site/veriwstool/.
Verily: A Web Framework for Creating More Reasonable Web Applications
John L. Singleton and Gary T. Leavens
University of Central Florida, USA
The complexity of web application construction is increasing at an astounding rate. Developing for the web typically crosses multiple application tiers in a variety of languages, which can result in disjoint code bases. This lack of standardization introduces new challenges for reasoning. In this paper we introduce Verily, a new web framework for Java that supports the development of verifiable web applications. Rather than requiring that programs be verified in a separate a posteriori analysis, Verily supports construction via a series of Recipes, which are properties of an application that are enforced at compile time. In addition to introducing the Verily framework, we also present two Recipes: the Core Recipe, an application architecture for web applications designed to replace traditional server-side Model View Controller, and the Global Mutable State Recipe, which enables developers to use sessions within their applications without resorting to the use of unrestricted global mutable state. Demo Video: http://www.youtube.com/watch?v=TjRF7E4um3c
ViVA: A Visualization and Analysis Tool for Distributed Event-Based Systems
Youn Kyu Lee, Jae young Bang, Joshua Garcia, and Nenad Medvidovic
University of Southern California, USA
Distributed event-based (DEB) systems are characterized by highly-decoupled components that communicate by exchanging messages. This form of communication enables flexible and scalable system composition but also reduces understandability and maintainability due to the indirect manner in which DEB components communicate. To tackle this problem, we present Visualizer for eVent-based Architectures, ViVA, a tool that effectively visualizes the large number of messages and dependencies that can be exchanged between components and the order in which the exchange of messages occurs. In this paper, we describe the design, implementation, and key features of ViVA. (Demo video at http://youtu.be/jHVwuR5AYgA)
Preprint Available
Automatic Generation of Cost-Effective Test Oracles
Alberto Goffi
University of Lugano, Switzerland
Software testing is the primary activity to guarantee some level of quality of software systems. In software testing, the role of test oracles is crucial: The quality of test oracles directly affects the effectiveness of the testing activity and influences the final quality of software systems. So far, research in software testing has focused mostly on automating the generation of test inputs and the execution of test suites, paying less attention to the generation of test oracles. Available techniques for generating test oracles are either effective but expensive or inexpensive but ineffective. Our research work focuses on the generation of cost-effective test oracles. Recent research work has shown that modern software systems can provide the same functionality through different execution sequences. In other words, multiple execution sequences perform the same, or almost the same, action. This phenomenon is called intrinsic redundancy of software systems. We aim to design and develop a completely automated technique to generate test oracles by exploiting the intrinsic redundancy freely available in the software. Test oracles generated by our technique check the equivalence between a given execution sequence and all the redundant and supposedly equivalent execution sequences that are available. The results obtained so far are promising.
Preprint Available
COASTmed: Software Architectures for Delivering Customizable, Policy-Based Differential Web Services
Alegria Baquero
University of California at Irvine, USA
Inter-organizational exchange of personal information raises significant challenges in domains such as healthcare. First, trust among parties is not homogeneous; data is shared according to complex relations. Second, personal data is used for unexpected, often divergent purposes. This tension between information need and provision calls for custom services whose access depends on specific trust and legal ties. Current Web services are "one-size-fits-all" solutions that neither capture nuanced relations nor meet all users' needs. Our goal is providing computation-enabled services which (a) are accessible based on providers' policies, and (b) allow user-controlled customization within the authority granted. We present our proposed solutions in COASTmed, a prototype for electronic health record (EHR) management which leverages novel architectural principles and formal policies.
Cross-Platform Testing and Maintenance of Web and Mobile Applications
Shauvik Roy Choudhary
Georgia Tech, USA
Modern software applications are expected to run on a variety of web and mobile platforms with diverse software and hardware level features. Thus, developers of such software need to duplicate the testing and maintenance effort on a wide range of platforms. Often developers are not able to cope with this increasing demand. Thus, they release software that is broken on certain platforms affecting a class of customers using such platforms. The goal of my work is to improve the testing and maintenance of cross-platform applications by developing automated techniques for matching such applications across the different platforms.
Dynamic Data-Flow Testing
Mattia Vivanti
University of Lugano, Switzerland
Data-flow testing techniques have long been discussed in the literature, yet to date they are still of little practical relevance. The applicability of data-flow testing is limited by the complexity and the imprecision of the approach: writing a test suite that satisfies a data-flow criterion is challenging due to the presence of many test objectives that include infeasible elements in the coverage domain and exclude feasible ones that depend on aliasing and dynamic constructs. To improve the applicability and effectiveness of data-flow testing we need both to augment the precision of the coverage domain by including data-flow elements dependent on aliasing and to exclude infeasible ones that reduce the total coverage. In my PhD research I plan to address these two problems by designing a new data-flow testing approach that combines automatic test generation and dynamic identification of data-flow elements that can identify precise test targets by monitoring the program executions.
Enhancing Feature Interfaces for Supporting Software Product Line Maintenance
Bruno B. P. Cafeo
PUC-Rio, Brazil
A software product line (SPL) is a technology aimed at speeding up the development process. Although SPLs are widely used, their maintenance is a challenging task. In particular, when maintaining an SPL feature, developers need to know which parts of other dependent features might be affected by this maintenance. Otherwise, further maintenance problems can be introduced in the SPL implementation. However, the identification and understanding of the so-called feature dependencies in the source code is an exhausting and error-prone task. In fact, developers often unconsciously ignore feature dependencies while reasoning about SPL maintenance. To overcome this problem, this PhD research aims at understanding the properties of feature dependencies in the source code that have an impact on SPL maintenance. Furthermore, we propose a way to structure and segregate feature interfaces in order to help developers to identify and understand feature dependencies, thus reducing the effort and avoiding undesirable side effects in SPL maintenance.
Preprint Available
Formal Verification Problems in a Big Data World: Towards a Mighty Synergy
Matteo Camilli
University of Milan, Italy
Formal verification requires high-performance data processing software for extracting knowledge from the unprecedented amount of data coming from analyzed systems. Since cloud-based computing resources have become easily accessible, there is an opportunity for verification techniques and tools to undergo a deep technological transition to exploit the newly available architectures. This has created an increasing interest in parallelizing and distributing verification techniques. In this paper we introduce a distributed approach which exploits techniques typically used by the big-data community to enable verification of very complex systems on cloud computing facilities.
Holistic Recommender Systems for Software Engineering
Luca Ponzanelli
University of Lugano, Switzerland
Software maintenance is a relevant and expensive phase of the software development process. Developers have to deal with legacy and undocumented code that hinders the comprehension of the software system at hand. Enhancing program comprehension by means of recommender systems in the Integrated Development Environment (IDE) is a solution to assist developers in these tasks. The recommender systems proposed so far generally share common weaknesses: they are not proactive, they consider a single type of data source, and in the case of multiple data sources, relevant items are suggested together without considering interactions among them. We envision a future where recommender systems follow a holistic approach: They provide knowledge regarding a programming context by considering information beyond that provided by single elements in the context of software development. The recommender system should consider different elements such as development artifacts (e.g., bug reports, mailing lists), online resources (e.g., blogs, Q&A web sites, API documentation), developer activities, repository history, etc. The provided information should be novel and emerge from the semantic links created by the analysis of the interactions among these elements.
Human Aspects, Gamification, and Social Media in Collaborative Software Engineering
Bogdan Vasilescu
Eindhoven University of Technology, Netherlands
Software engineering is inherently a collaborative venture. In open-source software (OSS) development, such collaborations almost always span geographies and cultures. Because of the decentralised and self-directed nature of OSS as well as the social diversity inherent to OSS communities, the success of an OSS project depends to a large extent on the social aspects of distributed collaboration and achieving coordination over distance. The goal of this dissertation research is to raise our understanding of how human aspects (e.g., gender or cultural diversity), gamification and social media (e.g., participation in social environments such as Stack Overflow or GitHub) impact distributed collaboration in OSS.
Preprint Available
Improving Enterprise Software Maintenance Efficiency through Mining Software Repositories in an Industry Context
Senthil Mani
IIIT Delhi, India
There is an increasing trend to outsource maintenance of large applications and application portfolios of a business to third parties, specializing in application maintenance, who are incentivized to deliver the best possible maintenance at the lowest cost. In a typical industry setting, any maintenance project spans three different phases: Transition, Steady-State, and Preventive Maintenance. Each phase has different goals and drivers, but the underlying software repositories or artifacts remain the same. To improve the overall efficiency of the process and people involved in these different phases, we require appropriate insights to be derived from the available software repositories. In the past decade, considerable research has been done in mining software repositories and deriving insights, particularly focused on open source software. However, focused studies on enterprise software maintenance in an industrial setting are severely lacking. In this thesis work, we intend to understand the industry's needs for desired insights and the limitations of available software artifacts across these different phases. Based on this understanding we intend to propose and develop novel methods and approaches for deriving desirable insights from software repositories. We also intend to leverage empirical techniques to validate our approaches both qualitatively and quantitatively.
Improving Exception Handling with Recommendations
Eiji Adachi Barbosa
PUC-Rio, Brazil
Exception handling mechanisms are the most common model used to design and implement robust software systems. Despite their wide adoption in mainstream programming languages, empirical evidence suggests that developers are still not properly using these mechanisms to achieve better software robustness. Without adequate support, developers struggle to decide the proper manner in which they should handle their exceptions, i.e., the place where the exception should be caught and the handling actions that should be implemented. As a consequence, they tend to ignore exceptions by implementing empty handlers or leaving them unhandled, which may ultimately lead to the introduction of faults in the source code. In this context, this PhD research aims at investigating means to improve the quality of exception handling in software projects. To achieve this goal, we propose a recommender system able to support developers in implementing exception handling.
Preprint Available
Nirikshan: Process Mining Software Repositories to Identify Inefficiencies, Imperfections, and Enhance Existing Process Capabilities
Monika Gupta
IIIT Delhi, India
Process mining extracts knowledge about business processes from data stored implicitly in an ad-hoc way or explicitly by information systems. The aim is to discover the runtime process, analyze performance, and perform conformance verification, using process mining tools like ProM and Disco, for a single software repository as well as for processes spanning multiple repositories. The application of process mining to software repositories has recently gained interest due to the availability of vast data generated during software development and maintenance. Process data embodied in repositories can be used for analysis to improve the efficiency and capability of the process, but doing so involves many challenges which have not been addressed so far. A project team defines workflows, design processes, and policies for tasks like issue tracking (defects or feature enhancements) and peer code review (reviewing a submitted patch to catch defects before they are injected) in order to streamline and structure the activities. The reality may not match what is defined because of imperfections, so the extent of non-conformance needs to be measured. We propose a research framework, `Nirikshan', to process mine the data of software repositories from multiple perspectives such as process, organization, data, and time. We apply process mining on software repositories to derive the runtime process map, identify and remove inefficiencies and imperfections, extend the capabilities of existing software engineering tools to make them more process aware, and understand interaction patterns between contributors to improve the efficiency of the project.
On the Use of Visualization for Supporting Software Reuse
Marcelo Schots
COPPE, Brazil; Federal University of Rio de Janeiro, Brazil
Reuse is present in the daily routine of software developers, yet mostly in an ad-hoc or pragmatic way. Reuse practices allow for reducing the time and effort spent on software development. However, organizations struggle to start and sustain a reuse program. The APPRAiSER environment, proposed in this work, aims at providing reuse awareness according to each stakeholder’s needs for performing reuse-related tasks, by providing appropriate software visualization mechanisms. The long-term goal is to help introduce, instigate, establish, and monitor software reuse initiatives, by decreasing the effort and time spent by stakeholders in performing reuse tasks.
Performance Analysis of Object-Oriented Software
David Maplesden
University of Auckland, New Zealand
Many large-scale object-oriented applications suffer from chronic performance problems. The many layers of generalised frameworks and libraries that these applications are typically assembled from lead to seemingly simple tasks requiring many thousands of operations to complete. Addressing these performance problems with traditional profiling tools is difficult because of the scale and complexity of the dynamic behaviour being exhibited. A wealth of detailed data is collected, but costs are thinly distributed across thousands of methods, leaving few easily identifiable performance optimisation opportunities. However, we believe there are repeated patterns of method calls hidden within the profile data that represent performance-critical sections of code. We plan to investigate new approaches to analysing typical performance data sets to identify these repeated patterns. Our initial work shows some promising results: within an application with over 64 thousand unique calling contexts, we were able to identify ten patterns that account for over 50% of the execution time.
Quantitative Properties of Software Systems: Specification, Verification, and Synthesis
Srđan Krstić
Politecnico di Milano, Italy
Functional and non-functional requirements are becoming more and more complex, introducing ambiguities in the natural language specifications. A very broad class of such requirements are the ones that define quantitative properties of software systems. Properties of this kind are of key relevance to express quality of service. For example, they are used to specify bounds on the timing information between specific events, or on their number of occurrences. Sometimes, they are also used to express higher level properties such as aggregate values over the multiplicity of certain events in a specific time window. These are practical specification patterns that can be frequently found in system documentation. The goal of this thesis is to develop an approach for specifying and verifying quantitative properties of complex software systems that execute in a changing environment. In addition, it will also explore synthesis techniques that can be applied to infer such type of properties from execution traces.
ReuseSEEM: An Approach to Support the Definition, Modeling, and Analysis of Software Ecosystems
Rodrigo Pereira dos Santos
COPPE, Brazil; Federal University of Rio de Janeiro, Brazil
The Software Engineering (SE) community has discussed economic and social issues as a challenge for the coming years. Companies and organizations have directly (or indirectly) opened up their software platforms and assets to others, including partners and third-party developers, creating software ecosystems (SECOs). This scenario changes the traditional software industry because it requires mature research in SE dealing with an environment where business models and socio-technical networks can impact systems engineering and management, and reuse approaches. However, one strong inhibitor is the complexity in defining and modeling SECO elements to improve their comprehension and analysis. The main reason is the fact that this topic is emerging and no consensus on its concepts and relations exists yet. Thus, it is difficult to understand its real impact on the SE industry. In this context, we propose an approach to support the definition, modeling, and analysis of SECOs by exploring Software Reuse concepts and techniques in this area and treating non-technical aspects in SE.
Study of Task Processes for Improving Programmer Productivity
Damodaram Kamma
IIIT Delhi, India
In a mature overall process of software development, productivity of a software project considerably depends on the effectiveness with which programmers execute tasks. A task process refers to the processes used by a programmer for executing an assigned task. This research focuses on studying the effect of task processes on programmer productivity. Our approach first identifies high productivity and average productivity programmers, then understands the task processes used by the two groups, the similarities between the task processes used by programmers within a group, and differences between the task processes in the two groups. This study is part of an ongoing study being conducted at a CMMi Level 5 software company. The results so far indicate that there are differences in task processes followed by high and average productivity programmers, and that it may be possible to improve the productivity of average productivity programmers by training them to use the task processes followed by the high productivity programmers.
Summarization of Complex Software Artifacts
Laura Moreno
Wayne State University, USA
Program understanding is necessary for most software engineering tasks. Internal and external documentation help during this process. Unfortunately, this documentation is often missing or outdated. An alternative to solve this situation is to automatically summarize software artifacts. In the case of source code, a few approaches have been proposed to generate natural language descriptions of fine-grained elements of the code. This research focuses on the automatic generation of generic natural language summaries of complex code artifacts, such as classes and change sets. In addition, these generic summaries will be adapted to support specific maintenance tasks.
Supporting Evolution and Maintenance of Android Apps
Mario Linares-Vásquez
College of William and Mary, USA
In recent years, the market of mobile software applications (apps) has maintained an impressive upward trajectory. As of today, the market for such devices features over 850K apps for Android, and 19 versions of the Android API have been released in 4 years. There is evidence that Android apps are highly dependent on the underlying APIs, and API instability (change proneness) and fault proneness are a threat to the success of those apps. Therefore, the goal of this research is to create an approach that helps developers of Android apps to be better prepared for Android platform updates as well as the updates from third-party libraries that can potentially (and inadvertently) impact their apps with breaking changes and bugs. Thus, we hypothesize that the proposed approach will help developers not only deal with platform and library updates opportunely, but also keep (and increase) the user base by avoiding many of these potential API "update" bugs.
Preprint Available
Understanding the Dynamics of Test-Driven Development
Davide Fucci
University of Oulu, Finland
Test-driven development (TDD) has been the subject of several software engineering experiments. However, the controversial results about its effects still need to be contextualized. This doctoral research will show how TDD could be better assessed by studying to what extent developers follow its cycle and for what kind of development tasks. This knowledge is foreseen to be beneficial for software companies willing to adopt or adapt TDD.
Understanding the Redundancy of Software Systems
Andrea Mattavelli
University of Lugano, Switzerland
Our research aims to study and characterize the redundancy of software systems. Intuitively, a software system is redundant when it can perform the same functionality in different ways. Researchers have successfully defined several techniques that exploit various forms of redundancy, for example for tolerating failures at runtime and for testing purposes. We aim to formalize and study the redundancy of software systems in general. In particular, we are interested in the intrinsic redundancy of software systems, that is, a form of undocumented redundancy present in software systems as a consequence of various design and implementation decisions. In this thesis we will formalize the intuitive notion of redundancy. On the basis of such formalization, we will investigate the pervasiveness and the fundamental characteristics of the intrinsic redundancy of software systems. We will study the nature, the origin, and various forms of such redundancy. We will also develop techniques to automatically identify the intrinsic redundancy of software systems.
Preprint Available
Verifying Incomplete and Evolving Specifications
Claudio Menghi
Politecnico di Milano, Italy
Classical verification techniques rely on the assumption that the model of the system under analysis is completely specified and does not change over time. However, most modern development life-cycles and even run-time environments (as in the case of adaptive systems) are implicitly based on incompleteness and evolution. Incompleteness occurs when some parts of the system are not specified. Evolution concerns a set of gradual and progressive changes that amend systems over time. Modern development life-cycles are founded on a sequence of iterative and incremental steps through which the initial incomplete description of the system evolves into its final, fully detailed, specification. Similarly, adaptive systems evolve through a set of adaptation actions, such as plugging and removing components, that modify the behavior of the system in response to new environmental conditions, requirements, or legal regulations. Usually, the adaptation is performed by first removing old components, leaving the system temporarily unspecified (incomplete), and then by plugging in the new ones. This work aims to extend classical verification algorithms to consider incomplete and evolving specifications. We want to ensure that after any change, only the part of the system that is affected by the change is re-analyzed, avoiding re-verification of everything from scratch.
APISynth: A New Graph-Based API Recommender System
Chen Lv, Wei Jiang, Yue Liu, and Songlin Hu
University of Chinese Academy of Sciences, China; Institute of Computing Technology at Chinese Academy of Sciences, China; Greatwall Drilling Company, China
Current API recommendation tools yield either a good recall ratio or good precision, but not both. This paper proposes a tool named APISynth that utilizes a new graph-based approach. A preliminary evaluation demonstrates that APISynth improves on the state of the art with respect to both criteria.
An Adaptive Bayesian Approach for URL Selection to Test Performance of Large Scale Web-Based Systems
Alim Ul Gias and Kazi Sakib
University of Dhaka, Bangladesh
In the case of large-scale web-based systems, scripts for performance testing are updated iteratively. In each script, multiple URLs of the system are considered, based on the intuition that those URLs will expose performance bugs. This paper proposes a Bayesian approach for including a URL in a test script based on its probability of being time intensive. As the testing goes on, the scheme adaptively updates its knowledge about each URL. A comparison with existing methods shows that the proposed technique performs similarly in guiding applications towards intensive tasks, which helps to expose performance bugs.
Preprint Available
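To make the adaptive updating concrete, the Java sketch below keeps a per-URL Beta-Bernoulli estimate of the probability that a request turns out to be time intensive and includes a URL once the estimate passes a threshold. The class name, the uniform prior, and the threshold are illustrative assumptions; this is one plausible reading of a Bayesian update, not the model defined in the paper.

    import java.util.HashMap;
    import java.util.Map;

    /** Minimal sketch: per-URL Beta-Bernoulli estimate of being "time intensive". */
    public class UrlSelector {
        // Beta(alpha, beta) posterior per URL; starts as a uniform Beta(1, 1) prior.
        private final Map<String, double[]> params = new HashMap<>();

        /** Record one observation: did this URL exceed the response-time budget? */
        public void observe(String url, boolean slow) {
            double[] ab = params.computeIfAbsent(url, k -> new double[]{1.0, 1.0});
            if (slow) ab[0] += 1.0; else ab[1] += 1.0;
        }

        /** Posterior mean of the probability that the URL is time intensive. */
        public double probabilitySlow(String url) {
            double[] ab = params.getOrDefault(url, new double[]{1.0, 1.0});
            return ab[0] / (ab[0] + ab[1]);
        }

        /** Include the URL in the next test script if its estimate passes the threshold. */
        public boolean include(String url, double threshold) {
            return probabilitySlow(url) >= threshold;
        }
    }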
An Optimized Design Approach for Extending HMI Systems with Mobile Devices
Manasvi Jain, Rahul Raj CP, and Seshubabu Tolety
Siemens, India
Remote monitoring and controlling of industrial machines have proven to be a necessary requirement for many engineering domains. HMI panels are already successful in providing proper control for such machines/layouts. Many organizations are now utilizing new and trendy smart phones to access their legacy systems remotely. In this paper, we present a viable approach for extending HMI systems with smart phones and tablets. This approach overcomes the challenges of explicit mobile application design approaches and provides appropriate application architecture for mobile extension providers.
Assuring System Goals under Uncertainty with Active Formal Models of Self-Adaptation
M. Usman Iftikhar and Danny Weyns
Linnaeus University, Sweden
Designing software systems with uncertainties, such as incomplete knowledge about changing system goals, is challenging. One approach to handle uncertainties is self-adaptation, where a system consists of a managed system and a managing system that realizes a feedback loop. The promise of self-adaptation is to enable a system to adapt itself so that it realizes the system goals despite uncertainties. To realize this promise, it is critical to provide assurances for the self-adaptive behaviours. Several approaches have been proposed that exploit formal methods to provide these assurances. However, an integrated approach that combines (1) seamless integration of offline and online verification (to deal with inherent limitations of verification) with (2) support for runtime evolution of the system (to deal with new or changing goals) is lacking. In this paper, we outline a new approach named Active FORmal Models of Self-adaptation (ActivFORMS) that aims to deal with these challenges. In ActivFORMS, the formal models of the managing system are directly deployed and executed to realize self-adaptation, guaranteeing the verified properties. Having the formal models readily available at runtime paves the way for (1) incremental verification during system execution and (2) runtime evolution of the self-adaptive system. Experiences with a robotic system show promising results.
Asymmetric Software Structures in the Linux Kernel
Lei Wang, Ping Wang, and Zhen Wang
Beihang University, China
We investigated the asymmetry in the structure of complex software. After studying the degree distribution of the call graphs corresponding to the Linux kernel modules of 223 different versions, we found an asymmetry between the in-degree and out-degree distributions. After analyzing the behavior of the newly added nodes in each version, we found that the preferential attachment behavior of new nodes is related not only to the degree of nodes but also to the "age" of nodes, especially for the out-degree. In addition, new nodes tend to cluster in the Linux kernel.
Avoiding Deadlocks using Stalemate and Dimmunix
Surabhi Pandey, Sushanth Bhat, and Vivek Shanbhag
IIIT Bangalore, India
The execution of a concurrent Java program can deadlock if its threads attempt to acquire shared locks in cyclic order. The JVM permits such behaviour. Research has demonstrated that such deadlocks can be predicted through static analysis. It is also known that a tool like Dimmunix helps to avoid deadlocks whose deadlock patterns (fingerprints) are known. The current work combines both approaches: it conducts static analysis to predict possible deadlocks and provides their corresponding fingerprints to Dimmunix. These fingerprints forewarn Dimmunix of all deadlock possibilities rather than having it learn about them one at a time. For our experiments we use 8 deadlock programs that were developed based upon deadlock predictions from static analysis of the entire JRE by a tool called Stalemate. We design a process to generate Dimmunix fingerprints from deadlock predictions.
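For readers unfamiliar with the pattern being predicted, the fragment below is a generic Java illustration of cyclic lock acquisition: thread t1 takes lockA then lockB while t2 takes lockB then lockA, so each can block waiting for the lock the other holds. It is a textbook example of the deadlock class discussed above, not code drawn from the JRE cases or from Stalemate's output.

    public class CyclicLocks {
        private static final Object lockA = new Object();
        private static final Object lockB = new Object();

        public static void main(String[] args) {
            // t1 acquires A then B; t2 acquires B then A. If each grabs its first lock
            // before the other releases, both block forever: a cyclic-order deadlock.
            Thread t1 = new Thread(() -> {
                synchronized (lockA) {
                    pause();
                    synchronized (lockB) { System.out.println("t1 done"); }
                }
            });
            Thread t2 = new Thread(() -> {
                synchronized (lockB) {
                    pause();
                    synchronized (lockA) { System.out.println("t2 done"); }
                }
            });
            t1.start();
            t2.start();
        }

        private static void pause() {
            try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }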
Calibrating Use Case Points
Ali Bou Nassif, Luiz Fernando Capretz, and Danny Ho
University of Western Ontario, Canada; NFA Estimation, Canada
An approach to calibrate the complexity weights of the use cases in the Use Case Points (UCP) model is put forward. The size metric used is Use Case Points (UCP), which can be calculated from the use case diagram along with its use case scenario as described in the UCP model. The approach uses a neural network with fuzzy logic to tune the complexity weights.
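For background, the sketch below computes UCP with the fixed weights of the standard model (use cases weighted 5/10/15 by complexity, actors 1/2/3, plus the usual technical and environmental adjustment factors). These are the default complexity weights that a calibration approach would tune; the code is illustrative background only, not the proposed neural-network/fuzzy-logic calibration itself.

    /** Background sketch: standard Use Case Points with the usual fixed weights. */
    public class UseCasePoints {
        public static double ucp(int simpleUseCases, int averageUseCases, int complexUseCases,
                                 int simpleActors, int averageActors, int complexActors,
                                 double technicalFactorSum, double environmentalFactorSum) {
            // Unadjusted weights: use cases 5/10/15 and actors 1/2/3 (the values a
            // calibration approach would learn instead of fixing).
            double uucw = 5 * simpleUseCases + 10 * averageUseCases + 15 * complexUseCases;
            double uaw = 1 * simpleActors + 2 * averageActors + 3 * complexActors;
            // Standard adjustment factors of the UCP model.
            double tcf = 0.6 + 0.01 * technicalFactorSum;
            double ecf = 1.4 - 0.03 * environmentalFactorSum;
            return (uucw + uaw) * tcf * ecf;
        }
    }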
DEECo: An Ecosystem for Cyber-Physical Systems
Rima Al Ali, Tomas Bures, Ilias Gerostathopoulos, Petr Hnetynka, Jaroslav Keznikl, Michal Kit, and Frantisek Plasil
Charles University, Czech Republic
In this work we tackle the problem of designing and developing software-intensive cyber-physical systems (CPS), which are large distributed systems of collaborating elements that closely interact with the physical world, such as intelligent transportation systems and crowdsourcing applications. Due to their specific constraints, such as extreme dynamism and continuous evolution of the physical substratum, and requirements, such as open-endedness and adaptability, CPS introduce many new challenges for software engineering. In response, we present a tailored ecosystem of software engineering models, methods, and tools. This ecosystem is centered on the DEECo component model, which we have proposed specifically for architecting software-intensive CPS.
Preprint Available
Additional Information
Fault Localization for Build Code Errors in Makefiles
Jafar Al-Kofahi, Hung Viet Nguyen, and Tien N. Nguyen
Iowa State University, USA
Building is an important process in software development. In large software projects, build code has a high level of complexity, churn rate, and defect proneness. While several automated approaches exist to help developers in localizing faults in traditional source code and in detecting code smells in build code, fault localization techniques have not yet been developed for build code. In this work, we introduce MkFault, a tool to localize errors resulting in build crashes. MkFault monitors the execution traces from GNU Make statements that produce concrete build rules and the original code locations for each component of a rule (i.e., target, prerequisites, and recipe). It then uses a novel ranking algorithm to give suspiciousness scores to the original statements in the Makefile. In our empirical evaluation with real faults, we show that MkFault can help localize faults in Make code with high accuracy.
Hybrid Test Data Generation
Zicong Liu, Zhenyu Chen, Chunrong Fang, and Qingkai Shi
Nanjing University, China
Many automatic test data generation techniques have been proposed in the past decades. Each technique can only deal with very restrictive data types so far. This limits the usefulness of test data generation in practice. We present a preliminary approach to hybrid test data generation, by combining Random Strategy (RS), Dynamic Symbolic Execution (DSE), and Search-based Strategy (SBS). It is expected to take advantage of the state of the art to enhance the robustness and scalability, in terms of different types of test data.
Model-Driven Development of Diverse User Interfaces
Zhiyi Ma, Wei Zhang, and Chih-Yi Yeh
Peking University, China
Developing and maintaining user interfaces of an application for various devices is usually laborious. This paper discusses how to build diverse user interfaces based on model-driven development.
Modeling and Model Checking by Modular Approach
Mo Xia, Guiming Luo, and Mian Sun
Tsinghua University, China
Model checking is a common formal verification technique, but it is only applicable to white box systems. In order to allow users without much formal verification expertise to use model checking easily, this paper proposes a modular approach for software modeling and model checking. Efficiency, correctness, and reusability are our main concerns. A hierarchical model is constructed for a system by modules, and it is translated into the specific model checking codes. The M^3C tool is implemented to support our approach, and it is successfully applied to actual industrial cases, as well as to some cases in the literature.
Proposing a Theory of Gamification Effectiveness
Bilal Amir and Paul Ralph
Sur University College, Oman; Lancaster University, UK
Gamification informally refers to making a system more game-like. More specifically, gamification denotes applying game mechanics to a non-game system. We theorize that gamification success depends on the game mechanics employed and their effects on user motivation and immersion. The proposed theory may be tested using an experiment or questionnaire study.
Preprint Available
Shedding Light on Distributed System Executions
Jenny Abrahamson, Ivan Beschastnikh, Yuriy Brun, and Michael D. Ernst
Facebook, USA; University of British Columbia, Canada; University of Massachusetts, USA; University of Washington, USA
In a distributed system, the hosts execute concurrently, generating asynchronous logs that are challenging to comprehend. We present two tools: ShiVector to transparently add vector timestamps to distributed system logs, and ShiViz to help developers understand distributed system logs by visualizing them as space-time diagrams. ShiVector is the first tool to offer automated vector timestamp instrumentation without modifying source code. The vector-timestamped logs capture partial ordering information, useful for analysis and comprehension. ShiViz space-time diagrams are simple to understand and interactive — the user can explore the log through the visualization to understand complex system behavior. We applied ShiVector and ShiViz to two systems and found that they aid developers in understanding and debugging.
Preprint Available
Additional Information
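For readers unfamiliar with the vector timestamps that ShiVector adds to logs, the standard vector-clock rules they rely on are sketched below; this illustrates the general technique only, not ShiVector's implementation.

# Standard vector-clock rules (illustrative sketch, not ShiVector's code).
def tick(clock, host):
    """Local event on `host`: increment that host's own entry."""
    clock = dict(clock)
    clock[host] = clock.get(host, 0) + 1
    return clock

def merge(local, received, host):
    """On message receipt: take the element-wise max, then tick the receiver."""
    merged = {h: max(local.get(h, 0), received.get(h, 0))
              for h in set(local) | set(received)}
    return tick(merged, host)

def happened_before(a, b):
    """a -> b iff a <= b element-wise and a != b (partial order on events)."""
    return all(a.get(h, 0) <= b.get(h, 0) for h in a) and a != b

send = tick({}, "node0")                    # node0's clock after sending: {'node0': 1}
recv = merge({"node1": 3}, send, "node1")   # node1 receives it: {'node0': 1, 'node1': 4}
print(recv, happened_before(send, recv))    # the send happened before the receive: True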
Software Defect Prediction Based on Collaborative Representation Classification
Xiao-Yuan Jing, Zhi-Wu Zhang, Shi Ying, Feng Wang, and Yang-Ping Zhu
Wuhan University, China; Nanjing University of Posts and Telecommunications, China
In recent years, machine learning techniques have been successfully applied to software defect prediction. Although they can yield reasonably good prediction results, there is still much room for improvement in prediction accuracy. Sparse representation is one of the most advanced machine learning techniques: it performs well for signal compression and classification, but suffers from time-consuming sparse coding. Compared with sparse representation, collaborative representation classification (CRC) offers significantly lower computational complexity and competitive classification performance in pattern recognition domains. To achieve better defect prediction results, we introduce the CRC technique and propose a CRC-based software defect prediction (CSDP) approach. We first design a CRC-based learner to build a prediction model with a low computational burden. Then, we design a CRC-based predictor to classify whether a query software module is defective or defect-free. Experimental results on the widely used NASA datasets demonstrate the effectiveness and efficiency of the proposed approach.
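For context, the core CRC decision rule (coding a query module collaboratively over all training modules via regularized least squares and assigning it to the class with the smallest reconstruction residual) can be sketched as follows. This shows the generic technique only, not the specific CSDP learner and predictor, and the toy data are made up.

# Generic collaborative representation classification (CRC) decision rule.
import numpy as np

def crc_fit(X, y, lam=1e-3):
    """X: (n_samples, n_features) training matrix, y: class labels."""
    A = X.T                                   # dictionary: columns are samples
    P = np.linalg.inv(A.T @ A + lam * np.eye(A.shape[1])) @ A.T
    return A, np.asarray(y), P

def crc_predict(model, query):
    A, y, P = model
    x = P @ query                             # collaborative coding of the query
    residuals = {}
    for c in np.unique(y):
        mask = (y == c)
        residuals[c] = np.linalg.norm(query - A[:, mask] @ x[mask])
    return min(residuals, key=residuals.get)  # class with the smallest residual

# Toy usage: two defect-free modules and two defective ones (labels 0 / 1).
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]])
model = crc_fit(X, [0, 0, 1, 1])
print(crc_predict(model, np.array([0.95, 0.9])))   # expected: 1 (defective)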
Statistical Learning of API Mappings for Language Migration
Anh Tuan Nguyen, Hoan Anh Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen
Iowa State University, USA; Utah State University, USA
The process of migrating software between languages is called language migration or code migration. To reduce the manual effort of defining API mapping rules for code migration, we propose StaMiner, a data-driven model that statistically learns mappings between API usages from a corpus of corresponding methods in the client code of the APIs in the two languages.
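As a rough illustration of learning API mappings from aligned client code (StaMiner's actual statistical model is more sophisticated than this, and the example method pairs are invented), one can count how often an API in one language co-occurs with an API in the other across corresponding methods:

# Minimal co-occurrence illustration of mining API mappings from aligned
# method pairs; not StaMiner's model, and the corpus below is made up.
from collections import Counter, defaultdict

def learn_mappings(aligned_pairs):
    """aligned_pairs: list of (java_api_calls, csharp_api_calls) per method pair."""
    cooc = defaultdict(Counter)
    java_freq = Counter()
    for java_calls, cs_calls in aligned_pairs:
        for j in set(java_calls):
            java_freq[j] += 1
            for c in set(cs_calls):
                cooc[j][c] += 1
    # score(j -> c): fraction of methods using j whose counterpart uses c
    return {j: {c: n / java_freq[j] for c, n in cs.items()} for j, cs in cooc.items()}

corpus = [
    (["HashMap.put", "HashMap.get"], ["Dictionary.Add", "Dictionary.TryGetValue"]),
    (["HashMap.put"], ["Dictionary.Add"]),
]
scores = learn_mappings(corpus)
print(max(scores["HashMap.put"], key=scores["HashMap.put"].get))  # Dictionary.Add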
The MechatronicUML Method: Model-Driven Software Engineering of Self-Adaptive Mechatronic Systems
Steffen Becker, Stefan Dziwok, Christopher Gerking, Christian Heinzemann, Wilhelm Schäfer, Matthias Meyer, and Uwe Pohlmann
University of Paderborn, Germany; Fraunhofer IPT, Germany
The software of mechatronic systems interacts with the system's physical environment. In such systems, incorrect software may cause harm to human life. As a consequence, software engineering methods for developing such software need to enable developers to prove its correctness effectively and efficiently. This is further complicated by additional characteristics of mechatronic systems such as self-adaptation and coordination with other systems. In this poster, we present MechatronicUML, a model-driven software engineering method that specifically addresses these characteristics of self-adaptive mechatronic systems.
Timing Challenges in Automotive Software Architectures
Licong Zhang, Reinhard Schneider, Alejandro Masrur, Martin Becker, Martin Geier, and Samarjit Chakraborty
TU München, Germany; TU Chemnitz, Germany
Most of the innovation in the automotive domain is now in electronics and software, which has led to several million lines of code in today's high-end cars. However, in contrast to software in the general-purpose computing domain, where mostly functional correctness is of concern, timing predictability of automotive software is an important problem that is still largely unsolved. More importantly, this problem is addressed almost solely within the embedded systems domain, with little or no participation from the mainstream software engineering community. The goal of this poster is to highlight some aspects of timing analysis of automotive software, in an attempt to involve the broader software engineering research community in this problem.
Towards Designing Assistive Software Applications for Discrete Trial Training
Valerie Picardo, Samuel Metson, Rashina Hoda, Robert Amor, Angela Arnold-Saritepe, Rebecca Sharp, and Denys Brand
University of Auckland, New Zealand
Discrete Trial Training (DTT) is one of the most effective training methods for children diagnosed with autism. Traditional DTT suffers from inconsistencies caused by human error, disruptions due to in-session data collection by trainers, and difficulties in producing physical within-stimulus prompts. Current software solutions either support only solo use by the child, eliminating the social-interaction benefits of DTT, or lack automated data collection. Designed for a touch tabletop by an interdisciplinary team of software engineers, HCI and psychology experts, and certified behaviour analysts, DTTAce is an assistive application that provides digital consistency and integrity and supports customization of trials, automated data collection, and within-stimulus prompts while preserving natural interactions and the social nature of DTT. It is an important step towards designing effective assistive software for Discrete Trial Training.
Preprint Available
Automatic Performance Modeling of Multithreaded Programs
Alexander Tarvo
Brown University, USA
Multithreaded programs exhibit a complex, non-linear dependency between their configuration and their performance. Performance prediction models are used to better understand this dependency, but building such models manually is time-consuming and error-prone. We present a novel methodology for automatically building performance models of industrial multithreaded programs.
Characteristics of the Vulnerable Code Changes Identified through Peer Code Review
Amiangshu Bosu
University of Alabama, USA
To effectively utilize the efforts of scarce security experts, this study aims to provide empirical evidence about the characteristics of security vulnerabilities. Using a three-stage manual analysis of peer code review data from 10 popular Open Source Software (OSS) projects, this study identified 413 potentially vulnerable code changes (VCCs). Some key results: 1) the most experienced contributors authored the majority of the VCCs; 2) while less experienced authors wrote fewer VCCs, their code changes were 1.5 to 24 times more likely to be vulnerable; and 3) employees of the organization sponsoring an OSS project were more likely to write VCCs.
Preprint Available
Additional Information
Exception Handling for Dynamic Information Flow Control
Abhishek Bichhawat
Saarland University, Germany
Exceptions are a source of information leaks and are difficult to handle because they allow non-local control transfer. Existing dynamic information flow control techniques either ignore unstructured control flow or are overly restrictive. This work presents a more permissive solution for controlling information leaks, based on program analysis techniques.
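A minimal example of the kind of leak at issue, i.e., a secret influencing a public result purely through whether an exception is raised, is sketched below; the enforcement mechanism proposed in this work is not shown.

# Illustration (not the author's mechanism) of how an exception leaks a
# secret bit: whether the handler runs depends on secret data, so the
# public output reveals it through non-local control transfer.
def leaky(secret_bit):
    public = 0
    try:
        if secret_bit:
            raise ValueError()   # control leaves the branch non-locally
        public = 1
    except ValueError:
        pass                     # public stays 0 exactly when secret_bit is 1
    return public                # observable output now encodes the secret

print(leaky(0), leaky(1))        # prints "1 0": the public result mirrors the secret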
Exploiting Undefined Behaviors for Efficient Symbolic Execution
Asankhaya Sharma
National University of Singapore, Singapore
Symbolic execution is an important and popular technique used in several software engineering tools for test case generation, debugging, and program analysis. As such, improving the performance of symbolic execution can have a huge impact on the effectiveness of such tools. In this paper, we present a technique that systematically introduces undefined behaviors during compilation to speed up the subsequent symbolic execution of the program. We have implemented our technique inside LLVM and tested it with an existing symbolic execution engine (Pathgrind). Preliminary results on the SIR repository benchmark are encouraging, showing a 48% speed-up in time and a 30% reduction in the number of constraints.
Preprint Available
Additional Information
Identifying Caching Opportunities, Effortlessly
Alejandro Infante
University of Chile, Chile
Memory consumption is a great concern for most non-trivial software. In this paper we introduce a dedicated code profiler that identifies opportunities to reduce memory consumption by introducing caches.
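A hypothetical sketch of the kind of measurement such a profiler might perform (this is not the author's tool): record how often a function is re-invoked with identical arguments, flagging repeated computations as caching candidates.

# Hypothetical call-redundancy profiler; names and policy are assumptions.
import functools
from collections import Counter

call_stats = Counter()

def profile_calls(fn):
    @functools.wraps(fn)
    def wrapper(*args):
        call_stats[(fn.__name__, args)] += 1   # count identical invocations
        return fn(*args)
    return wrapper

@profile_calls
def parse_config(path):
    return {"path": path, "options": []}       # stand-in for an expensive computation

for _ in range(5):
    parse_config("settings.ini")               # same argument, recomputed 5 times

redundant = {k: n for k, n in call_stats.items() if n > 1}
print(redundant)   # {('parse_config', ('settings.ini',)): 5} -> caching candidate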
Incremental Reachability Checking of KernelC Programs using Matching Logic
Alessandro Maria Rizzi
Politecnico di Milano, Italy
A fundamental phase in software development deals with verifying that software behaves correctly. Although accurate testing can discover many wrong behaviours, formal software verification techniques can help in developing applications that dependably satisfy their requirements. However, formal verification techniques are time consuming and software changes continuously, so incremental verification methods, i.e., methods that reuse the results of verifying a previous version when verifying a new version of a program, are very useful: they can significantly reduce the time required for verification. In this work I apply a syntactic-semantic incremental approach to reachability checking of KernelC programs using matching logic. KernelC is a significant, non-trivial subset of the C programming language. Matching logic is a language-independent proof system for reasoning about programs in any language that has a rewrite-based operational semantics. Incrementality is achieved by encoding the verification procedure in a syntax-driven fashion, based on semantic attributes defined on top of an operator-precedence grammar.
Privacy and Security Requirements Framework for the Internet of Things (IoT)
Israa Alqassem
Masdar Institute of Science and Technology, United Arab Emirates
Capturing privacy and security requirements in the very early stages is essential for creating sufficient public confidence to facilitate the adoption of novel systems such as the Internet of Things (IoT). However, traditional requirements engineering methods and frameworks might not be sufficiently effective when dealing with new types of heterogeneous IoT systems. Therefore, building a methodological framework to model privacy and security requirements specifications for the IoT is necessary in order to deal with its mission-critical nature. The purpose of this project is to develop such a requirements engineering framework to ensure proper development of the IoT with security and privacy taken into account from the earliest stages.
Program Transformations to Fix C Buffer Overflow
Alex Shaw
Auburn University, USA
This paper describes two program transformations to fix buffer overflows originating from unsafe library functions and bad pointer operations. Together, these transformations fixed all buffer overflows featured in 4,505 programs of NIST’s SAMATE reference dataset, making the changes automatically on over 2.3 million lines of C code.
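As a hedged illustration of one class of fix for unsafe library calls (the paper's actual transformations operate on real C code and are not reproduced here), a naive textual rewrite of strcpy into a bounded strncpy with explicit termination might look like this; real tools would work on the AST rather than with regexes, and sizeof(dst) is only meaningful when the destination is an array.

# Naive textual illustration of hardening strcpy calls; an assumption-laden
# sketch, not the transformations described in the paper.
import re

STRCPY = re.compile(r"strcpy\s*\(\s*(\w+)\s*,\s*([^)]+)\)")

def harden_strcpy(c_source):
    def rewrite(m):
        dst, src = m.group(1), m.group(2).strip()
        return (f"strncpy({dst}, {src}, sizeof({dst}) - 1), "
                f"{dst}[sizeof({dst}) - 1] = '\\0'")
    return STRCPY.sub(rewrite, c_source)

before = 'char buf[16];\nstrcpy(buf, user_input);\n'
print(harden_strcpy(before))
# char buf[16];
# strncpy(buf, user_input, sizeof(buf) - 1), buf[sizeof(buf) - 1] = '\0';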