Prediction Tools for Software Development

At the PROMISE 2011 conference in Banff, Ayse Bener of Ryerson University organized a panel on the future of predictive modelling in software development. Such models are used for quality, productivity, schedule, cost, or profitability estimation, and they rely on rich historical data, both code- and non-code-related, to be successful. Setting them up, collecting the data, and applying the models require much expertise and effort. The economics work only if the results can be acted upon profitably to manage risk, set priorities, and allocate resources. The savings generated must surpass the cost of applying these models. Thus the feasibility of predictive modelling hinges partly on the level of automation. That’s where tool support comes into play. As a panel member, I was asked to comment on this aspect of predictive modelling, focusing on the adoption issue. Specifically, what do we need for widespread tool adoption in this context?

I use the term prediction tool broadly to refer to any implementation of a predictive model or prediction technique, algorithm, or heuristic. I include in the definition any facilities that perform essential input functions such as data identification, extraction, and sanitization, as well as essential output functions such as reporting, summarization, and presentation.

Types of Tools

Prediction tools are still rarely used in practice, although we know of several successful examples. I’ll contrast two typical manifestations of such tools:

  • A: stand-alone, one-of-a-kind, intermittently applied facilities (usually a collection of scripts and small applications), often intended for use by the experts who have developed or commissioned them, and
  • B: stable, deployed components, applications, or services that are part of an existing software development environment and meant for regular use in software teams.

Manifestations that are a mixture or fall somewhere in between also exist, but they aren’t useful for making my point.

Tools of type A are the most common. Almost all of the noteworthy examples I know of fall under this category. They are relatively cheap to develop, but each application requires expert hands. They are meant to be used by a small minority of very select people. That alone pretty much explains their home base: large organizations with dedicated metrics, research, or process improvement departments. Indeed, at a session of the International Software Engineering Research Network’s annual meeting this year (which happened to be co-located with PROMISE), we saw some pretty nifty examples from ABB and Avaya Labs. Type-A tools are thus fine for a large organization where the cost of the required expertise can be amortized over big, corporation-wide initiatives; however, they are too resource-intensive for broader adoption.

The alternative to this “do your magic and throw the results over the fence” approach is type-B tools, in which the higher up-front development costs borne by that handful of experts can be amortized across many organizations through more frequent use by multiple roles in the software teams themselves. The scope of each use may, however, be small. This attractive alternative doesn’t necessarily imply that no fine-tuning, hand-holding, or maintenance will ever be needed, but it does imply making information more available and friendly to software teams and decision makers, thus giving them better visibility and more control over their projects.

So let’s limit the scope to type-B tools, say, because I’d like predictive modelling to be more readily available to mere mortals. What factors should we look at next to increase the chances of adoption? I’ve organized these factors along two dimensions:

  • factors specific to the goals of prediction, and
  • factors that encourage tool adoption in general.

Specific Adoption Factors

Prediction does not happen in a vacuum. We do it for a purpose. Among the factors specific to the goals of prediction are performance traceability, easy calibration, and the ability to support decision making.

If we are making predictions about cost, schedule, quality, productivity, or profitability, at some point we will probably know the actuals of the predicted quantities. Performance traceability refers to the capability to track and record those actuals when they materialize, and then compare them with the original predictions. Such capability closes the tool’s feedback loop and gives it the much-needed visibility for continued use. If the tool performs well and that performance can be demonstrated, the incentive to keep using the tool increases.
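The idea can be sketched in a few lines. Here is a minimal, hypothetical performance-traceability log (the class and item names are my own, not from any particular tool): each prediction is stored, the actual is recorded when it materializes, and the two are compared to measure accuracy.

```python
from dataclasses import dataclass, field

@dataclass
class PredictionLog:
    """Minimal performance-traceability sketch: store each prediction,
    record the actual when it materializes, and report accuracy."""
    records: dict = field(default_factory=dict)

    def predict(self, item_id, predicted):
        # store the prediction; the actual is unknown at this point
        self.records[item_id] = {"predicted": predicted, "actual": None}

    def record_actual(self, item_id, actual):
        # close the feedback loop once the real outcome is known
        self.records[item_id]["actual"] = actual

    def mean_absolute_error(self):
        # compare predictions with actuals, skipping still-open items
        pairs = [(r["predicted"], r["actual"])
                 for r in self.records.values() if r["actual"] is not None]
        return sum(abs(p - a) for p, a in pairs) / len(pairs)

log = PredictionLog()
log.predict("feature-42", predicted=120)    # estimated effort, person-hours
log.record_actual("feature-42", actual=150)
print(log.mean_absolute_error())            # 30.0
```

A real tool would persist this log and report trends over time, but even this toy version shows why traceability matters: without the recorded actuals there is nothing to measure the tool against.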

The next step up is calibration, or more precisely, easy calibration. Easy calibration means giving the tool the capability to tune and re-tune the underlying predictive models when new data becomes available or when existing data becomes obsolete. Ideally, such calibration would occur both incrementally and automatically; hence the adjective “easy.” Performance traceability is a prerequisite of easy calibration.
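To make the incremental-and-automatic point concrete, here is a hypothetical sketch of an estimator whose correction factor re-tunes itself from the (prediction, actual) pairs that traceability provides. The sliding window and the ratio-based factor are my assumptions for illustration, not a prescribed calibration method.

```python
class CalibratedEstimator:
    """Easy-calibration sketch: a base estimate is multiplied by a
    correction factor that is re-tuned as actuals arrive."""

    def __init__(self, window=20):
        self.factor = 1.0      # no correction until data arrives
        self.window = window   # obsolete data eventually drops out
        self.ratios = []

    def estimate(self, base_estimate):
        return base_estimate * self.factor

    def calibrate(self, base_estimate, actual):
        # incremental recalibration: record the observed ratio of
        # actual to estimate, keep only the most recent window,
        # and recompute the factor automatically
        self.ratios.append(actual / base_estimate)
        self.ratios = self.ratios[-self.window:]
        self.factor = sum(self.ratios) / len(self.ratios)

est = CalibratedEstimator()
print(est.estimate(100))      # 100.0 (uncalibrated)
est.calibrate(100, 150)       # the actual came in 50% over the estimate
print(est.estimate(100))      # 150.0 (re-tuned)
```

Note that `calibrate` consumes exactly the prediction/actual pairs that performance traceability records, which is why the former is a prerequisite of the latter.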

The final step up is the ability to support decision making. Think of decision support ability as “going the last mile,” in that we’d like prediction tools, or their outputs, to be both purposeful and actionable. A tool is purposeful if it supports concrete goals for specific roles in a software team. A tool is actionable if it helps the user take a growth, improvement, corrective, or preventive action. Here is an example of purposefulness, expressed as a behavioural requirement, from Brian Robinson’s presentation at ISERN:

As a Product Manager, I need estimates to include the complete cost of new features including post-release maintenance so that NPV forecasts are more accurate.

Something that is both purposeful and actionable is:

As a Testing Manager, I want defect predictions to rank modules in order of pre-release defect risk, so that I can allocate testing resources optimally.

The Testing Manager is asking for more than just defect density estimates. She’s asking for information and advice that will directly help her ration scarce resources. One way of improving a tool’s decision-making ability is to use a methodology such as GQM (Goal-Question-Metric) when designing the tool. Another is to provide intelligent, role-specific filters and workflows. Role-specific filters will slice and dice the information in different ways, each filter customizing dashboards, reports, charts, and advice with a single perspective in mind. A role-specific workflow will mimic a team member’s implicit decision-making or sense-making (thanks to Mika Mantyla for the term) processes.
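The Testing Manager’s story can be sketched as a last-mile transformation: raw defect-risk predictions go in, a ranked, budget-aware testing plan comes out. The function, module names, risk scores, and the flat cost-per-module assumption below are all hypothetical; the point is that the output is an action (test these modules first), not just a number.

```python
def rank_modules_for_testing(risk_scores, budget_hours, hours_per_module=8):
    """Actionable-output sketch: rank modules by predicted pre-release
    defect risk and keep as many as the testing budget allows."""
    ranked = sorted(risk_scores.items(), key=lambda kv: kv[1], reverse=True)
    affordable = int(budget_hours // hours_per_module)
    return [module for module, _ in ranked[:affordable]]

# hypothetical per-module risk predictions from some underlying model
risk = {"parser": 0.9, "ui": 0.2, "auth": 0.7, "logging": 0.1}
print(rank_modules_for_testing(risk, budget_hours=16))  # ['parser', 'auth']
```

A role-specific filter for the Testing Manager would surface exactly this ranked list, while a filter for the Product Manager might instead aggregate the same predictions into a cost forecast.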

General Adoption Factors

As for factors that facilitate tool adoption in general in the software development context, I’ll briefly invoke the relevant dimensions of an assessment method called WeightPrints that Piotr Kaminski and I developed many years ago. Piotr is a savvy tool developer who has been making his living creating tools for Google developers. The basic idea was to help developers and researchers gauge how heavy-weight a software development tool is for a given purpose. Our grand assumption was that lighter-weight tools are easier to adopt than heavier-weight tools for the same purpose. Thus, all else being equal, a tool with a low weightprint would be preferable to a tool with a high weightprint.

The four dimensions of the WeightPrints model that are applicable in the prediction tool space are obtrusiveness, management burden, ramp-up, and isolation. The lower a tool scores in each of the dimensions, the lower its weightprint. Obtrusiveness increases as a tool’s input gluttony (requiring lots of manual input), conspicuousness, disruption to existing workflow, and passivity increase. Management burden increases with installation, configuration, and healing effort, as well as with the tool’s disregard for standards and its total installed footprint. Ramp-up increases as learning effort and the tool’s dependence on network effects (its adoption scope) increase, and as its virality (its ability to spread fast), synergy with other tools, and frequency of use decrease. Finally, isolation decreases with the tool’s compatibility, interoperability, extensibility, and openness (especially in terms of the underlying data exchange formats). But enough about WeightPrints; I can always discuss it further in a future post.
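For illustration only, the comparison could look something like the sketch below. The 1-to-5 scale, the equal weighting, and the simple sum are my assumptions here, not the actual WeightPrints scoring scheme; the sketch only shows how the four dimensions combine into a single comparable score.

```python
def weightprint(scores):
    """Hypothetical weightprint score: sum of the four dimension scores,
    each assumed to be rated on a 1-5 scale (lower is lighter-weight)."""
    dims = ("obtrusiveness", "management_burden", "ramp_up", "isolation")
    for d in dims:
        if not 1 <= scores[d] <= 5:
            raise ValueError(f"{d} must be on the assumed 1-5 scale")
    return sum(scores[d] for d in dims)

# hypothetical ratings for two tools serving the same purpose
tool_a = {"obtrusiveness": 4, "management_burden": 5, "ramp_up": 4, "isolation": 3}
tool_b = {"obtrusiveness": 2, "management_burden": 2, "ramp_up": 3, "isolation": 2}
print(weightprint(tool_a), weightprint(tool_b))  # 16 9
```

All else being equal, tool B, with the lower score, would be the easier one to adopt.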

Some kinds of tools, like prediction tools, have good reasons to be on the heavy-weight side of the spectrum. However, we should strive to make them lighter and lighter if we care about their adoption by end users.

Acknowledgments: I’d like to thank the organizer Ayse Bener and the other panel members Gunther Ruhe, Mika Mantyla, Barbara Russo, and Burak Turhan.
