Archive for the 'testing' Category

Mocking and exploratory testing

Mock object seem to require more whole-product (or otherwise large-scale) automated tests. I’m hoping not, especially if manual exploratory has some tool support. My reasons are below the jump.

(more…)

Software Craftsmanship mini-conference (London, Feb 26, 2009)

I’m on the review committee of Jason Gorman’s Software Craftsmanship mini-conference. If I can overlap it with a trip to France, I’ll definitely be there. Here’s the blurb:

This is a conference about the “hard skills” that programmers and teams require to deliver high quality working software.

From writing effective unit tests to managing dependencies, and from writing reliable multi-threaded code to building robust and dependable service-oriented architectures.

This conference is all about the principles and practices, and the disciplines and habits, that distinguish the best 10% of software professionals from the 90% who are failing their customers and failing their profession by taking no care or pride in their work and delivering buggy, unreliable and unmaintainable code.

This conference aims to showcase and champion the “hard skills”, and champion the idea of software craftsmanship and the ways in which it can be encouraged and supported in the workplace, in schools and colleges, and among the wider software development community.

This is a conference about building it right.

Barriers to acceptance-test driven design

At the AA Functional Test Tools workshop, we had a little session devoted to this question: Even where “ordinary” unit test-driven design (UTTD) works well, acceptance-test driven design (ATDD) is having more trouble getting traction. Why not?

My notes:

  1. Programmers miss the fun / aha! moments / benefits that they get from UTDD.

    1. Especially, there is a difference in scope and cadence of tests. (”Cadence” became a key word people kept coming back to.)
    2. Laborious fixturing, which doesn’t feel as valuable as “real programming”.
    3. No insight into structure of system.
  2. Business people don’t see the value (or ROI) from ATDD

    1. there’s not value for them personally (as perhaps opposed to the business)
    2. they are not used to working at that level of precision
    3. no time
    4. they prefer rules to examples
    5. tests are not replacing traditional specs, so they’re extra work.
  3. There is no “analyst type” or tester/analyst to do the work.

  4. There is an analyst type, but their separate existence (from programmers) leads to separate tools and hence general weakness, lack of coordination

  5. There’s no process/technique for doing ATDD, not like the one for UTDD.

  6. ATDD requires much more collaboration than UTDD (because the required knowledge and skills are dispersed among several people), but it is more fragile (because the benefit is distributed - perhaps unevenly - among those people).

  7. Programmers can be overloaded with masses of analyst- or tester-generated examples. The analyst or testers need to be viewed as teachers, teaching the programmers what they need to know to make right programming decisions. That means sequences of tests that teach, moving from simple-and-illustrative, to more complicated, with interesting-and-illuminating diversions along the way, etc.

Position statement for functional testing tools workshop

Automated functional testing lives between two sensible testing activities. On the one side, there’s conventional TDD (unit testing). On the other side, there’s manual exploratory testing. It is probably more important to get good at those than it is to get good at automated functional testing. Once you’ve gotten good at them, what does it mean to get good at automated functional testing?

There is some value in thinking through larger-scale issues (such as workflows or system states) before diving into unit testing. There is some value (but not, I think, as much as most people think) in being able to rerun larger-scale functional tests easily. In sum: compared to doing exploratory testing and TDD right, the testing we’re talking about has modest value. Right now, the cost is more than modest, to the point where I question whether a lot of projects are really getting adequate ROI. I see projects pouring resources into functional testing not because they really value it but more because they know they should value it.

This is strikingly similar to, well, the way that automated testing worked in the pre-Agile era: most often a triumph of hope over experience.

My bet is that the point of maximum leverage is in reducing the cost of larger-scale testing (not in improving its value). Right now, all those workflow statements and checks that are so easy to write down are are annoyingly hard to implement. Even I, staring at a workflow test, get depressed at how much work it will be to get it just to the point where it fails for the first time, compared to all the other things I could be doing with my time.

Why does test implementation cost so much?

We are taught that Agile development is about working the code base so that arbitrary new requirements are easy to implement. We have learned one cannot accomplish that by “layering” new features onto an existing core. Instead, the core has to be continually massaged so that, at any given moment, it appears as if it were carefully designed to satisfy the features it supports. Over time, that continual massaging results in a core that invites new features because it’s positively poised to change.

What do we do when we write test support code for automated large-scale tests? We layer it on top of the system (either on top of the GUI or on top of some layer below the GUI). We do not work the new code into the existing core—so, in a way that ought not to surprise us, it never gets easier to add tests.

So the problem is to work the test code into the core. The way I propose to do that is to take exploratory testing more seriously: treat it as a legitimate source of user stories we handle just like other user stories. For example, if an exploratory tester wants an “undo” feature for a webapp, implementing it will have real architectural consequences (such as moving from an architecture where HTTP requests call action methods that “fire and forget” HTML to one where requests create Command objects).

Why drive the code with exploratory testing stories rather than functional testing stories? I’m not sure. It feels right to me for several nebulous reasons I won’t try to explain here.

Functional testing tools workshop just before Agile 2008


Agile Alliance Functional Testing Tools Open Space Workshop
Call for Participation

Dates: Monday, August 4, 2008
Times: 8 AM - 6 PM
Location: Toronto, Ontario, at the Agile2008 venue

Description

This is the second Agile Alliance Functional Testing Tools workshop.
The first, held in October 2007 in Portland Oregon, was a great
success. In this second workshop, we're increasing the size and
moving to an open space like format. The primary purpose of this
workshop is still to discuss cutting-edge advancements in and envision
possibilities for the future of automated functional testing tools.

As an open-space style workshop, the content comes from the
participants, and we expect all participants to take an active role.
We're seeking participants who have interest and experience in
creating and/or using automated functional testing tools/frameworks on
Agile projects.

This workshop is sponsored by the Agile Alliance Functional Testing
Tools Program. The mission of this program is to advance the state of
the art of automated functional testing tools used by Agile teams to
automate customer-facing tests.

There is no cost to participate. Participants will be responsible for
their own travel expenses.

Due to room constraints, we can accommodate up to 60 participants.
Registrations will be granted on a first-come, first-served basis to
participants who complete the registration process.

Registering for the AA-FTT Open Space Workshop

We will be using the conference submission system
(http://submissions.agile2008.org) to process the requests for
invitation (RFI). If you're interested in being invited to
participate in this workshop, please:
a) login to the submission system (create an account if you don't have
one already). NOTE: make sure your email address is correct.
b) click the 'propose a session' link to request an invitation,
filling in the following required fields:
- title: enter RFI 
- stage: select ‘AAFTT’
- session type: select ‘other’
- duration: select any of the values (not relevant for the RFI process)
- summary: briefly answer the following three questions
i) What do you see as the biggest issue for Functional
Testing Tools on Agile projects?
ii) What do you hope to contribute?
iii) What do you hope to get?
c) click ‘create’

The AAFTT stage producers will review the RFI, and send you an
invitation to attend the workshop, along with further instructions for
pre-organizing openspace sessions.

Please register as soon as possible, before the workshop fills up.

Pass This Along
If you know of someone that would be a candidate for this workshop,
please forward this call for participation on to them.

Conference of the Association of Software Testing 2008

The third annual Conference of the Association of Software Testing (CAST) 2008 in Toronto, July 14-16. Early bird registration ends May 30. Here’s what Michael Bolton has written about it:

A colleague recently pointed out that an important mission of our community is to remind people–and ourselves–that testing doesn’t have to suck.

Well, neither do testing conferences. CAST 2008 is the kind of conference that I’ve always wanted to attend. The theme is “Beyond the Boundaries: Interdisciplinary Approaches to Software Testing”, and the program is incredibly eclectic and diverse. Start with the keynotes: Jerry Weinberg on Lessons from the Past to Carry into the Future; Cem Kaner on The Value of Checklists and the Danger of Scripts: What Legal Training Suggests for Testers; Robert Sabourin (with Anne Sabourin) on Applied Testing Lessons from Delivery Room Labor Triage (there’s a related article in this month’s Better Software magazine); and Brian Fisher on The New Science of Visual Analytics. Track sessions include talks relating testing to improv theatre (Adam White), to music (yours truly and Jonathan Kohl), to finance and accounting (Doug Hoffman), to wargaming and Darwinian evolution (Bart Brokeman, author of /Testing Embedded Software/ and one of the co-authors of the /TMap Next/ book); to civil engineering (Scott Barber), to scientific software (Diane Kelly and Rebecca Sanders), to magic (Jeremy Kominar), to file systems (Morven Gentleman), and to data warehousing (Steve Richardson and Adam Geras), and to data visualization (Martin Taylor)… to four-year-olds playing lacrosse (Adam Goucher). There will be lightning talks and a tester competition. Jerry Weinberg will be doing a one-day tutorial workshop, as will Hung Nguyen, Scott Barber, and Julian Harty.

Yet another feature of the conference is that Jerry is launching his book on testing, /Perfect Software and Other Testing Myths/. I read an early version of it, and I’m waiting for it with bated breath. It’s a book that we’ll all want to read, and after we’re done, we’ll want to hand to people who are customers of testing. For some, we’ll want to tie them to a chair and /read it to them/.

The conference hotel is inexpensive, the food in Toronto is great, the nightlife is wonderful, the music is excellent…

More Information
================
You can find details on the program at http://www.cast2008.org/Program.

You can find information on the venue and logistics at http://www.cast2008.org/Venue.

Those from outside Canada should look at http://www.associationforsoftwaretesting.org/drupal/CAST2008/Venue#customs.

You can get registration information at http://www.cast2008.org/Registration.

Paying the Way
==============

If you need help persuading your company to send you to the conference, check out this: http://www.ayeconference.com/Articles/Mycompanywontpay.html.

And if all that fails, you can likely write off the cost of the conference against your taxes, even if you’re an employee. (I am not a tax professional, but INC magazine reports that you can write off expenses to “maintain or improve skills required in your present employment”. Americans should see IRS Publication 970 (http://www.irs.gov/publications/p970/ch12.html), Section 12, and ask your accountant!)

Come Along and Spread The Word!

===============================

So (if necessary) get your passports in order, take advantage of early bird registration (if you register in the next two weeks), and come join us. In addition (and I’m asking a favour here), please please /please/ tell your colleagues, both in your company and outside, about CAST. We want to share some great ideas on testing and other disciplines, and we want to make this the best CAST ever. And the event will only be improved by your presence.

So again, please spread the word, and come if you can.

Security mindset

A continual debate on the agile-testing mailing list is to what degree testers think differently than programmers and are therefore able to find bugs that programmers won’t. Without ever really resolving that question, the conversation usually moves onto whether the mental differences are innate or learnable.

I myself have no fixed opinion on the matter. That’s probably because, while vast numbers of testers are better than I am, I can imagine myself being like them and thinking like them. The same is true for programmers. In contrast, I simply can’t imagine being the sort of person who really cares who wins the World Cup or whether gay people I don’t know get married. (I’m not saying, “I couldn’t lower myself to their level” or anything stupid like that—I’m saying I can’t imagine what it would be like. It feels like trying to imagining what it is like to be a bat.)

However, I’ve long thought that security testers are a different breed, though I can’t articulate any way that they’re different in kind rather than degree. It’s just that the really good ones are transcendentally awesome at seeing how two disparate facts about a system can be combined and exploited. (A favorite example)

Bruce Schneier has an essay on security testers that I found interesting, though it doesn’t resolve any of my questions. Perhaps that’s because he said something I’ve been thinking for a while:

The designers are so busy making these systems work that they don’t stop to notice how they might fail or be made to fail, and then how those failures might be exploited. Teaching designers a security mindset will go a long way toward making future technological systems more secure.

The first sentence seems to make the second false. When I look back at the bugs I, acting as a programmer, fail to prevent and then fail to catch, an awful lot of the time their root cause wasn’t my knowledge. It’s that I have a compulsive personality and also habitually overcommit. As a result, there’s a lot of pressure to get done. The problem isn’t that I can’t flip into an adequate tester mindset, it’s that I don’t step back and take the time.

So, I suspect the interminable and seemingly irresolvable agile-testing debate should be shelved until we solve a more pressing problem: few teams have the discipline to adopt a sustainable pace, so few teams are even in a position to know if programmers could do as well as dedicated testers.

An alternative to business-facing TDD

The value of programmer TDD is well established. It’s natural to extrapolate that practice to business-facing tests, hoping to obtain similar value. We’ve been banging away at that for years, and the results disappoint me. Perhaps it would be better to invest heavily in unprecedented amounts of built-in support for manual exploratory testing.

In 1998, I wrote a paper, “When should a test be automated?“, that sketched some economics behind automation. Crucially, I took the value of a test to be the bugs it found, rather than (as was common at the time) how many times it could be run in the time needed to step through it manually.

My conclusions looked roughly like the following:

test tradeoffs in general

Scripted tests, be they automated or manual, are expensive to create (first column). Manual scripts are cheaper, but they still require someone to write steps down carefully, and they likely require polishing before they can truly be followed by someone else. (Note: height of bars not based on actual data.)

In the second column, I assume that a particular set of steps has roughly the same chance of finding a bug whether executed manually or by a computer, and whether the steps were planned or chosen on the fly. (I say “roughly” because computers don’t get bored and miss bugs, but they also don’t notice bugs they weren’t instructed to find.)

Therefore, if the immediate value of a test is all that matters, exploratory manual testing is the right choice. What about long-term value?

Assume that exploratory tests are never intentionally repeated. Both their long-term cost and value are zero. Both kinds of scripted tests have quite substantial maintenance costs (especially in that era, when testing was typically done through an unmodified GUI). So, to pull ahead of exploratory tests in the long term, scripted tests must have substantial bug-finding power. Many people at that time observed that, in fact, most tests either found a bug the first time they were run or never found a bug at all. You were more likely to fix a test because of an intentional GUI change than to fix the code because the test found a bug.

So the answer to “when should a test be automated?” was “not very often”.

Programmer TDD changes the balance in two ways:

Test tradeoffs for TDD

  1. New sources of value are added. Extremely rapid feedback reduces the cost of debugging. (Most bugs strike while what you did to create them is fresh in your mind.) Many people find the steady pace of TDD allows them to go faster, and that the incremental growth of the code-under-test makes for easier design. And, most importantly as it turns out, the need to make tests run fast and reduce maintenance cost leads to designs with good properties like low coupling and high cohesion. (That is, properties that previously were considered good in the long term—but were routinely violated for short-term gain—now had powerful short-term benefits.)

  2. Good design and better programmer tools dramatically lowered the long-term cost of tests.

So, much to my surprise, the balance tipped in favor of automation—for programmer tests. It’s not surprising that many people, including me, hoped the balance could also tip for business-facing tests. Here are some of the hoped-for benefits:

  • Tests might clarify communication and avoid some cases where the business asks for something, the team thinks they’ve delivered it, and the business says “that’s not what I wanted.”

  • They might sharpen design thinking. The discipline of putting generalizations into concrete examples often does.

  • Programmers have learned that TDD supports iterative design of interfaces and behavior. Since whole products are also made of interfaces and behavior, they might also benefit from designers who react to partially-finished products rather than having to get it right up front.

  • Because businesses have learned to mistrust teams who show no visible progress for eight months (at which point, they ask for a slip), they might like to see evidence of continuous progress in the form of passing tests.

  • People often need documentation. Documentation is often improved by examples. Executable tests are examples. Tests as executable documentation might get two benefits for less than their separate costs.

  • And, oh yeah, tests could find regression bugs.

So a number of people launched off to explore this approach, most notably with Fit. But Fit hasn’t lived up to our hopes, I think. The things that particularly bother me about it are:

  • It works well for business logic that’s naturally tabular. But tables have proven awkward for other kinds of tests.

  • In part, the awkwardness is because there are no decent HTML table editors. That inhibits experimentation: if you don’t get a table format right the first time, you’re tempted to just leave it.

    Note: I haven’t tried ZiBreve. By now, I should have. I do include Word, Excel, and their OpenOffice equivalents among the ranks of the not-decent, at least if you want executable documentation. (I’ve never tried treating .doc files as the real tests that are “compiled” into HTML before they’re executed.)

  • Fit is not integrated into programmer editors the way xUnit is. For example, you can’t jump from a column name to the Java method that defines it. Partly for this reason, programmers tend to get impatient with people who invent new table formats—can’t they just get along with the old one?

With my graphical tests, I took aim at those sources of friction. If I have a workflow test, I can express it as boxes and arrows:

a workflow test

I translate the graphical documents into ordinary xUnit tests so that I can use my familiar tools while coding. The graphical editor is pretty decent, so I can readily change tests when I get better ideas. (There are occasional quirks where test content has changed more than it looks like it has. That aspect of using Fit hasn’t gone away entirely.)

I’ve been using these tests, most recently on wevouchfor.org—and they don’t wow me. Sad While I almost always use programmer TDD when coding (and often regret skipping it when I don’t), TDD with these kinds of tests is a chore. It doesn’t feel like enough of the potential value gets realized for the tests to be worth the cost.

  • Writing the executable test doesn’t help clarify or communicate design. Let me be careful here. I’m a big fan of sketching things out on whiteboards or paper:

    A whiteboard

    That does clarify thinking and improve communication. But the subsequent typing of the examples into the computer is work that rarely leads to any more design benefits.

  • Passing tests do continuously show progress to the business, but… Suppose you demonstrate each completed story anyway, at an end-of-iteration demo or (my preference) as soon as it’s finished. Given that, does seeing more tests pass every day really help?

  • Tests do serve as documentation (at least when someone takes the time to surround them with explanatory text, and if the form and content of the test aren’t distorted to cram a new idea into existing test formats).

  • The word I’m hearing is that these tests are finding bugs more often than I expected. I want to dig into that more: if they’re the sort of “I changed this thing over here and broke that supposedly unrelated thing over there” bugs that whole-product regression tests are traditionally supposed to find, that alone may justify the expense of test automation—unless I can find a way to blame it on inadequate unit tests or a need to rejigger the app.

  • (This is the one that made me say “Eureka!”) Tests alone fail at iterative product design in an interesting way. Whenever I’ve made significant progress implementing the next chunk of workflow or other GUI-visible change, I just naturally check what I’ve done through the GUI. Why? This checking makes new bugs (ones the automated tests don’t check for) leap out at me. They also sometimes make me slap my forehead and say, “What I intended here was stupid!”

But if I’m going to be looking at the page for both bugs and to change my intentions, I’m really edging into exploratory testing. Hmm… What if an app did whatever it could to aid exploratory testing? I don’t mean traditional testability features like, say, a scripting interface; I mean a concerted effort to let exploratory testers peek and poke at anything they want within the app. (That may not be different than my old motto “No bug should be hard to find the second time,” but it feels different.)

So, although features of Rails like not having to restart the server after most code changes are nice, I want more. Here’s an example.

The following page contains a bug:

an ordinary web page

Although you can’t see it, the bottom two links are wrong. They are links to /certifications/4 instead of /promised_certifications/4.

  1. Unit tests couldn’t catch that bug. (The two methods that create those types of links are tested and correct; I just used the wrong one.)

  2. One test of the action that created the page could have caught the bug, but did not. (To avoid maintenance problems, that test checked the minimum needed to convince me that the correct “certifications” had been displayed. I assumed that if they were displayed at all, the unit tests meant they were displayed correctly. That was actually almost right—every character outside the link’s href value was correct.)

  3. I missed the bug when I checked the page. (I suspect that I did click one of the links, but didn’t notice it went to the wrong place. If so, I bet I missed the wrongness because I didn’t have enough variety in the test data I set up—ironic, because I’ve been harping on the importance of “irrelevant” variety since 1994.)

  4. A user had no trouble finding the bug when he tried to edit one of his promised certifications and found himself with a form for someone else’s already-accepted certification. (Had he submitted the form, it would have been rejected, but still.)

That’s my bug: a small error in a big pile of HTML the app fired and forgot.
Suppose, though, that the app created and retained an object representing the page. Suppose further that an exploration support app let you switch to another view of that object/page, one that highlights link structure and downplays text:

The same page, highlighting link hrefs

To the eyes of someone who just added promised certifications to that page, the wrong link targets ought to jump out.

There’s more that I’d like, though. The program knows more about those links than it included in the HTTP Response body. Specifically, it knows they link to a certain kind of object: a PromisedCertification. I should be able to get a view of that object (without committing to following the link). I should be able to get it in both HTML form and in some raw format. (And if the link-to-be-displayed were an object in its own right, I would have had a place to put my method, and I wouldn’t have used the wrong one. Testability changes often feed into error prevention.)

And so on… It’s easy enough for me to come up with a list of ways I’d like the app to speak of its internal workings. So what I’m thinking of doing is grabbing some web framework, doing what’s required to make it explorable, using it to build an app, and also building an exploration assistant in RubyCocoa (allowing me to kill another bird with this stone).

To be explicit, here’s my hypothesis:

An application built with programmer TDD, whiteboard-style and example-heavy business-facing design, exploratory testing of its visible workings, and some small set of automated whole-system sanity tests will be cheaper to develop and no worse in quality than one that differs in having minimal exploratory testing, done through the GUI, plus a full set of business-facing TDD tests derived from the example-heavy design.

We shall see, I hope.

Google talk references

One thing I meant to say and forgot: Just as the evolution of amphibians didn’t mean that all the fish disappeared, the creation of a new kind of testing to fit a new niche doesn’t mean existing kinds are now obsolete.

Context-driven testing:

Testing Computer Software, Kaner, Falk, and Nguyen
Lessons Learned in Software Testing, Kaner, Bach, and Pettichord
http://www.context-driven-testing.com
“When Should a Test Be Automated?”, Marick

Exploratory testing:

James Bach
Michael Bolton
Elisabeth Hendrickson
Jonathan Kohl

Left out:

The undescribed fourth age

Embedded vs. independent testers

Bruce Daley posts on how most humans are biased to think they’re less error-prone than they are. As far as I know, that’s a claim solidly based in empirical research. (See also Bruce Schneier’s The Psychology of Security.) From this, he concludes:

Given the nature of their work, software developers and software programmers suffer more from the illusion of knowledge and the illusion of control than most other professions, making them particularly subject to over-looking mistakes in their own code. Which is why software needs to be tested independently.

However. Consider the graph below.

Here, the programmer and independent tester start testing at the same time. (Bad programmer! Bad!) The programmer starts out with more knowledge of the app than the tester (the line marked P/+), but she also has a large amount of cognitive bias (P/-) and lacks testing skill. That makes her miss bugs her knowledge would otherwise allow her to find (the area under the red line). Moveover, her biases seem to be pretty impervious to evidence.

The tester starts out with less knowledge, but has no (relevant) cognitive biases at all. Also, his testing skill lets him ramp up his bug finding pretty fast—but it still takes him a while to overcome her advantage.

Which do you want doing the testing? If you’re shipping at time A, it looks like the programmer has the edge. (Compare the shaded areas under the curve.)

We could expect that advantage to erode over time. If the ship date is farther out, the independent tester would have an advantage, as this graph shows:

Even when all that matters is bug count, the decision is not straightforward, especially since it’s based on information you can’t know until after you’ve decided. (How long will it take the tester to get up to speed? How many and what kind of bugs will the programmer miss?)

On most projects, there are lots of other factors to consider.

So I encourage people not to make the assertion the post’s author does.