Archive for March, 2008

Agile Alliance academic research programme

This program aims to encourage researchers to focus on research questions and issues concerned with agile software development. Researchers are encouraged to apply for small grants to support activities such as conducting a series of visits to practitioner sites, performing interviews, supporting a researcher for a short time to extend existing work into agile development, running workshops, and so on.

More here.

The next deadline for submissions is May 31.

An alternative to business-facing TDD

The value of programmer TDD is well established. It’s natural to extrapolate that practice to business-facing tests, hoping to obtain similar value. We’ve been banging away at that for years, and the results disappoint me. Perhaps it would be better to invest heavily in unprecedented amounts of built-in support for manual exploratory testing.

In 1998, I wrote a paper, “When should a test be automated?”, that sketched some economics behind automation. Crucially, I took the value of a test to be the bugs it found, rather than (as was common at the time) how many times it could be run in the time needed to step through it manually.

My conclusions looked roughly like the following:

test tradeoffs in general

Scripted tests, be they automated or manual, are expensive to create (first column). Manual scripts are cheaper, but they still require someone to write steps down carefully, and they likely require polishing before they can truly be followed by someone else. (Note: height of bars not based on actual data.)

In the second column, I assume that a particular set of steps has roughly the same chance of finding a bug whether executed manually or by a computer, and whether the steps were planned or chosen on the fly. (I say “roughly” because computers don’t get bored and miss bugs, but they also don’t notice bugs they weren’t instructed to find.)

Therefore, if the immediate value of a test is all that matters, exploratory manual testing is the right choice. What about long-term value?

Assume that exploratory tests are never intentionally repeated. Both their long-term cost and value are zero. Both kinds of scripted tests have quite substantial maintenance costs (especially in that era, when testing was typically done through an unmodified GUI). So, to pull ahead of exploratory tests in the long term, scripted tests must have substantial bug-finding power. Many people at that time observed that, in fact, most tests either found a bug the first time they were run or never found a bug at all. You were more likely to fix a test because of an intentional GUI change than to fix the code because the test found a bug.

So the answer to “when should a test be automated?” was “not very often”.
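The argument above can be put as a toy cost model. This is only a sketch: the `Approach` class, the `net_value` function, and every number below are made up for illustration (like the bar heights, they are not based on actual data, and they are not taken from the 1998 paper). The one assumption carried over from the text is that all three approaches find the same bugs on the first run.

```python
from dataclasses import dataclass


@dataclass
class Approach:
    name: str
    creation_cost: float     # effort to create the test
    immediate_value: float   # value of bugs found on the first run
    maintenance_cost: float  # upkeep cost per later release
    repeat_value: float      # value of bugs found per later release


def net_value(approach: Approach, releases: int) -> float:
    """Net value of the first run plus `releases` later re-runs."""
    first_run = approach.immediate_value - approach.creation_cost
    long_term = releases * (approach.repeat_value - approach.maintenance_cost)
    return first_run + long_term


# Hypothetical numbers. Exploratory tests are never repeated, so their
# long-term cost and value are both zero.
exploratory = Approach("exploratory", 1.0, 5.0, 0.0, 0.0)
manual_script = Approach("manual script", 3.0, 5.0, 1.0, 0.5)
automated_script = Approach("automated script", 6.0, 5.0, 1.5, 0.5)

if __name__ == "__main__":
    for approach in (exploratory, manual_script, automated_script):
        print(f"{approach.name}: {net_value(approach, releases=10)}")
```

With these invented figures, a scripted test only pulls ahead if its per-release bug-finding value exceeds its per-release maintenance cost, which is the condition the observations of that era suggested was rarely met.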

Programmer TDD changes the balance in two ways:

Test tradeoffs for TDD

  1. New sources of value are added. Extremely rapid feedback reduces the cost of debugging. (Most bugs strike while what you did to create them is fresh in your mind.) Many people find the steady pace of TDD allows them to go faster, and that the incremental growth of the code-under-test makes for easier design. And, most importantly as it turns out, the need to make tests run fast and reduce maintenance cost leads to designs with good properties like low coupling and high cohesion. (That is, properties that previously were considered good in the long term—but were routinely violated for short-term gain—now had powerful short-term benefits.)

  2. Good design and better programmer tools dramatically lowered the long-term cost of tests.

So, much to my surprise, the balance tipped in favor of automation—for programmer tests. It’s not surprising that many people, including me, hoped the balance could also tip for business-facing tests. Here are some of the hoped-for benefits:

  • Tests might clarify communication and avoid some cases where the business asks for something, the team thinks they’ve delivered it, and the business says “that’s not what I wanted.”

  • They might sharpen design thinking. The discipline of putting generalizations into concrete examples often does.

  • Programmers have learned that TDD supports iterative design of interfaces and behavior. Since whole products are also made of interfaces and behavior, they might also benefit from designers who react to partially-finished products rather than having to get it right up front.

  • Because businesses have learned to mistrust teams who show no visible progress for eight months (at which point, they ask for a slip), they might like to see evidence of continuous progress in the form of passing tests.

  • People often need documentation. Documentation is often improved by examples. Executable tests are examples. Tests as executable documentation might get two benefits for less than their separate costs.

  • And, oh yeah, tests could find regression bugs.

So a number of people launched off to explore this approach, most notably with Fit. But Fit hasn’t lived up to our hopes, I think. The things that particularly bother me about it are:

  • It works well for business logic that’s naturally tabular. But tables have proven awkward for other kinds of tests.

  • In part, the awkwardness is because there are no decent HTML table editors. That inhibits experimentation: if you don’t get a table format right the first time, you’re tempted to just leave it.

    Note: I haven’t tried ZiBreve. By now, I should have. I do include Word, Excel, and their OpenOffice equivalents among the ranks of the not-decent, at least if you want executable documentation. (I’ve never tried treating .doc files as the real tests that are “compiled” into HTML before they’re executed.)

  • Fit is not integrated into programmer editors the way xUnit is. For example, you can’t jump from a column name to the Java method that defines it. Partly for this reason, programmers tend to get impatient with people who invent new table formats—can’t they just get along with the old one?

With my graphical tests, I took aim at those sources of friction. If I have a workflow test, I can express it as boxes and arrows:

a workflow test

I translate the graphical documents into ordinary xUnit tests so that I can use my familiar tools while coding. The graphical editor is pretty decent, so I can readily change tests when I get better ideas. (There are occasional quirks where test content has changed more than it looks like it has. That aspect of using Fit hasn’t gone away entirely.)

I’ve been using these tests, most recently on wevouchfor.org—and they don’t wow me. While I almost always use programmer TDD when coding (and often regret skipping it when I don’t), TDD with these kinds of tests is a chore. It doesn’t feel like enough of the potential value gets realized for the tests to be worth the cost.

  • Writing the executable test doesn’t help clarify or communicate design. Let me be careful here. I’m a big fan of sketching things out on whiteboards or paper:

    A whiteboard

    That does clarify thinking and improve communication. But the subsequent typing of the examples into the computer is work that rarely leads to any more design benefits.

  • Passing tests do continuously show progress to the business, but… Suppose you demonstrate each completed story anyway, at an end-of-iteration demo or (my preference) as soon as it’s finished. Given that, does seeing more tests pass every day really help?

  • Tests do serve as documentation (at least when someone takes the time to surround them with explanatory text, and if the form and content of the test aren’t distorted to cram a new idea into existing test formats).

  • The word I’m hearing is that these tests are finding bugs more often than I expected. I want to dig into that more: if they’re the sort of “I changed this thing over here and broke that supposedly unrelated thing over there” bugs that whole-product regression tests are traditionally supposed to find, that alone may justify the expense of test automation—unless I can find a way to blame it on inadequate unit tests or a need to rejigger the app.

  • (This is the one that made me say “Eureka!”) Tests alone fail at iterative product design in an interesting way. Whenever I’ve made significant progress implementing the next chunk of workflow or other GUI-visible change, I just naturally check what I’ve done through the GUI. Why? This checking makes new bugs (ones the automated tests don’t check for) leap out at me. They also sometimes make me slap my forehead and say, “What I intended here was stupid!”

But if I’m going to be looking at the page for both bugs and to change my intentions, I’m really edging into exploratory testing. Hmm… What if an app did whatever it could to aid exploratory testing? I don’t mean traditional testability features like, say, a scripting interface; I mean a concerted effort to let exploratory testers peek and poke at anything they want within the app. (That may not be different than my old motto “No bug should be hard to find the second time,” but it feels different.)

So, although features of Rails like not having to restart the server after most code changes are nice, I want more. Here’s an example.

The following page contains a bug:

an ordinary web page

Although you can’t see it, the bottom two links are wrong. They are links to /certifications/4 instead of /promised_certifications/4.

  1. Unit tests couldn’t catch that bug. (The two methods that create those types of links are tested and correct; I just used the wrong one.)

  2. One test of the action that created the page could have caught the bug, but did not. (To avoid maintenance problems, that test checked the minimum needed to convince me that the correct “certifications” had been displayed. I assumed that if they were displayed at all, the unit tests meant they were displayed correctly. That was actually almost right—every character outside the link’s href value was correct.)

  3. I missed the bug when I checked the page. (I suspect that I did click one of the links, but didn’t notice it went to the wrong place. If so, I bet I missed the wrongness because I didn’t have enough variety in the test data I set up—ironic, because I’ve been harping on the importance of “irrelevant” variety since 1994.)

  4. A user had no trouble finding the bug when he tried to edit one of his promised certifications and found himself with a form for someone else’s already-accepted certification. (Had he submitted the form, it would have been rejected, but still.)

That’s my bug: a small error in a big pile of HTML the app fired and forgot.

Suppose, though, that the app created and retained an object representing the page. Suppose further that an exploration support app let you switch to another view of that object/page, one that highlights link structure and downplays text:

The same page, highlighting link hrefs

To the eyes of someone who just added promised certifications to that page, the wrong link targets ought to jump out.
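A minimal sketch of such a link-highlighting view, assuming all we have is the rendered HTML. It uses Python's standard `html.parser` rather than anything from Rails, and the `LinkCollector` class, the `link_view` function, and the sample page are hypothetical stand-ins, not code from the actual app.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs, ignoring everything else."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (text, href) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def link_view(html):
    """Return the page's links as a list of (text, href) pairs."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page with the same kind of bug: the promised certification
# links to /certifications/4 instead of /promised_certifications/4.
PAGE = """
<h2>Certifications</h2>
<p><a href="/certifications/1">Accepted: edit</a></p>
<h2>Promised certifications</h2>
<p><a href="/certifications/4">Promised: edit</a></p>
"""

if __name__ == "__main__":
    for text, href in link_view(PAGE):
        print(f"{href:<32} {text}")
```

Printing the href first and the text second is the point of the view: the path is what the eye lands on, so a promised certification pointing at `/certifications/...` stands out.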

There’s more that I’d like, though. The program knows more about those links than it included in the HTTP Response body. Specifically, it knows they link to a certain kind of object: a PromisedCertification. I should be able to get a view of that object (without committing to following the link). I should be able to get it in both HTML form and in some raw format. (And if the link-to-be-displayed were an object in its own right, I would have had a place to put my method, and I wouldn’t have used the wrong one. Testability changes often feed into error prevention.)
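One way to picture "the link as an object in its own right" is sketched below. Everything here is an assumption made for illustration: the `Link` class, the `path_prefix` attribute, and the two model classes are hypothetical Python stand-ins, not the app's real Rails code. The idea is that the link derives its href from the object it points to, so there is no second helper method to pick by mistake, and an exploration tool can ask the link what it targets without following it.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Certification:
    id: int
    path_prefix = "/certifications"  # class attribute, not a field


@dataclass(frozen=True)
class PromisedCertification:
    id: int
    path_prefix = "/promised_certifications"  # class attribute, not a field


@dataclass(frozen=True)
class Link:
    text: str
    target: object  # the model object this link points to

    @property
    def href(self):
        # The path comes from the target itself, not from a helper
        # the programmer has to choose correctly.
        return f"{self.target.path_prefix}/{self.target.id}"

    def to_html(self):
        """The HTML form, as it would appear in the response body."""
        return f'<a href="{self.href}">{escape(self.text)}</a>'

    def describe(self):
        """A raw form for exploration: what kind of object is linked to."""
        kind = type(self.target).__name__
        return f"{self.text!r} -> {kind}(id={self.target.id}) at {self.href}"


if __name__ == "__main__":
    link = Link("Edit", PromisedCertification(4))
    print(link.to_html())
    print(link.describe())
```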

And so on… It’s easy enough for me to come up with a list of ways I’d like the app to speak of its internal workings. So what I’m thinking of doing is grabbing some web framework, doing what’s required to make it explorable, using it to build an app, and also building an exploration assistant in RubyCocoa (allowing me to kill another bird with this stone).

To be explicit, here’s my hypothesis:

An application built with programmer TDD, whiteboard-style and example-heavy business-facing design, exploratory testing of its visible workings, and some small set of automated whole-system sanity tests will be cheaper to develop and no worse in quality than one that differs in having minimal exploratory testing, done through the GUI, plus a full set of business-facing TDD tests derived from the example-heavy design.

We shall see, I hope.

Google talk references

One thing I meant to say and forgot: Just as the evolution of amphibians didn’t mean that all the fish disappeared, the creation of a new kind of testing to fit a new niche doesn’t mean existing kinds are now obsolete.

Context-driven testing:

Testing Computer Software, Kaner, Falk, and Nguyen
Lessons Learned in Software Testing, Kaner, Bach, and Pettichord
http://www.context-driven-testing.com
“When Should a Test Be Automated?”, Marick

Exploratory testing:

James Bach
Michael Bolton
Elisabeth Hendrickson
Jonathan Kohl

Left out:

The undescribed fourth age

Comments

My spam filter seems to be classifying everything as spam. I’ve unspammed the real comments that hadn’t been auto-deleted yet. If you were wondering why your comment never got posted, that’s why. Maybe I should go back to requiring registration.

Drive out waste

For service, give me a rude but efficient New Englander over a friendly but slow Southerner any day. I’ve been made fun of for shutting the dishwasher door with my foot while simultaneously stretching the other way to grab something out of a cupboard, but it just makes sense to me to do things in parallel unless they have to be serial. I fume when behind people who wait until the cashier tells them the total before beginning to fumble for their money. So I ought to be all for one of the defining characteristics of Lean: driving out waste.

And I am, but… it’s a dangerous tool when used as an excuse by the inhumane. For an illustration, go to my favorite passage from my favorite Shakespeare play, King Lear. Lear has given over his power to two of his daughters, Goneril and Regan. He and a hundred rowdy knights are staying with Goneril, who wants him to dismiss half of them. He pitches a fit:

                                              … thou art a boil,
A plague-sore, an embossed carbuncle,
In my corrupted blood. But I’ll not chide thee …
I can be patient; I can stay with Regan,
I and my hundred knights.

But Regan agrees with her sister:

                                        … what, fifty followers?
Is it not well? What should you need of more?

… and then goes a step further:

                                        … I entreat you
To bring but five and twenty: to no more
Will I give place or notice.

Lear is shocked, repudiates her, and decides to stay with Goneril, saying to her:

                                        … I’ll go with thee:
Thy fifty yet doth double five and twenty,
And thou art twice her love.

But Goneril is remorseless:

                                        … Hear me, my lord;
What need you five and twenty, ten, or five,
To follow in a house where twice so many
Have a command to tend you?

And, in what I consider one of the most devastating short lines ever written, Regan adds:

What need one?

Then comes Lear’s great ineffectual cry and descent into madness:

O reason not the need! Our basest beggars
Are in the poorest thing superfluous.
Allow not nature more than nature needs,
Man’s life is as cheap as beast’s. Thou art a lady:
If only to go warm were gorgeous,
Why, nature needs not what thou gorgeous wear’st,
Which scarcely keeps thee warm. […]

                                    … touch me with noble anger,
And let not women’s weapons, water drops,
Stain my man’s cheeks. No, you unnatural hags!
I will have such revenges on you both
That all the world shall–I will do such things–
What they are, yet I know not; but they shall be
The terrors of the earth. You think I’ll weep.
No, I’ll not weep.

Regan and Goneril were driving out waste. Those knights really were a rowdy, drunken gang of good-for-nothings. But waste was just an excuse. R&G really cared about personal power, not waste. And so will many people marching behind the Lean “banner with a strange device: muda!”

Now, as Jonathan Kohl would point out, many people marching behind the Agile banner do the same: they use Agile as another club with which to beat people. I’m less worried about Agile, though, because its base rhetoric is more explicitly humanist. Lean is more likely to be an attractive nuisance because the idea of driving out waste appeals to executives who find it less work to remove waste than to convert it into value—executives who get license to act sociopathic because they have a fiduciary duty to treat business as a machine for maximizing shareholder value, externalities be damned. I worry about Lean in a business culture where we are trained out of empathy for Lear, damned fool though he surely is.

Four Ages of Testing

Blurb for my Google Testapalooza keynote:

Four Ages of Testing

Just as biological species do, testing approaches change to fill new ecological niches. This talk covers four broad approaches to testing. It will spend most of its time on the third, an unfinished punctuated equilibrium where testing is struggling to balance its traditional role — dispassionate judge of an end result — with a new demand for active help during design. It will also hint at a niche just opening up, one where technology allows testing to become a much more direct conduit for the will of the users.

Brian Marick (marick@exampler.com, www.exampler.com/blog, twitter.com/marick) was a programmer, tester, and team lead in the 80’s, a testing consultant in the 90’s, and is an Agile consultant this decade. He was one of the authors of the Manifesto for Agile Software Development and is the author of two books (The Craft of Software Testing and Everyday Scripting with Ruby) and a bunch of articles. He turned down a pre-IPO job offer from Google, in part because he expected the craze for its stock to have ended by the time his options vested, which ought to make you wonder about his insight.

Embedded vs. independent testers

Bruce Daley posts on how most humans are biased to think they’re less error-prone than they are. As far as I know, that’s a claim solidly based in empirical research. (See also Bruce Schneier’s The Psychology of Security.) From this, he concludes:

Given the nature of their work, software developers and software programmers suffer more from the illusion of knowledge and the illusion of control than most other professions, making them particularly subject to over-looking mistakes in their own code. Which is why software needs to be tested independently.

However. Consider the graph below.

Here, the programmer and independent tester start testing at the same time. (Bad programmer! Bad!) The programmer starts out with more knowledge of the app than the tester (the line marked P/+), but she also has a large amount of cognitive bias (P/-) and lacks testing skill. That makes her miss bugs her knowledge would otherwise allow her to find (the area under the red line). Moreover, her biases seem to be pretty impervious to evidence.

The tester starts out with less knowledge, but has no (relevant) cognitive biases at all. Also, his testing skill lets him ramp up his bug finding pretty fast—but it still takes him a while to overcome her advantage.

Which do you want doing the testing? If you’re shipping at time A, it looks like the programmer has the edge. (Compare the shaded areas under the curve.)

We could expect that advantage to erode over time. If the ship date is farther out, the independent tester would have an advantage, as this graph shows:

Even when all that matters is bug count, the decision is not straightforward, especially since it’s based on information you can’t know until after you’ve decided. (How long will it take the tester to get up to speed? How many and what kind of bugs will the programmer miss?)

On most projects, there are lots of other factors to consider.

So I encourage people not to make the assertion the post’s author does.

Pithysoft

UPDATE: Turns out that what I want to do, modeled after something used for RubyConf, can’t be done in stock Twitter. Seeing if I can persuade the Twitter people to work the same magic for me.

Item: Richard P. Gabriel has this habit of making software people write or speak within artificial constraints.

  • For writers’ workshops (book-length PDF), he’s made reviewers write a summary exactly 29 words long.

  • In last OOPSLA’s “50 in 50” keynote, he and Guy Steele, Jr., covered the last five decades of programming languages in 50 segments, each exactly 50 words long (in a talk lasting, I believe, about 50 minutes).

The point of constraints is that they make you work: you can’t use the words that first come to mind. You have to struggle to say what you want while playing by the rules—and sometimes that makes you realize you ought to be wanting to say something else. Constraints are a tool to make you think new thoughts.

Item: I’ve become strangely fond of Twitter. It’s a service that lets you send short (140 character) “tweets” out into the ether. Other people can subscribe to (“follow”) your tweets. They can see the tweets of everyone they follow by visiting their own twitter web page (here’s mine), subscribing to an RSS feed, or using a twitter-specific app to fetch tweets. (I use Twitterrific.)

That’s form: what about content? As Twitter user shalunov (Stanislav Shalunov) puts it (in a tweet):

Four main ways to tweet: ideas, news, @-chat, phatic coffee. The last is the original, rest invented by users.

Ideas are the tweets I’m most interested in. Shalunov’s is an example of an idea tweet.

News is my second interest. As a geographically isolated person, it’s one way of knowing what interesting people are chattering about.

“@-chat” is a sort of person-to-person instant messaging. For example, cypher23 wrote “Stalker is a weird and wonderful film.” I replied: “@cypher23 Harrison’s new _Nova Swing_ is in the sub-sub-genre with Stalker, _Rogue Moon_, and _Roadside Picnic_. Liking it so far.” Anyone following cypher23 would see both his tweet and my reply. Someone following only me would see only my reply (but could click on the hyperlinked cypher23 to see all his recent tweets). Because of the one-sidedness, and because the topics tend to be less interesting than those in the first two categories, I tend not to follow people who have a high proportion of @-chat in their tweets.

“Phatic coffee” is just tweeting what you’re doing now, like avibryant’s recent “obsessively refreshing UPS tracking page for new laptop”. Although I’m somewhat of a hermit and not much for social chit-chat, I’m not immune to phaticality. (I find chadfowler’s heavily phatic tweets appealing, oddly puckish, and somehow soothing.) But I likely won’t follow someone who’s predominantly phatic.

Item: While writing a book, I often find myself disinclined to spend spare time writing blog posts. Yet I continue to have ideas. I’m sure lots of other people do too.

Synthesis: I’ve created a fake twitter user named pithysoft. It’s for anyone’s pithy tweets about software development. When I finish this post, I’ll send the first one: “d pithysoft Business-facing tests are like personal ads: No matter how exact your description, the reality always tells you something new.” People following pithysoft will see it. If the pithy claim intrigues them, they can tweet pithysoft with something like “@marick More about tests and personal ads, plz”. That would encourage me to write it up on my blog. When I did that, I could tweet “@pithysoft Expanded on XYX here: XYX”.

An experiment. Let’s see how it goes.

Bleg: television series

Since Dawn and I are effete, latte-swilling, Obama-supporting liberals,* we don’t have a television. We do, however, watch television series on a laptop.** We’re running out and need suggestions.

Dawn and I mix up series like The Wire and Deadwood with guilty pleasures like Veronica Mars and the first few seasons of 24. We’re starting on The Corner. Sopranos didn’t grab us. From that list, it looks like character-driven dramas with season-long story arcs are good. Depressing is certainly OK.

We also started watching Joan of Arcadia and Dead Like Me, both of which later migrated to whole-family viewing.

With the kids, we’ve watched Buffy, Angel, Dark Angel, Tru Calling, and Lost.

Sophie and I have watched Battlestar Galactica, but Dawn and Paul are not wild about outer-space SF.

Suggestions?

* Effete, latte-swilling liberals, but also salt of the earth Midwesterners who have both*** delivered calves by hauling on chains.

** Think of a depression-era family huddled around a radio. Salt of the earth, like I said.

*** Well, I’ve only done it once. Not so fun I’d make it a habit.

What is Agile?—beats me, but I know it when I see it

Cory Foy has started an Agile FAQ. His first question is What is Agile? Now, I’m notorious for wandering away from definitional arguments, and I like the answers Cory already has, but I think I have something to add. I have an incomplete and informal list of questions I ask myself about teams to gauge whether they really “get it”:

  • Is there spontaneous chatter? (Most often work-related, but a helping of casual chatter too.)

  • Is there hustle?

  • Are people afraid of being wrong?

  • Do people readily ask for help? Do people readily volunteer help? Even—especially—when they could say, “that’s not my job”?

  • Do I notice people giving in for the sake of the group? (Such as deciding to try something someone else’s way as a way to reduce tension.)

  • When people talk about solving problems, do they talk in terms of nudging something that’s wrong in the direction of rightness, or in terms of solving the problem once and for all?

  • Is their response to a problem to increase the visibility of information? Do they seem to think that if people know about a problem, and are continuously reminded of it, that they’re likely to just naturally act to solve it?

  • Are they a touch monomaniacal about getting working software out there, or at least being able to show someone something new that actually works?

  • Do they disparage the “business side of the house”, or do they have active sympathy with the people there?

  • Do they act helpless? Or as if they have power? Do they give up on problems because “they” will never let them be fixed? (“They” being management, the cubicle police, the configuration management board, etc.)

  • Do they want to be able to take pride in their work? (Or are they cynical or passive about whatever it is they do?) And do they take pride?

I don’t care if I know what Agile is if I know it when I see it. I don’t know to what extent certain values, practices, techniques, or tools influence my answer to the question “Is this team Agile?” To some extent, for sure.