Exploration Through Example

Example-driven development, Agile testing, context-driven testing, Agile programming, Ruby, and other things of interest to Brian Marick
Fri, 30 Dec 2005

Working your way out of the automated GUI testing tarpit (part 5)

part 1, part 2, part 3, part 4

In the last installment, I made an automated GUI test faster, but it still takes three seconds to run. In this installment, I'll speed it up to unit-test speeds. In fact, I'll argue that it really is a unit test. The next-to-last step out of the GUI testing tarpit is to convert existing GUI tests into unit tests of rendering and of the business logic behind what's rendered. (The final step is to create workflow tests that really do have something to do with the GUI.)

The existing tests call enter and press methods on a Browser object (after going through differing types of indirection). That Browser object turns presses into HTTP requests. They're sent to localhost:8080 and received by a Server that's a separate process. The server picks apart the HTTP Request and sends commands like login and new case to an App. The App manipulates a Model, then returns the name of the next page to display. The server renders that page and sends it back to the browser.

We can speed up the declarative test by cutting out the network. NullBrowser has the same interface as Browser, but it calls the App directly. The test now runs in around 0.5 seconds. Almost all of that time is spent in XML parsing and XPATH searching. I wish the test were faster, but not enough just now to find a different XML parser.
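
Here is a minimal sketch of that idea, not the code from this series. The App below is a hypothetical stand-in with two commands, the page names it returns are invented, and I'm assuming for brevity that press takes a command name rather than a button label. The point is only that enter and press keep their shape while the HTTP round trip turns into a direct method call.

```ruby
# Hypothetical stand-in for the real App: each command returns the
# name of the next page to display.
class App
  attr_reader :user

  def login(login, _password)
    @user = login
    :case_list_page
  end

  def new_case
    :case_entry_page
  end
end

# Same interface as a network-backed Browser, but no HTTP: presses
# become direct method calls on the App.
class NullBrowser
  attr_reader :current_page_name

  # Maps a command to the form fields it needs, in argument order.
  # (An assumption of this sketch; the real code may do it differently.)
  COMMAND_FIELDS = {
    login: [:login, :password],
    new_case: []
  }.freeze

  def initialize(app)
    @app = app
    @fields = {}
  end

  # Remember a field value until the next press.
  def enter(field, value)
    @fields[field] = value
  end

  # Collect the buffered fields and call the App directly.
  def press(command)
    args = COMMAND_FIELDS.fetch(command).map { |name| @fields.fetch(name) }
    @fields.clear
    @current_page_name = @app.send(command, *args)
  end
end

browser = NullBrowser.new(App.new)
browser.enter(:login, 'unimportant')
browser.enter(:password, 'unimportant')
browser.press(:login)
browser.current_page_name   # the page the app chose, with no network involved
```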

(You can skip this section unless you care about what power the rewritten test loses.)

Have I weakened the test? This sequence of the Server's code (spread among several methods) is now unexercised:

      @dispatched << [command, args]
      @current_page_name = @app.send(command, args)
      @current_xhtml = @renderer.send("#{@current_page_name}_for", @app)
      response.body = @current_xhtml
      raise HTTPStatus::OK

But how many tests do I need to be confident this code works? And does this test need to be one of them? I think not, so we can live with this weakening, but I'll make a note to later ensure that some test checks the sequence.

Some server setup now also goes untested. It looks like this:

  def install_UI_servlets
    install_generic_proc('/') { | request, app |
      render(:beginning_page)
    }
    install_command(:login, 'login', 'password')
    install_command(:new_case)
    install_command(:record_case, 'client', 'clinic_id')
    install_command(:add_visit)
    install_command(:add_audit)
    install_command(:record_visit, 'diagnosis', 'charges')
    install_command(:record_audit, 'auditor', 'variance')
  end

If, for example, record_audit were misspelled in the next-to-last line, our changed test would no longer detect that. So we need at least one test that exercises each application command through HTTP. It could be a separate test for each command, or one test for all the commands together, or anything in between—but this test no longer has anything to do with that. I'll defer the issue of those tests until what I think will be part 7. (Note that exercising each command will check the dispatching and rendering code shown three paragraphs ago, so I can erase my earlier reminder.)

The real HTTP server renders a page for each command, so the earlier version of this test did as well. The new version only renders the one page it cares about. So certain bugs in rendering might not be caught by this test. (They'd have to be very unsubtle bugs, since even the earlier version never actually checked any of the HTML along the way to the page-under-test. Only something like a thrown exception would be noticed.) Still, we need at least one test that checks each rendered page. I'll keep that in mind as I continue.

I next did a little cleanup, removing the fake browser object from the execution path since it really adds no value. I'll skip the details. Suffice it to say that the effort surfaced some duplication hidden behind this surface:

  def test_cannot_append_to_a_nominal_audit
    as_our_story_begins {
      we_have_an_audit_record_with(:variance => 'nominal')
      we_are_at(:case_display_page)
    }
  
    assert_page_title_matches(/Case \d+/)
    assert_page_has_no_action(:add_audit)
  end

The duplication made me wonder: what's this test really about? Does it have anything at all to do with movement through pages? No, it's about the rendering of pages in the presence of model state that ought to affect what gets rendered. Tests of this kind are better described like this:

Given an app with particular state,
when rendering a particular page:
    I can make certain assertions about that page.

Or, in code:

  def test_nominal_audit_prevents_the_add_audit_action
    given_app_with {
      audit_record('variance' => 'nominal')
    }
    when_rendering(:case_display_page) {
      assert_page_has_no_action(:add_audit)
    }
  end

This is a business-facing test in that it describes a business rule: if you've got one nominal audit, there should be no way to add any more audits. It's also like a unit test in that it gives very specific instructions to a programmer. In my case, the fact that this test fails instructs me to change a particular localized piece of code:

  def case_display_page(app)
    ...
         p(command_form('add_audit',
                        submit('Add an Audit Record'))))
  end

(I'll talk about my rendering peculiarities in some later installment.)
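
For readers wondering what might sit behind given_app_with and when_rendering, here is one possible sketch. Everything in it is an assumption made for illustration: ClinicApp and SketchRenderer are hypothetical stand-ins, the "page" is a plain hash rather than XHTML, and the renderer already applies the nominal-audit rule, which the real code at this point in the series does not.

```ruby
# Hypothetical model state: just enough to drive one rendering decision.
class ClinicApp
  attr_reader :audits

  def initialize
    @audits = []
  end

  def audit_record(fields)
    @audits << fields
  end
end

# Hypothetical renderer: returns a hash describing the page instead of
# real XHTML, to keep the sketch short.
class SketchRenderer
  def case_display_page_for(app)
    actions = [:add_visit]
    nominal = app.audits.any? { |audit| audit['variance'] == 'nominal' }
    actions << :add_audit unless nominal
    { title: 'Case', actions: actions }
  end
end

module RenderingTestHelpers
  # Builds a fresh app and evaluates the block against it, so the test
  # can state facts like audit_record(...) directly.
  def given_app_with(&block)
    @app = ClinicApp.new
    @app.instance_eval(&block)
  end

  # Renders one page from the current app state, then runs the block
  # of assertions.
  def when_rendering(page_name)
    @page = SketchRenderer.new.send("#{page_name}_for", @app)
    yield
  end

  def assert_page_has_action(action)
    raise "expected action #{action}" unless @page[:actions].include?(action)
  end

  def assert_page_has_no_action(action)
    raise "unexpected action #{action}" if @page[:actions].include?(action)
  end
end

example = Object.new.extend(RenderingTestHelpers)
example.given_app_with { audit_record('variance' => 'nominal') }
example.when_rendering(:case_display_page) do
  example.assert_page_has_no_action(:add_audit)
end
```

No page is navigated to and no command is dispatched: the test sets state, renders once, and asserts.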

A lot of Fit tests share this property of being about localized business rules (or business rules that should be localized). It seems to be a distinct category of business-facing test, one that often gets overlooked because of the assumption that a customer/acceptance/functional test must be end-to-end and must go through the same interface as the user does.

My test here should be one of a file-full of tests that describe what's most important—from a business point of view—about the presentation of a particular place (or interaction context) in the application. Another test of that sort would be this one:

  def test_typical_case_display_page
    given_app_with {
      case_record('clinic_id' => 19600219)
    }
    when_rendering(:case_display_page) {
      assert_page_title_matches(/^Case 19600219/)
      assert_page_has_action(:add_visit)
      assert_page_has_action(:add_audit)
    }
  end

This test describes three facts about the Case Display page's default appearance that must survive any fiddling with how it looks: it must have a title that includes the case's clinic ID, and there must be a way to cause the add-visit and add-audit actions in the App. (This test passes, by the way, though the previous one continues to fail.)

Consider this test something like a wireframe diagram in code.

Most tarpit GUI tests are addressing, explicitly and implicitly, several issues all jumbled together. If you separate them, you get something that's both faster and much more clear. Here, I've addressed the particular issue of what must be true of a page. Later, I'll address the particular issue of what must be true of navigation among pages. But first, I'll make my test pass and see what that suggests about hooking business rules into rendering.

See the code for complete details.

## Posted at 20:30 in category /testing [permalink] [top]

Tue, 20 Dec 2005

Two Agile Alliance programs you may be interested in

  1. The Academic Research program "aims to encourage researchers to focus on research questions and issues concerned with agile software development. Researchers are encouraged to apply for small grants to support activities such as conducting a series of visits to practitioner sites, performing interviews, supporting a researcher for a short time to extend existing work into agile development, running workshops, and so on." I'm on the approval committee. We've approved two proposals so far.

  2. The "Agile Times" newsletter is being reborn. The first of the new issues will be out in March 2006. Remaining issues will come out quarterly. The editor is Rebecca Traeger, formerly editor of Better Software. Both Mike Cohn and I have worked with her. She's good. She's also being paid, so the deadline is more firm than it can possibly be with volunteer work.

    The newsletter is currently looking for agile how-to articles, case studies, opinion pieces, book reviews, conference reports, and emerging trends pieces. Articles can cover soft skills (selling, cooperation, interactions) or hard skills (coding, testing, etc.). Deadline for the first issue: January 23.

    Also contact Rebecca if you want to advertise.

## Posted at 11:11 in category /misc [permalink] [top]

Mon, 19 Dec 2005

Working your way out of the automated GUI testing tarpit (part 4)

part 1, part 2, part 3

The story so far: One of my main goals for tests is that they contain no excess words. That means that a GUI test should not describe the path by which it gets to the page under test. In part 1, I described a declarative format. With it, the test writer specifies all and only the facts that should be true of the app at the point the test begins. Part 2 gives a simple implementation that figures out a path through the app that makes those facts true. Part 3 recommends that you migrate tests to this format only as they fail.

The new format tests, though, run as slowly as they did before being migrated. Now it's time to make them faster. I'll do that in two steps. The first doesn't even double their speed. That's hardly sufficient, but the implementation has a side effect that helps the programmer and exploratory tester.


Previously, I only pretended the app talked across the network. Since that fakery would make any timings useless, it's now running on a real server (WEBrick), fielding real live HTTP. So localhost:8080 shows this stunningly attractive UI:

Welcome to the Case Management System

Authorized Users Only

Login:
Password:

In part 1, I wrote three versions of a test. All three of them communicate with the server in exactly the same way: they send eight different HTTP GET commands (just as a browser would if you visited the app and then pressed seven buttons on seven pages).

To speed up the test, I've made it remember all eight of the commands the first time it runs. (That all happens behind the scenes; there are no changes to the test.) Now later runs can send the commands in a big glob via a side channel. That avoids seven of the round trips.

The results are underwhelming. The original test takes 5.1 seconds. The version that sends the big glob takes 3.2 seconds. The more complicated the test setup path, the greater the speedup would be, but still—is this worth the trouble?

Not so far, but it will be after the next speedup. I hope. In the meantime, there's a useful spinoff feature. One of the reasons I hate anything to do with improving a UI is that every time you tweak a page, you have to navigate to it to check whether the change looks right. Having to do that four or five times in a row drives me wild. So I wish this were a universal law:

You can get to any page in an app in one step.

Now that we can remember application state, that's possible here. Imagine the following:

You have to tweak a particular page in the UI. You navigate to that page, then type this:

  ruby hyperjump.rb --snapshot myfix

You go into the code, make a change, reload the app, and return the app to its previous state like this:

  ruby hyperjump.rb myfix --open

The --open tells hyperjump to open localhost:8080/refresh in the browser. That shows the page corresponding to the saved state, which is the page you're tweaking.

This jump-to-page feature would also be useful for exploratory testing. It's common to go to the same place in the program multiple times during a bout of exploratory testing. Perhaps you're trying to learn more about the circumstances in which a bug occurs (a kind of failure improvement). Or you're trying different paths through the program, each of which starts some distance into it.


There's nothing new about using captured commands to accelerate tasks. People have been using GUI capture/replay tools for this kind of thing since the dawn of time. But it's nice that the feature fell out of a different goal.

For more about the implementation, refer to part 4b. The code has the complete details.

## Posted at 22:49 in category /testing [permalink] [top]

Working your way out of the automated GUI testing tarpit (part 4b)

Here are some (decidedly optional) details about the implementation described in part 4.

Consider this test:

  def test_cannot_append_to_a_nominal_audit
    as_our_story_begins {
       we_have_an_audit_record_with(:variance => 'nominal')
       we_are_at(:case_display_page)
    }

    assert_page_has_no_button_labeled('Add Audit')
  end

as_our_story_begins sets up the application state by deducing a sequence of commands to send to the browser. After that's done the first time, the sequence is stored in a file devoted to a single test method. The one for the test we've been using is path-cache/declarative-test.rb/test_cannot_append_to_a_nominal_audit. Its contents look like this:

[[:login, ["unimportant", "unimportant"]],
  [:new_case, []],
  [:record_case, ["unimportant", "213"]],
  [:add_visit, []],
  [:record_visit, ["unimportant", "100"]],
  [:add_audit, []],
  [:record_audit, ["unimportant", "nominal"]]]

The next time declarative-test.rb is run, as_our_story_begins notices there's a cache file, and sends its contents over an XMLRPC connection. The server turns it into an array:

  command_descriptions = eval(command_string)
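
The round trip through the cache file can be sketched as follows. This is a guess at the shape, not the actual implementation: save_path and load_path are hypothetical names, and the sketch writes to a temporary directory rather than a real path-cache folder. Using eval is tolerable here only because the file is written by the test suite itself.

```ruby
require 'fileutils'
require 'pp'
require 'tmpdir'

# Writes the command list as a Ruby literal, one file per test method.
def save_path(cache_file, commands)
  FileUtils.mkdir_p(File.dirname(cache_file))
  File.write(cache_file, commands.pretty_inspect)
end

# Reads the literal back into an array. Never eval untrusted input.
def load_path(cache_file)
  eval(File.read(cache_file))
end

commands = [[:login, ['unimportant', 'unimportant']],
            [:new_case, []],
            [:record_case, ['unimportant', '213']]]

Dir.mktmpdir do |dir|
  cache_file = File.join(dir, 'path-cache', 'declarative-test.rb',
                         'test_cannot_append_to_a_nominal_audit')
  save_path(cache_file, commands)
  restored = load_path(cache_file)
  raise 'cache round trip changed the commands' unless restored == commands
end
```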

Then each command is dispatched to the App object:

  command_descriptions.each { | one |
    @current_page_name = dispatch(*one)
  }

  def dispatch(command, args)
    @dispatched << [command, args]
    @app.send(command, *args)
  end

That dispatch method is exactly the same method used to react to requests from the browser:

    @current_page_name = dispatch(command,
                                  values(request, *required_args))
    render(@current_page_name)

By doing that, I reduce the suspicion that the restored state is somehow different from the one the app had at the moment the snapshot was taken.

The only difference between the two different routes into the app is what happens after dispatching. dispatch returns the name of the next page to send to a browser. When the request comes from a browser, the page is rendered and sent back. When it comes by the XMLRPC side channel, nothing is done, but the most recent page name is stashed away. When the browser visits localhost:8080/refresh, the name is used to render the page:

  install_generic_proc('/refresh') { | request, app |
      render(@current_page_name)
  }
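
To make the two routes concrete, here is a self-contained sketch. TinyApp and TinyServer are hypothetical stand-ins (no WEBrick, no XMLRPC, and rendering is reduced to a string), so treat it as an illustration of the structure rather than the real server. Both routes go through the same dispatch; they differ only in whether a page is rendered afterwards.

```ruby
# Hypothetical app: each command returns the name of the next page.
class TinyApp
  def login(_login, _password)
    :case_list_page
  end

  def new_case
    :case_entry_page
  end
end

class TinyServer
  attr_reader :dispatched

  def initialize(app)
    @app = app
    @dispatched = []
    @current_page_name = :beginning_page
  end

  # Browser route: dispatch, then render the resulting page.
  def handle_request(command, args)
    @current_page_name = dispatch(command, args)
    render(@current_page_name)
  end

  # Side-channel route: dispatch every cached command, render nothing,
  # but remember the most recent page name.
  def replay(command_descriptions)
    command_descriptions.each do |one|
      @current_page_name = dispatch(*one)
    end
    nil
  end

  # What a visit to /refresh would show.
  def refresh
    render(@current_page_name)
  end

  private

  def dispatch(command, args)
    @dispatched << [command, args]
    @app.send(command, *args)
  end

  def render(page_name)
    "<html><title>#{page_name}</title></html>"
  end
end

live = TinyServer.new(TinyApp.new)
live.handle_request(:login, ['unimportant', 'unimportant'])
last_page = live.handle_request(:new_case, [])

restored = TinyServer.new(TinyApp.new)
restored.replay(live.dispatched)
raise 'restored page differs' unless restored.refresh == last_page
```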

Notes:

  • There's a way in which all of my tests could be broken (even before caching). My test doesn't drive a real browser. Instead, I use a Browser object that sends GET requests directly to the server. As a result, nothing in the test will fail if the wrong pages are rendered. The tests would appear to work perfectly fine if every GET request returned a blank page instead of any of the correct forms.

    That wouldn't be a problem in real life. In real life, my tests would be issuing commands to a browser via Watir or Selenium. I should probably use one of them for this demo but (1) Watir only works with Windows IE and I use a Mac, and (2) I'm too lazy to learn Selenium right now.

  • The list of commands is stored in the app, not in the test. When it's time to cache the state, the test asks the app for the list. No thought went into the decision to do it that way. Maybe some should have.

  • Previously, the tests bogusly succeeded. They fail now, so that I can later write the code to make them pass.

  • The tests launch the server in a subprocess (see test-util.rb). They use fork(), kill(), and wait(). I don't know if those work on Windows.
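
For the curious, the fork/kill/wait pattern from the last note looks roughly like this on a Unix-like system. This is a sketch under assumptions, not the contents of test-util.rb: with_server_subprocess is a hypothetical name, and the child just sleeps where the real one would start the server. The pipe is there so the parent doesn't send the signal before the child is ready for it.

```ruby
# Runs the block while a child process is alive, then shuts the child
# down. Unix-only: relies on fork.
def with_server_subprocess
  ready_r, ready_w = IO.pipe
  pid = fork do
    ready_r.close
    trap('TERM') { exit!(0) }   # exit quietly, skipping at_exit handlers
    ready_w.puts('ready')
    ready_w.close
    sleep                       # stand-in for running the server
  end
  ready_w.close
  ready_r.gets                  # block until the child says it is ready
  ready_r.close
  yield pid
ensure
  if pid
    Process.kill('TERM', pid)
    Process.wait(pid)           # reap the child so no zombie is left
  end
end

with_server_subprocess do |pid|
  # Tests would talk to the server here. Signal 0 only checks that the
  # process exists.
  Process.kill(0, pid)
end
```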

Credit: The idea of replaying server-level commands just popped into my head. It might have been put there by Michael Silverstein's "Logical Capture/Replay".

## Posted at 22:49 in category /testing [permalink] [top]

Thu, 15 Dec 2005

Bugs matter

A working undo command can sure be handy:

After initially denying any responsibility for the J-Com snafu, exchange executives acknowledged this week that flaws in their electronic trading system prevented Mizuho from correcting its order and minimizing losses. Mizuho traders realized their mistake within 85 seconds of placing the erroneous order and made four attempts to cancel it. It was rejected each time. [...]

On Monday, Japan's market regulator estimated Mizuho's loss at $331 million.

Evidence continues to mount that no honorable election official should support voting machines without a paper trail:

[...] in three separate attempts over a four month period, computer experts Dr. Herbert Thompson and Harri Hursti visited the Leon County Elections Office in their efforts to penetrate the county voting tabulation equipment and alter election data. [...]

Granted the same access as an employee of our office, it was possible to enter the computer, alter election results, and exit the system without leaving any physical record of this action. [...]

Based upon the data developed out of this exercise it is the opinion of the Leon County Supervisor of Elections that any effort to limit or remove the manual examination of paper ballots to confirm the correctness of election results is not in the public interest.

## Posted at 07:35 in category /misc [permalink] [top]

You know you overuse particular examples on your site when...

... you receive mail like this:

I'm impressed with your professional website related to goat and cow milking devices. My website is a NON-COMPETING informational site offering collections of articles, diagrams, and other publications that relate to and may help supplement your website. Since some of your site visitors may be interested in learning more about animal milking devices, feel free to link to my site. By linking to my site, you will be adding value to your own site by providing relevant content for your site visitors. Following is the link I think will be the most helpful for your customers: http://www.braindex.com/products/175+-ANIMAL-(GOAT-&-COW)-MILKING-DEVICE-RELATED-PATENTS-ON-CD-29.htm. Please visit this link yourself and consider inserting it on your website.

(I don't think this is phishing or some variant, which would be less interesting than having a search bot—albeit one with wide tolerances and low cleverness—think my site is actually about cows.)

## Posted at 07:35 in category /junk [permalink] [top]

Tue, 13 Dec 2005

Convention over configuration workbook?

Item: I'm fond of Bill Wake's Refactoring Workbook because it shows lots of examples of refactorings in action.

Item: Rails is hot, hot, hot these days.

Item: One reason it's popular is convention over configuration, which...

... places ease of use for the majority of situations ahead of the need to provide maximum flexibility for the few. The way this is done is through the adoption of coding conventions that automatically embed a certain amount of configuration right into the framework. Convention makes certain assumptions about how things will be put together and by making these assumptions implicit in the code it frees the framework from the burden of having to spell out every intention through explicit configuration. The conventions can be overridden to handle cases where the convention might not be optimal but speed and ease of use are the big benefit that comes from adopting them.

Item: Berin Loritsch says:

Java applications can be developed using [convention over configuration], but often aren't. The problems come into play when the framework you are using works against you. Other times its just too difficult to do right. You will have to resort to reflection and other black magic tricks.

Item: Better Software has had an author drop out. Three times before when that's happened, I've quickly written a replacement article. Two of them have worked out rather well, I think. (You can see them on the sidebar: "Behind the Screens" and "Bypassing the GUI".)

Therefore, I'm thinking of writing an article on convention over configuration in Java-style languages. (Despite the chain of thought implied here, the idea was really Mike Cohn's.) The problem is, I don't have any personal experience to draw on. Do you have examples that would let me produce an article with something of the flavor of Wake's book? If so, you know how to reach me.

## Posted at 07:19 in category /misc [permalink] [top]

Mon, 12 Dec 2005

Working your way out of the automated GUI testing tarpit (part 3)

part 1, part 2

In the real world, you can't leap out of a tarpit in one bound. The same is true of a metaphorical tarpit. Here's a scenario to avoid:

  • You have 2500 tests. At any given moment, some 200 of them are failing. Most of the failures are because of irrelevant interface changes, not because the code has a bug. As a result, hardly anyone looks at the failing tests.

  • Someone invents a much more compact, much more maintainable way of writing tests.

  • Someone (likely that same person) is assigned the task of rewriting all the tests in the new form.

  • She gets through about 300 before something urgent comes up. Rewriting the tests becomes a background task. So tedious was it that somehow it never makes it back to the foreground.

  • A year later, you have 2500 tests. 336 of them are rewritten (perhaps not the most important ones; no one knows which of the old suite are the important ones). At any given moment, those 336 are trustworthy, but 173 of the unconverted tests are failing for the same old reason. No one looks at those tests.

Even if the task is plowed through to the end, it has not changed the habits of the team, so there's no counterforce to whatever forces caused the problem in the first place. I'm with William James on the importance of habit:

only when habits of order are formed can we advance to really interesting fields of action [...] consequently accumulate grain on grain of willful choice like a very miser; never forgetting how one link dropped undoes an indefinite number.

Therefore, my bias is toward having everyone convert the test suite one failure at a time:

  • As part of every story, spend around 20 minutes fixing failing tests in the untrustworthy suite. You're probably better off just fixing the next failing one than trying to find which one is most worth fixing.

    If a test has found a legitimate bug, either fix that bug immediately (if the fix doesn't take long) or put it on the backlog to be scheduled as a story.

  • Fixed tests get moved over to a reliable suite. That suite is run as part of the continuous integration build. No story is done if any of those tests fail. (I would not include tests for backlog bugs in this suite.)

  • This process continues ad infinitum. You may never eliminate the untrustworthy suite. If some test there never fails, it will never get converted.

Some fraction — perhaps a large fraction — of the old tests are likely to be worthless. (More precisely, they're worth less than the cost of reviving them.) It's hard to persuade people to throw away tests, but nonetheless I'd try. (There are unknown risks to throwing tests away. My bias would be to do it and let the reality of escaped bugs make the risks better known. Tests can always be un-thrown away by retrieving them from Subversion.)

A tempting alternative is simply to delete the old test suite and start over. Spend the 20 minutes writing a new test instead of reviving a failed one. That might well be time better spent. But it's a tough sell because of the sunk cost fallacy.

## Posted at 13:06 in category /testing [permalink] [top]

UI design links from Jared M. Spool

What makes a design intuitive? is a nice, readable short article about the two ways to make an interface that people will call intuitive.

Designing embraceable change is a follow-on that talks about how to introduce a new UI to an existing community. This has relevance to Agile projects that are continually tinkering with the UI.

The series ends with The quiet death of the major relaunch. Here's a trivial example of the approach:

At eBay, they learned the hard way that their users don't like dramatic change. One day, the folks at eBay decided they no longer liked the bright yellow background on many of their pages, so they just changed it to a white background. Instantly, they started receiving emails from customers, bemoaning the change. So many people complained, that they felt forced to change it back.

Not content with the initial defeat, the team tried a different strategy. Over the period of several months, they modified the background color one shade of yellow at a time, until, finally, all the yellow was gone, leaving only white. Predictably, hardly a single user noticed this time.

The key point in this last article is this:

Our findings show that consistency in the design plays second fiddle to completing the task. When users are complaining about the consistency of a site, we've found that it is often because they are having trouble completing their tasks.

## Posted at 10:22 in category /links [permalink] [top]

Agile consultants

In my role as the overcommitted and underskilled Agile Alliance webmaster, I add new corporate members to the site. I realized today that we really have quite an impressive variety there. You can find companies in out-of-the-way places (Topeka, Kansas, USA). It's less easy to find companies that have particular skills, since the blurbs don't generally focus on a company's specific competitive advantage. Nevertheless, I recommend it to you if you're looking for a consultancy.

P.S. Not me, though. Exampler Consulting isn't a corporate member because I've never gotten around to getting a logo.

P.P.S. Corporate membership was Rebecca Wirfs-Brock's idea.

## Posted at 10:22 in category /agile [permalink] [top]

Sun, 11 Dec 2005

Working your way out of the automated GUI testing tarpit (part 2)

In the previous installment, I described a test that looked like this:

  def test_cannot_append_to_a_nominal_audit
    @browser.as_our_story_begins {
       we_have_an_audit_record_with(:variance => 'nominal')
       we_are_at(:case_display_page)
    }

    assert_page_has_no_button_labeled('Add Audit')
  end

The test doesn't tell how to get to the case display page, create an audit record, create the visit record that audit records require, etc. The code behind the scenes has to figure that out.

I won't show that code. You can find it here. It's a spike, so don't give me a hard time about the lack of tests. What matters is that it works off a description of what transitions are possible in the program. They look like this:

Given a complete set of definitions, statements like this one:

we_have_an_audit_record_with(:variance => 'nominal')

name "milestones" along the path the program has to take to get ready for the test. A simple breadth-first search constructs a complete path out of the milestones. The path contains appropriate instructions to fill in fields and press buttons. Thus a declarative test is turned into a procedure.
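
Here is a sketch of how such a search might go. The transition table is hypothetical and far simpler than the spike's format (which isn't reproduced here): each entry says that a list of steps takes the program from one milestone to another. The search itself is ordinary breadth-first search, so it finds a path with the fewest transitions.

```ruby
# Hypothetical transition table: from one milestone, a list of steps
# leads to the next. The real spike's format differs.
TRANSITIONS = [
  { from: :start,        to: :logged_in,
    steps: ['Enter login', 'Enter password', 'Press "Login"'] },
  { from: :logged_in,    to: :case_record,
    steps: ['Press "New Case"', 'Press "Record Case"'] },
  { from: :logged_in,    to: :help_page,
    steps: ['Press "Help"'] },
  { from: :case_record,  to: :visit_record,
    steps: ['Press "Add Visit"', 'Press "Record Visit"'] },
  { from: :visit_record, to: :audit_record,
    steps: ['Press "Add Audit"', 'Press "Record Audit"'] }
].freeze

# Breadth-first search: returns the steps along a shortest chain of
# transitions from start to goal, or nil if the goal is unreachable.
def path_to(goal, start: :start, transitions: TRANSITIONS)
  queue = [[start, []]]
  visited = { start => true }
  until queue.empty?
    milestone, steps = queue.shift
    return steps if milestone == goal

    transitions.each do |t|
      next unless t[:from] == milestone && !visited[t[:to]]

      visited[t[:to]] = true
      queue << [t[:to], steps + t[:steps]]
    end
  end
  nil
end

puts path_to(:audit_record)
```

The test names only the :audit_record milestone; the login, case, and visit steps appear in the result because the search had to pass through them.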

P.S. In the declaration, the button's name is given. That's wrong. It should be the HTML id. Like I said: a spike.

## Posted at 09:36 in category /testing [permalink] [top]

Thu, 08 Dec 2005

Working your way out of the automated GUI testing tarpit (part 1)

In this series, I'll present two ideas that have been percolating in my head for a while. Last week, I began thinking they might be appropriate for a client. We ended up taking a different approach, but not until after I'd spent an evening building a prototype. Yesterday, I was so sick of replying to mail, chipping away at a task backlog that's metastasized during recent travel, and slogging through other things I really ought to be doing that I rebelled and decided to rewrite the prototype. It was fun.

The general idea here is (1) to gradually work your way toward declarative tests that generate their own page navigation and (2) to use caching to speed up tests and maybe improve program structure.

I've never tried these ideas for real. They might be impractical in the wild.

Here are three GUI-oriented tests, in increasing order of goodness. The scenario has something to do with a veterinary clinic (of course). In each test, a case record is created, an animal visit is recorded, and an audit record is appended. (All the steps are necessary, because you can't record a visit until there's a case, and you can't create an audit record until there's a visit.) Normally, there can be multiple audits attached to a case. But if the first audit is marked as "nominal", it's the only one that can ever be created. If so, there should be no "Add Audit" button on the Case Management page. That's what the test checks. (It also uses the title of the page to make sure the assertion is checking the right page.)

The first test is like one you might get from a straightforward use of Watir or jWebUnit.


  def test_cannot_append_to_a_nominal_audit
    go('http://app.com/app')

    enter(:login, 'unimportant')
    enter(:password, 'unimportant')
    press('Login')

    press('New Case')

    enter(:client, 'unimportant')
    enter(:clinic_id, '213')
    press('Record Case')

    press('Add Visit')

    enter(:diagnosis, 'unimportant')
    enter(:charges, '100')
    press('Record Visit')

    press('Add Audit')

    enter(:auditor, 'unimportant')
    enter(:variance, 'nominal')
    press('Record Audit')

    assert_page_title('Case Management')
    assert_page_has_no_button_labeled('Add Audit')
  end

What are the problems with this test?

  • In all this code, what's important? Only two lines, which I've highlighted so you can find them easily. In real life, the important lines aren't in bold blue font, so such tests are hard to read.

  • The test is fragile in the face of change. Change the name of a field, introduce another field that has to be filled in, split a page in two: all of these will break this test and many, many others besides. Now you get to fix them all. Because they're hard to read, it's easy to fix them badly. (There are a lot of tests out there that inadvertently no longer test what they're supposed to test.)

  • The test is likely to be slow, because it drives a browser. Programmers who are used to a fast test-code-refactor cycle won't put up with that. So the tests will be run infrequently, and they'll provide information well after it'd be most valuable.

To solve the problem of fragility, some people put a library between the tests and the browser. Here's what such a test would look like:

  def test_cannot_append_to_a_nominal_audit
    go('http://app.com/app')

    login('unimportant', 'unimportant')
    new_case('unimportant', '213')
    new_visit('unimportant', '100', nil)
    new_audit('unimportant', 'nominal')

    assert_page_title('Case Management')
    assert_page_has_no_button_labeled('Add Audit')
  end

  • The test is easier to read, but it has some problems. The fact that an audit record exists is essential to the test, whereas the existence of a visit is incidental. Yet they're given equal prominence. The use of the "unimportant" token makes the use of "nominal" stand out - that particular value must be important to this test. But what about "213" and "100"? They're not important, but there's no convenient "ignore this value" token for numbers.

  • It is more resistant to change than the previous test. If there are changes within a page, you might only have to change one library method.

    But other changes can still break a bunch of tests. In the next iteration, suppose an FDA contact record has to be added before an audit can happen. That means every test that goes directly from adding a visit to adding an audit record will become broken. Either you fix all the tests or you change new_visit to silently add an FDA contact record - which I guarantee will make for some frustrating debugging down the road.

  • It's just as slow as the previous version.
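
As a rough illustration of what such a library layer might contain, here is a minimal sketch. The class names LoggingBrowser and AppDriver are hypothetical, and the browser is a stand-in that only records what it is asked to do; the field names and button labels are the ones that appear in the execution log later in this post.

```ruby
# Hypothetical stand-in for a real browser driver: it only logs requests.
class LoggingBrowser
  attr_reader :log

  def initialize
    @log = []
  end

  def go(url)
    @log << "Go to <#{url}>"
  end

  def enter(value, field)
    @log << "Enter #{value.inspect} into field #{field.inspect}"
  end

  def press(label)
    @log << "Press #{label.inspect}"
  end
end

# Hypothetical library layer: one method per page-level task, so a
# change within a page is absorbed by one method instead of many tests.
class AppDriver
  def initialize(browser)
    @browser = browser
  end

  def login(name, password)
    @browser.enter(name, :login)
    @browser.enter(password, :password)
    @browser.press('Login')
  end

  def new_case(client, clinic_id)
    @browser.press('New Case')
    @browser.enter(client, :client)
    @browser.enter(clinic_id, :clinic_id)
    @browser.press('Record Case')
  end
end

browser = LoggingBrowser.new
driver = AppDriver.new(browser)
browser.go('http://app.com/app')
driver.login('unimportant', 'unimportant')
driver.new_case('unimportant', '213')
puts browser.log
```

If the login field were renamed, only AppDriver#login would change. The order of the calls, however, still lives in every test.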

I believe such a test is still not good enough. It's still procedural - it's still of the form "do this... now this... now this... finally you can check what you care about." Here's a better test:

  def test_cannot_append_to_a_nominal_audit
    @browser.as_our_story_begins {
       we_have_an_audit_record_with(:variance => 'nominal')
       we_are_at(:case_display_page)
    }

    assert_page_has_no_button_labeled('Add Audit')
  end
  • This test is declarative. It says that there must be a case with an audit record, but it doesn't say how that record's created. Moreover, it strives to be minimal, to use no word unless it's clearly related to the intention of the test. It says nothing about any of the fields that the previous tests described as "unimportant". It's even silent on the existence of case records and visits, simply assuming that whatever's required for there to be an audit record has happened. (Presumably, requirements like "you can't add an audit record unless there's been a visit" have been tested elsewhere.) All of this makes the test still easier to read.

  • The test is even more resistant to change. Because there's no sequence of steps in the test - no workflow - changes to the workflow will require localized changes in the support code, not to the tests themselves.

  • However, the test is still just as slow as the other ones, so there's room yet for improvement.
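
To make the idea of declarative support code concrete, here is one way it could be organized. This is a guess for illustration only, not the code behind the test above: the Story class, its REQUIRES and DEFAULTS tables, and the create method are all hypothetical. The sketch records which records it would create rather than driving a browser.

```ruby
# Hypothetical support code; every name here is an assumption.
class Story
  attr_reader :steps

  # What must already exist before each kind of record can be created.
  REQUIRES = { :case => [], :visit => [:case], :audit => [:visit] }

  # Values for fields the test does not care about.
  DEFAULTS = {
    :case  => { :client => 'unimportant', :clinic_id => '213' },
    :visit => { :diagnosis => 'unimportant', :charges => '100' },
    :audit => { :variance => 'unimportant', :auditor => 'unimportant' }
  }

  def initialize
    @steps = []
    @created = []
  end

  def we_have_an_audit_record_with(fields)
    create(:audit, fields)
  end

  private

  # Create any prerequisites first, then the record itself, using
  # default values for every field the caller did not mention.
  def create(kind, fields = {})
    return if @created.include?(kind)
    REQUIRES[kind].each { |needed| create(needed) }
    @steps << [kind, DEFAULTS[kind].merge(fields)]
    @created << kind
  end
end

story = Story.new
story.we_have_an_audit_record_with(:variance => 'nominal')
story.steps.each { |kind, fields| puts "#{kind}: #{fields.inspect}" }
```

Under this sketch, a new prerequisite such as an FDA contact record would mean one more entry in each table, with no test touched.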

In the next installment, I'll show what the code behind the scenes looks like. Right now, I want to emphasize that all three tests do the same thing. Here's an execution log for the third test:

$ ~/src/procedural2declarative 601 $ ruby declarative-test.rb
Loaded suite declarative-test
Started
Go to <http://app.com/app>
Enter "unimportant" into field :login
Enter "unimportant" into field :password
Press "Login"

Press "New Case"

Enter "unimportant" into field :client
Enter "213" into field :clinic_id
Press "Record Case"

Press "Add Visit"

Enter "unimportant" into field :diagnosis
Enter "100" into field :charges
Press "Record Visit"

Press "Add Audit"

Enter "nominal" into field :variance
Enter "unimportant" into field :auditor
Press "Record Audit"

.
Finished in 0.005032 seconds.

1 tests, 1 assertions, 0 failures, 0 errors

## Posted at 08:02 in category /testing [permalink] [top]

Mon, 21 Nov 2005

Story card style

I've been corresponding with Rachel Davies about story card style. She said something wise. Here's a slightly edited version of the correspondence (with her permission).

It all began when I asked a question about the "As a [role], I want [ability], so that [benefit]" style of writing stories. (See Mike Cohn's User Stories Applied for a description.)

Rachel:

Incidentally, I dropped this story format about 2 years ago because it encourages people to think of story cards as mini requirements documents and encourages the grumpy refrain "but it says on the card". I now encourage teams to write only the story name with a marker pen in large caps on 6x4 unlined index cards (easier to read by people standing around planning table or board) because this encourages conversation to continue during the iteration.

Me:

Interesting. What now encourages focus on the benefit and person who benefits?

Rachel:

I agree that novice teams need to be encouraged to ask their customer about this information in the planning game. I recommend using a checklist for each story (do we understand business value, story beneficiary, acceptance test). However, if this information all gets transcribed onto the card then developers just read from the card during the iteration and even if the customer is sitting nearby they don't tend to ask questions. If you leave only the story name on the card then the developers are forced to replay the conversations with the customer (which is a good thing).

## Posted at 09:38 in category /agile [permalink] [top]

Two oblique commentaries on abuse

Not all Americans wanted to [treat prisoners well]. Always some dark spirits wished to visit the same cruelties on the British and Hessians that had been inflicted on American captives. But Washington's example carried growing weight, more so than his written orders and prohibitions. He often reminded his men that they were an army of liberty and freedom, and that the rights of humanity for which they were fighting should extend even to their enemies. Washington and his officers were keenly aware that the war was a contest for popular opinion, but they did not think in terms of 'images' or 'messages' in the manner of a modern journalist or politician. Their thinking was more substantive. The esteem of others was important to them mainly because they believed that victory would come only if they deserved to win. Even in the most urgent moments of the war, these men were concerned about ethical questions in the Revolution.

David Hackett Fischer, Washington's Crossing, p. 276


Confirmation bias is a phenomenon wherein decision makers have been shown to actively seek out and assign more weight to evidence that confirms their hypothesis, and ignore or underweight evidence that could disconfirm their hypothesis [...]

Among the first to investigate this phenomenon was Wason (1960), whose subjects were presented with three numbers (a triple):

2 4 6

and told that triple conforms to a particular rule. They were then asked to discover the rule by generating their own triples and use the feedback they received from the experimenter. Every time the subject generated a triple, the experimenter would indicate whether the triple conformed to the rule (right) or not (wrong). The subjects were told that once they were sure of the correctness of their hypothesized rule, they should announce the rule.

While the actual rule was simply "any ascending sequence," the subjects seemed to have a great deal of difficulty in inducing it, often announcing rules that were far more complex than the correct rule. More interestingly, the subjects seemed to only test "positive" examples; that is, triples that subjects believed would conform to their rule and thus confirm their hypothesis. What the subjects did not do was attempt to falsify their hypotheses by testing triples that they believed would not conform to their rule.

Confirmation Bias, Wikipedia.

In an October 2002 speech in Cincinnati, for example, President Bush said: "We've learned that Iraq has trained al Qaeda members in bomb-making and poisons and gases." Other senior administration officials, including Secretary of State Colin L. Powell in a speech to the United Nations, made similar assertions. Al-Libi's statements were the foundation of all of them.

Al Qaeda-Iraq Link Recanted, Washington Post, July 31, 2004.

According to CIA sources, Ibn al Shaykh al Libbi, after two weeks of enhanced interrogation, made statements that were designed to tell the interrogators what they wanted to hear. Sources say Al Libbi had been subjected to each of the progressively harsher techniques in turn and finally broke after being water boarded and then left to stand naked in his cold cell overnight where he was doused with cold water at regular intervals.

His statements became part of the basis for the Bush administration claims that Iraq trained al Qaeda members to use biochemical weapons. Sources tell ABC that it was later established that al Libbi had no knowledge of such training or weapons and fabricated the statements because he was terrified of further harsh treatment.

CIA's Harsh Interrogation Techniques Described, ABC News, Nov. 18, 2005.

## Posted at 09:15 in category /misc [permalink] [top]

Fri, 18 Nov 2005

Two milestones, noticed while paying bills

I am now a million-mile member of the American Airlines frequent flier program. This entitles me to two luggage tags.

I am also entitled to a Free! "Guide to Planning and Promoting Your Business Anniversary" in honor of fifteen years of business.

Yippee.

P.S. Hugh Sasse points out that the Ruby extensions library has a method like the one that's mentioned below. After a quick glance, I think it's better than mine.

P.P.S. Oh, OK, I also get eight upgrade segments and permanent Gold membership. And a membership card that says "1 Million" on it.

## Posted at 11:38 in category /junk [permalink] [top]

Mon, 14 Nov 2005

Attractive Ruby tests that use multi-line strings

Suppose you're testing some method whose input is a multi-line string. You could write something like this:

  def test_tags_can_be_deeply_nested
    table = "<table>
              <tr><td>
                <table>
                 <tr>
                  <td>
                    <table>
                     <tr>
                        <td>
                             Way nested
                        </td>
                     </tr>
                    </table>
                  </td>
                 </tr>
                </table>
              </td></tr>
             </table>"
    slices = TagSlices.new(table, "table")
    # blah blah blah
  end

That's fine - unless whitespace in the middle of the string is significant. The above method has no whitespace on the string's first line, but a whole lot on the others. What if I needed it all to be flush left? This is ugly:

  def test_tags_can_be_deeply_nested
    table =
"<table>
  <tr><td>
    <table>
     <tr>
      <td>
        <table>
          <tr>
            <td>
                 Way nested
            </td>
          </tr>
        </table>
      </td>
     </tr>
    </table>
  </td></tr>
 </table>"
    slices = TagSlices.new(table, "table")
    # blah blah blah
  end

I could argue that the ugliness makes it too hard to see the structure of the test and too hard to skim a file quickly and see what the tests are. That argument may even be true, but the real reason I don't like it is that it's ugly.

So I write such tests like this:

  def test_tags_can_be_deeply_nested
    table = "<table>
            . <tr><td>
            .   <table>
            .    <tr>
            .     <td>
            .       <table>
            .         <tr>
            .           <td>
            .               Way nested
            .           </td>
            .         </tr>
            .       </table>
            .     </td>
            .    </tr>
            .   </table>
            .    
            . </td></tr>
            .</table>".unindent
    slices = TagSlices.new(table, "table")
    # blah blah blah
  end

unindent removes the whitespace at the beginnings of lines, together with the discrete margin made of dots. Its code looks like this:

class String
  def unindent
    gsub(/^\s*\./m, '')
  end
end

I've fiddled around with unindent to a ridiculous degree, changing its name, how it works, how the margin is indicated. I think I've settled on this one.
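
For a quick check of what unindent produces, the following snippet repeats the definition so it runs on its own and prints a small result. It also shows, for comparison, the "squiggly heredoc" (`<<~`) available in Ruby 2.3 and later, which strips common leading whitespace without a dot margin; that is a different technique from the one in this post.

```ruby
class String
  # Same definition as in the post, repeated so the snippet is self-contained.
  def unindent
    gsub(/^\s*\./m, '')
  end
end

with_dots = "<table>
            . <tr>
            .   <td>cell</td>
            . </tr>
            .</table>".unindent

# Ruby 2.3+ alternative: <<~ removes the indentation common to all lines.
with_heredoc = <<~HTML.chomp
  <table>
   <tr>
     <td>cell</td>
   </tr>
  </table>
HTML

puts with_dots
puts with_dots == with_heredoc
```

The dot margin keeps one advantage: it makes leading whitespace visible on every line, including lines that would otherwise look blank.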

## Posted at 17:00 in category /ruby [permalink] [top]

Sun, 13 Nov 2005

Throwing tests away

In part of "When should a test be automated?", I look at tests that have broken because the intended behavior of the code changed. My urge is to fix them, but I stop myself and ask a question: if this test didn't exist in the first place, would I bother to write it? If not, I shouldn't bother to fix it. I should just delete it.

Here's an example where that practice would have led me astray.

I've been hacking away at the code in RubyFit that parses HTML. I'm changing it so that it will support Rick Mugridge's FitLibrary. To that end, I created a class, TagSlices, that splits HTML text at tag boundaries. For example, the TagSlices of foo<tag x="y">bar</tag>quux would be foo, <tag x="y">, bar, </tag>, and quux.

I'd looked at the Java implementation before starting. That code operates on a lowercased string for tag-matching, but returns chunks of the original string. In the implementation I started moving toward, maintaining the two strings was inconvenient, so I talked myself into thinking I could downcase the original string at the start and work only with that. Stupid (people sometimes do use capital letters in web pages), but I was backing away from a frustrating implementation closely modeled after the Java one - and thus un-Ruby-like and hard to get right. I was so focused on tags that I thought what was OK for them was OK for everything.

I'd generated TagSlices using a set of tests that did not reveal the bug. After I was done, I reran the unit tests for the old version. (I hadn't used them for development because they "chunked" the problem in a way that didn't fit the path I was taking.)

Here's one of those tests:

  def test_parsing
    p = Parse.from_text 'leader<Table foo=2>body</table>trailer', ['table']
    assert_equal 'leader', p.leader
->  assert_equal '<Table foo=2>', p.tag
    assert_equal 'body', p.body
    assert_equal 'trailer', p.trailer
  end

It failed on the line marked with an arrow. I thought about that. Was the failure due to a bug? No, I'd decided it was harmless for tags to change case. Did any other assertion fail? No. Was the test completely redundant with other tests? It seemed so. So I should have thrown the test away. But I hesitated. After all, the changed behavior was a side effect, an implementation convenience. It would be just as harmless for tags to keep their case and pass the test. Maybe that wouldn't be as hard as I'd thought when I'd started. I looked at the code and it suddenly flashed on me that lowercasing the whole string wasn't harmless at all.

And, moments later, I realized that Ruby is a scripting language, after all; as such, it lives for regular expressions. Maybe in the Java world, it makes sense to search a lowercased string for "<table". In the Ruby world, it's better to search the original string for /<table/i.
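
The difference between the two approaches can be seen in a few lines. The slices helper below is hypothetical and much simpler than the real TagSlices class; it relies on the fact that String#split keeps the text matched by a capture group.

```ruby
# Hypothetical helper, not the real TagSlices: split text at the
# boundaries of one kind of tag, matching the tag name in any case.
def slices(html, tag)
  html.split(/(<\/?#{Regexp.escape(tag)}\b[^>]*>)/i)
end

text = 'leader<Table foo=2>body</table>trailer'

# Searching the original string keeps every character's case.
p slices(text, 'table')
# prints ["leader", "<Table foo=2>", "body", "</table>", "trailer"]

# Lowercasing first finds the same tags but damages everything else.
p slices('Dear <B>Brian</B>'.downcase, 'b')
# prints ["dear ", "<b>", "brian", "</b>"]
```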

So I created a test that talks specifically about case in tags and non-tag text. I made it pass. The old test passed, too. I could have thrown it away. And yet... what else might it uncover someday? So I kept it.

I shouldn't extrapolate too much from a single example, but it makes me wonder. Seven years ago, when I wrote the paper, I was solidly embedded in the testing culture, a culture of scarcity, one in which:

  • Automated tests were expensive to write because they had to go through an interface not designed for testing.

  • Programming time to fix that was almost entirely unavailable.

  • You were never anywhere close to having as many tests as you thought you needed, so the opportunity cost of fixing an old test was high.

Those assumptions are less true today. Because of that, it makes more sense to change old tests on the off chance you might find bugs or learn something. One of my other testing guidelines is to seek cheap ways to get dumb luck on your side. I'm not smart enough to do without luck. (That's not false modesty: I bet you aren't either.) Fiddling with tests is perhaps now cheap enough to be a luck generator.

(P.S. The bug would have certainly been caught by even the simplest realistic use, so it wouldn't have survived long.)

## Posted at 18:02 in category /testing [permalink] [top]

A rant: filenames

Java is a good thing. Rails is a good thing. But just as Java /perpet[ur]ated/ the horrors of StudlyCaps on those of us who like to read quickly, Rails is /perpet[ur]ating/ underscores in filenames on those of us who like to write quickly.

There are legions of examples of people not acting according to their rational self-interest. Yet another is the prevalence of filenames like webrick_server.rb over webrick-server.rb. Does an underscore take more energy to type than a dash? Yes. Does avoiding a typo require more coordination? Yes. So why this pathology?

  • C used to be the programmer's lingua franca. Since '-' in C means subtraction, variable names conventionally contain underscores. Did the trailblazer Unix programmers not realize that filenames don't have to follow the same rules?

  • Is it because VMS only allowed underscores in filenames, and VMS is just so totally cool?

Whatever the reason, we must consider the result. Thousands upon thousands of people already suffer from Emacs Pinky. To add injury to injury, those fragile pinkies must suffer additional unnecessary damage striking the shift key. How many people have been forced to switch to vi because those underscores pushed them over the edge to pinky RSI? That's a tragedy no caring person can ignore.

You have a lot to answer for, David HH. I was this close to convincing the world to use dashes, and now there's no hope.

## Posted at 18:02 in category /junk [permalink] [top]

Sun, 06 Nov 2005

Errors as essential

[Austin's procedure] consists in recognizing that [...] failure is an essential risk of the operations under consideration; then, in a move that is almost immediately simultaneous [...] it excludes that risk as accidental, exterior, one which teaches us nothing about the [...] phenomenon being considered.

Jacques Derrida, Limited Inc, p. 15.

This puts me in mind of a commonplace of UI design: that a popup error dialog should prod you to reexamine the system. Can it be changed to make the error impossible? or a modal dialog unneeded? For the latter, see Ward Cunningham's Checks pattern language, which - if I recall correctly - treats entering bad input as an inherent part of entering input, something to be fixed at the human's convenience, not something to interrupt the flow.

It also reminds me of my insistence that Agile projects are learning projects, and that you're probably not learning how to do something right unless you try variations and extensions that turn out to be wrong. But there has to be a way of talking about it that doesn't use the words "mistake" or "wrong" because - hard as it may be to believe - a lot of people think those are bad things.

## Posted at 22:01 in category /misc [permalink] [top]

Fri, 04 Nov 2005

Coming to AYE? Bring trinkets

In my first AYE session ("An amateur's guide to communicating requirements"), up to 1/3 of the participants will teach some skill to other members of their group. It might be a card trick, a coin trick, origami, building a house of cards, juggling, situps, headstands, or flipping pancakes over in a skillet. It's best if the skill involves some object. Due to Circumstances Beyond Our Control, attendees weren't sent email asking them to bring objects if they have a skill to demonstrate.

So if you're coming to my AYE session, please bring any objects you need to demonstrate your skill. Thanks, and please spread the word to anyone you know is coming.

## Posted at 07:35 in category /misc [permalink] [top]

Wed, 02 Nov 2005

A thought, inspired by the CSS2 specification

Specifications are a tool for resolving disputes. They are not a communication or teaching tool.

Sentences in a specification are the tangible and checkable evidence that a dispute among specifiers has been resolved. The specification is also used as strong evidence when two programmers have an implementation dispute, or when a tester and a programmer do. But almost no one, given the option, would choose to learn CSS by reading the specification.

That suggests that a specification should not be written to a consistent level of precision. Precision is needed only where disputes have already occurred or are likely. You can be happy when politics and economics allow you to let all precision be driven by actual, rather than anticipated, disputes.

## Posted at 21:46 in category /misc [permalink] [top]

Three useful links

Here are three links I plan to point clients at:

## Posted at 21:46 in category /fit [permalink] [top]

Tue, 01 Nov 2005

Four questions

Jonathan Kohl and I have been having a little conversation prompted by my comments on the Satir model of communication. He listed three questions he asks himself as he interacts with his team:

  • Am I trying to manipulate someone (or the rest of the team) by what I'm saying? An example he gave me is exaggerating a testing problem so that a programmer will look at a bug that's being ignored. (Sometimes testing can't proceed until a bug is dealt with.)

  • Am I not communicating what I really think? One example would be agreeing with people to avoid conflict. (That's different than disagreeing with a proposal, acknowledging the disagreement, and then agreeing to try the proposal anyway. After all, you're roughly as likely to be wrong as anyone else.)

  • Am I putting the process above people? An example that Jonathan gives is deciding that the Customer is by definition right on assessments of value and that the programmers should swallow their discomfort and start coding. I'm sometimes guilty of that.

He thinks of these in terms of Satir's notion of congruence, which is not an idea that rocks my world. (I'm more interested in external behavior than internal state: what I do in the world rather than my position in relationship to it.) The value of the questions is independent of their background, I think.

I've added a fourth:

  • Will those people have good reason to trust me more after this conversation?

I think I use that to square Jonathan's sentiments with my suspicion that there's a lot of useful manipulation out there.

Consider what I do as a coach. An ideal coach - which I am not - will act mostly by exploiting opportunities to jiggle someone else into discovery. I learn best through discovery, and it seems most people I work with are the same. So I am actively training myself to hold back, keep my mouth shut, and let my pair run with an idea, all the while being ready to say things that will help her realize what's happening. Then we can talk about it. (I've noticed that Ron Jeffries is substantially better at this than I am.)

The nice thing about this approach is that it gives me room to be wrong. A lot of the time, what I thought would happen doesn't - her idea was better than mine - and a variant lesson gets learned (by both of us).

Nevertheless, it'd be fair to call me manipulative. The saving grace is that I'm happy for people to know what I'm doing; I don't believe writing this note will make people trust me less.

The focus on trust also keeps me from overdoing it. I resent it when teachers put me in an artificial scenario where they know precisely the troubles I'll have and what lessons I will not be able to avoid learning. I don't trust such people. From experience, I doubt they'll be tolerant of the perverse conclusions I tend to draw. So when I draw them, things turn into an Authority Game with a dynamic of them proclaiming Trvth at me and me being resistant.

(The devolution into such a game is an example of putting process above people. I bet my fourth question is, strictly, subsumed by the remaining three. But "men need more often to be reminded than informed" (*). Given that I do have a regrettable authoritarian streak, redundancy is OK.)

I think Jonathan's questions will help me, going forward.

(*) Warren Teitelman, I think. He was giving a talk on the Cedar programming language and programming environment.

## Posted at 07:35 in category /agile [permalink] [top]

Sun, 30 Oct 2005

Welcome, Better Software readers

In my editorial for the November Better Software, I use the string of numbers at the top of my blog as an example of a big visible chart. They've helped me lose about 25 pounds. Especially once I started meeting people who told me they tracked them, my impulse to show the world steady progress remained strong for about 20 of those 25 pounds. (I've even met someone who said he was inspired to do the same with his blog.)

More recently, progress has been more fitful, as can be seen by the abundance of red. (Green means I've lost at least two pounds in the week, red means I've lost some but not enough, and bolded red means I've gained weight.) My just-ended three-week trip (to PNSQC, RubyConf, OOPSLA, and a client site) did special damage. Naturally, I did that damage just as the November issue is being mailed. But next week will be green.

## Posted at 21:48 in category /misc [permalink] [top]

A thought on mocking filesystems

(Sorry about the earlier version of this. I wrote some notes locally and then accidentally synchronized with testing.com.)

Sometimes people mock (or even just stub) some huge component like a filesystem. By doing that, you can test code that uses the filesystem but (1) have the tests run much much faster, (2) not have to worry about cleaning up because the fake filesystem will disappear when the test exits, (3) not have to worry about conflicting with other tests running at the same time, and (4) have more control over the filesystem's behavior and more visibility into its state.

I was listening to someone talk about doing that when I realized that any component like a filesystem has an API that's adequate for a huge number of programs but just right for none of them.

So it seems to me that a project ought to write to the interface they wish they had (typically narrower than the real interface). They can use mocks that ape their interface, not the giant one. There will be adapters between their just-right interface and the giant interface. Those can be tested separately from the code that uses the interface.

Arguing against that is the idea that giant mocks can be shared among projects, thus saving the time spent creating the custom mocks required by my alternative. But I'm inclined to think it's a good practice to write an adapter layer anyway. Without one, it's painful to swap out components: uses of the old component's API are smeared throughout the system.
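
A small sketch of the arrangement, assuming a project that only ever needs to write and read whole files by name. All the names here (DiskStore, MemoryStore, archive_note) are hypothetical.

```ruby
require 'tmpdir'

# Adapter from the narrow interface (write and read by name) to the
# real filesystem. It can be tested on its own, apart from its users.
class DiskStore
  def initialize(dir)
    @dir = dir
  end

  def write(name, text)
    File.write(File.join(@dir, name), text)
  end

  def read(name)
    File.read(File.join(@dir, name))
  end
end

# In-memory fake that apes the narrow interface, not the whole
# filesystem API. It vanishes when the test exits.
class MemoryStore
  def initialize
    @files = {}
  end

  def write(name, text)
    @files[name] = text
  end

  def read(name)
    @files.fetch(name)
  end
end

# Code under test talks only to the narrow interface.
def archive_note(store, text)
  store.write('note.txt', text.strip)
end

fake = MemoryStore.new
archive_note(fake, "  remember the milk \n")
puts fake.read('note.txt')

# The adapter gets its own test against a throwaway directory.
Dir.mktmpdir do |dir|
  real = DiskStore.new(dir)
  archive_note(real, "  remember the milk \n")
  puts real.read('note.txt')
end
```

Swapping the storage component later would mean writing one more class with the same two methods.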

## Posted at 21:28 in category /coding [permalink] [top]

Thu, 13 Oct 2005

A Watir win

At PNSQC, Michael Kelly gave a talk. Among other things, it covered converting functional test scripts into performance test scripts. He gave examples using several tools, one of them Watir.

As he described how little of the test script had to change to make it a performance test script, I realized that there was a way to make it require no changes. I didn't quite finish a demo during his talk, but I did shortly after. Here's an example:

Load (a fake version of) Watir, get an object representing IE, and ask it to go to a URL.

irb(main):001:0> require 'watir'
=> true
irb(main):002:0> ie = IE.new
=> #<IE:0x329144>
irb(main):003:0> ie.goto('url')
=> "here's a bunch of HTML"

(I faked out Watir because I didn't have net access and, anyway, this machine is a Mac.) What you can't see in the above is that goto delays a random number of seconds before returning.

Now I want to run the same "test", timing all gotos.

irb(main):004:0> require 'perf'
=> true
irb(main):005:0> IE.time(:goto)
=> nil
irb(main):006:0> ie.goto('url')
1.000129
=> "here's a bunch of HTML"

It took just over a second.

Here's the code. Mike will be seeing about getting something like it into the Watir distribution.

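class IE
  # Wrap the named instance method so that each call prints its elapsed time.
  # Calling this twice for the same method would make the wrapper call itself.
  def self.time(method)
    method = method.to_s
    original_method = '__orig__' + method
    # Keep the untimed method under a new name, then redefine the
    # original name as a wrapper that times a call to it.
    new_def = "alias_method :#{original_method}, :#{method}
               def #{method}(*args, &block)
                  start = Time.now
                  retval = #{original_method}(*args, &block)
                  puts Time.now - start
                  retval
               end"
    class_eval(new_def)
  end
end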

Take that, WinRunner!

## Posted at 15:08 in category /ruby [permalink] [top]

Tue, 11 Oct 2005

Hoist by my own petard

I started my PNSQC talk by asking for three volunteers. I handed each a Snickers bar and told them to eat it. After they did, I asked whether they were confident their bodies would be successful at converting that food into glucose and replenished fat cells. Then I gave them part of the CSS specification. I asked them whether they thought they could be successful at converting that information into a conformant implementation. Unsurprisingly, they thought digestion would work and programming wouldn't. How odd, I said, that digestion and absorption work so much better than the simpler process of programming.

The idea here was to set the stage for an attack on the idea that (1) we can adequately represent the world with words or concepts, and (2) we can transmit understanding by encoding it into words, shooting it over a conduit to another person, and having them decode it into the same understanding.

Things did not go exactly as planned. After I gave them the Snickers bars, I was surprised when they balked and asked me all kinds of questions about eating it. I thought they were deliberately giving me a hard time, but one of them (Jonathan Bach) later told me that he was honestly confused. He said something like, "it would have been much clearer if you'd shown us what you wanted by eating one yourself."

... if I hadn't tried to transmit understanding down the conduit...

... if I'd explained ambiguous words with an example. In a talk about the importance of explaining with examples.

I'm glad Jonathan was clever enough to catch that, because the irony of it all would have forever escaped me.

P.S. It now occurs to me that another problem was that they didn't know why they were to do it. That's something I also covered in the talk: "justify the rules" from the list of tactics. I don't mind not telling them why, since telling them would have spoiled the effect, but not using an example just makes me slap my head.

## Posted at 07:39 in category /agile [permalink] [top]

Communication between business and code

In a few hours, I'll be giving a presentation at PNSQC. It's on communication between the business experts and the development team. After some audience participation involving Snickers® bars, trapezes, and Silly Putty® (actually, only Snickers bars) and some airy-fairy theorizing, I get down to discussion of 16 tactics. Here they are.

When it comes to teaching programmers and testers about a domain, examples matter more than requirements. It's fine to have statements like "always fill stalls up starting with the lowest number, except for the sand stalls and the bull stall". But when it comes time to be precise about what that means, use examples (aka tests). I think of requirements statements as commentaries on, or annotations of, examples.
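
To show how examples pin down a requirement statement like the one above, here is a sketch. Only "stall 1 is the bull stall" comes from the discussion in this post; the sand stall numbers, the barn size, and the next_stall method are invented for illustration.

```ruby
# Hypothetical rule. Stall 1 is the bull stall; the sand stalls are
# assumed to be 7 and 8, and the barn is assumed to have 10 stalls.
def next_stall(occupied, total = 10)
  reserved = [1, 7, 8]
  (1..total).find do |stall|
    !reserved.include?(stall) && !occupied.include?(stall)
  end
end

# Each example pairs the occupied stalls with the expected answer.
examples = [
  [[],              2],  # empty barn: the bull stall is skipped
  [[2, 3],          4],  # fill upward from the lowest number
  [[2, 4],          3],  # a gap is filled before moving on
  [[2, 3, 4, 5, 6], 9]   # the sand stalls are skipped too
]

examples.each do |occupied, expected|
  actual = next_stall(occupied)
  unless actual == expected
    raise "#{occupied.inspect}: expected #{expected}, got #{actual}"
  end
end
puts "#{examples.size} examples pass"
```

The third example is the kind of question the requirement statement leaves open: whether a freed stall is reused before higher-numbered ones is something only the business expert can answer.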

It's best to build examples with talk around a whiteboard. There, a business expert explains an example to a small audience of programmers, testers, technical writers, etc. The conversation includes these elements:

  • People should ask questions about details. If the business expert casually says, "So we have a cow in stall 1", ask why it's in stall 1. The answer might be, "well, actually, it probably wouldn't be in stall 1, because that's the bull stall" - which now alerts everyone that there are rules surrounding which animals go in what stalls. Those rules might not matter soon, but it doesn't hurt to be aware of them.

  • Turn stories into rules. If the business expert says things like "well, since the bull stall is reserved for dangerous animals, we'd put an ordinary case in the next available stall," you have a rule that stalls are allocated in increasing order. That rule is something that will probably be found, in some form, in the code.

  • Still, favor complete examples over complete rules. The rules don't have to be precise; they're mainly a reminder to write precise examples. Expect the real heavy lifting of creating rules to be part of the programming process; the programmers will discover rules that cover the examples. (See my old favorite, the Advancer story.)

    Nevertheless, some early attention to rules helps shift the emphasis from procedural examples to declarative examples and from UI gestures to business logic.

  • Participants should ask the business expert to justify the rules. Why is it that stalls are allocated in increasing order? There might be no particular reason, but it might be that stalls are numbered counterclockwise, so by housing cases in numerical order, a student working on her cases in stall order would walk directly from case to case instead of having to plan a route.

    What's happening here is that the development team is learning facts about the domain. Any set of requirements, examples, or other kinds of instructions to the team will leave them underconstrained. At some point, they'll make decisions that are not forced by anything the business expert said. If they understand the "why" behind statements, they're more likely to make sensible decisions.

  • People should ask about exceptions: "when is it not done like that?" It's the exceptions that make rules tricky, and the exceptions that will drive the creative part of programming.

    Now, it's awfully easy to ask an expert for exceptions to the rules, much harder for the expert to think of them. So there are tactics for eliciting exceptions (as well as new rules and new domain knowledge).

    • Ask for stories from different points of view. The most natural point of view is probably that of a user of the system. So find opportunities to ask for the story of a medical case from the first call to get an appointment to the last time someone touches its record. Or look at the path of an inventory item through the system. (As an example of this, see the opening scenes of the movie Lord of War. I consider that a spoiler, but seemingly every critic saw fit to describe it.)

    • When telling the story of a user, you have the opportunity to pick a persona. Don't always use a normal one. Consider how Bugs Bunny (a trickster character, a rule breaker) would use the system. How about the Charlie Chaplin of the factory scenes in Modern Times: the completely overwhelmed worker who can't keep up? (I learned this trick from Elisabeth Hendrickson.)

    • You can also try Hans Buwalda's soap opera testing (an example). In soap opera testing, you construct an example of use that bears the same relationship to normal use as a soap opera does to real life: dramatically compressed, full of cascading and overlapping implausibilities.

    • Be alert for synonyms. Suppose a clinician uses the words "release" and "discharge" in different contexts but cannot articulate the difference between them. It's natural to just pick one of them and use it henceforth. I'm more likely to want to make the system support both words (by one common routine) in the hopes that a distinction will eventually emerge.
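The synonym tactic in the last bullet can be sketched in a few lines of Ruby. This is only an illustration: the class name MedicalCase, the status values, and the method bodies are all assumptions made up for the sketch, not code from a real clinic system. The point is that both words exist in the code but share one routine, so a later distinction can be introduced by giving "discharge" its own body.

```ruby
# Hypothetical sketch: two domain words backed by one common routine.
class MedicalCase
  attr_reader :status

  def initialize
    @status = :admitted            # assumed starting state
  end

  # The common routine. For now it is the only behavior we know of.
  def release
    @status = :no_longer_in_clinic # assumed end state
    self
  end

  # "discharge" is, so far, just another name for "release".
  # If the clinician's distinction ever emerges, replace this alias
  # with a real method.
  alias_method :discharge, :release
end

released   = MedicalCase.new.release
discharged = MedicalCase.new.discharge
puts released.status == discharged.status
```

Code that talks to clinicians can then use whichever word they used, which keeps the trail of usage visible if the two words ever need to diverge.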

In all of this, attend to pacing. The programmers have to learn about the domain. It's easy to overwhelm them with exceptions and special cases while they're still trying to grapple with the basics. So start with basic examples and consider the elaborations once they've demonstrated (through working code) that they're ready for them.

Give the product expert hands-on fast feedback. Anything written down (like examples, tests, or requirements documents) puts the reader at one remove from the actual thing. Consider the difference between test-driving a car and reading about a test drive of a car. So quickly implement something for the product owner to look at. That will allow her to correct previous examples and also learn more about how to communicate with the team.

It's also important for everyone to work the product. You don't learn woodworking by looking at examples and listening to someone talking about woodworking. You learn by working wood. The programmers, testers, etc. on a team don't need to become experts in the business domain, but they do need to learn about it (again, so they can make unforced choices well). Having people use the product realistically, especially in pairs, especially with the business expert near, will help them. I recommend exploratory testing techniques. James Bach's site is the best place to learn about them.

I think of the team as building a trading language. This is a language that two cultural groups use to cooperate on a common goal. (See also boundary objects.) In a trading language, the programmers and business expert will both use words like "bond" or "case" -- indeed, it's best if those words are reified in code -- but they will inevitably mean different things by them. It's important to accept that, but also to attend to cases where the different meanings are causing problems. I happen to also think that the business expert should become conversant in the technology domain, just as programmers become conversant in the business domain. That doesn't mean to become a programmer, but it does mean to come to understand enough of the implementation to understand implementation difficulties and opportunities.

Finally, since understanding is built, not simply acquired, it's important to attend to learning through frequent mini-retrospectives. Is the development side of the team learning the domain? Is the business side learning about the implementation? Is the business side learning about the domain? -- I think any project where the business expert doesn't gain new insights into the business is one that's wasted an opportunity. Is everyone on the team learning about communication?

## Posted at 07:39 in category /agile

Sun, 09 Oct 2005

PNSQC annotated bibliography

In my Pacific Northwest Software Quality Conference talk, I'm going to throw out a blizzard of references. Here they are.

"The conduit metaphor: A case of frame conflict in our language about language", Michael J. Reddy, in Metaphor and Thought (2/e), Andrew Ortony (ed.)

Shows how our standard metaphor for communication is one of shipping something from one mind to another via a conduit. Many examples. I don't think that's the way communication really works, so the metaphor misleads us into thinking requirements documents are a good idea, and that the reason they so often fail is that we're not smart or dedicated enough.

Philosophy and the Mirror of Nature, Rorty

Takes apart the idea that concepts and words are direct mappings of hard-edged categories in the world. Sometimes that works. Sometimes it doesn't. Requirements documents assume that words can capture what the solution to a problem essentially is. But if that's not possible, in general...

Women, Fire, and Dangerous Things: What Categories Reveal About the Mind, Lakoff

A more empirical treatment of the same idea. Categories have fuzzy edges, partly because most lack a single defining characteristic that must be present.

Personal Knowledge: Toward a Post-critical Philosophy, Polanyi

A discussion of tacit knowledge.

Refactoring, Fowler

I tell the Advancer story, which uses the Method Object refactoring described in Fowler.

Cognition in the Wild, Hutchins

I suggest the Advancer story is an example of Hutchins's distributed cognition, the notion that it sometimes doesn't make sense to say that particular people solve a problem. Instead, it's more useful to point to an assemblage of people and things as doing the thinking. So the common statement "the code is trying to tell us something" is not meaningless.

Fit for Developing Software: Framework for Integrated Tests, Mugridge and Cunningham

I use Fit tables as examples of examples.

"Soap Opera Testing", Buwalda

Soap opera tests have the same relationship to normal uses of the product as soap operas have to real life: drastically condensed and exaggerated.

Image and Logic: A Material Culture of Microphysics, Galison

Galison discusses how different groups of people collaborate on shared goals. He claims that they develop "trading languages" (like trading pidgins and creoles) to organize their work. I believe his analysis fits project teams.

Domain-Driven Design, Evans

Evans's "ubiquitous language" is an example of a Galison trading language.

"The Good, the Bad, and the Agile Customer", Alesandro, Better Software, November/December 2005.

An example of gradually tuning communication to suit both a business representative and a development team.

Situated Learning: Legitimate Peripheral Participation, Lave & Wenger

A description of how types of learning like apprenticeship work. Not an easy read. I wrote a review and summary.

How to Do Things with Words (2/e), Austin

Austin talks about "performatives", which are sentences that don't describe the world but rather do things in it. ("I now pronounce you husband and wife.") Performatives don't really fit in the "words map to categories in the world" framework.

Limited, Inc., Derrida

Derrida (I'm told) argues here that all statements are performatives. (I haven't read the book yet - I've not found Derrida to be easy reading.) That makes sense to me: we utter things to change the world. Even when I'm defining a word for my daughter, I'm changing the world inside her head.

That's where I end, having given 16 tactics for improving communication, each compatible with this nonstandard view. My ending motto is this:

As intermediaries, we do not need to send abstractions down the conduit from the business to the programmer. Anything that provokes the programmer to write the right code is fine by us.

## Posted at 09:49 in category /misc

Sat, 08 Oct 2005

Blaming and lecturing

Satir's models of communication, change, and communication stances are influential among those who worry about software team dynamics. I'm uneasy about them on two grounds.
  • One is that the categories they draw strike me as too big. Consider the communication stances. The model identifies three "things in the world": Self, Other, and Context. People take bad communication stances when they (try to) ignore one or more of those things. For example, a Placating person will ignore Self in favor of Other and Context.

    My difficulty is that there are so many Others and pieces of Context ready-to-hand at every moment (even if you're talking to one person about one thing) that I'm uneasy about the idea of ignoring Context or Other. That probably means ignoring a lot of the Context or Other, but the parts you don't ignore are probably awfully important. (And, as a teensy bit of a postmodernist, I'm not 100% sure it's always that useful to think of a unitary self, so even ignoring Self is maybe not such a straightforward idea.)

    Now, I expect the model has been expanded, but my informal encounters haven't shown me the elaborations. Perhaps I will at the AYE conference.

  • The other source of unease is that Satir's models are grounded in family therapy. That, it seems to me, often leads to overconcentration on the negative. Function becomes the absence of dysfunction, joy becomes the absence of frustration. One becomes "congruent" by ceasing to ignore one, two, or three of the things in the world.

    For example, in the change model, crisis kicks off change. Change must push through resistance. Again, that's certainly often true (and consultants must often deal with resistance). But that's not the way all change happens. Some people like change, and others are agnostic (the change threatens nothing they particularly care about). My impression is that a lot of the elements of XP were more motivated by a harkening back to an idyllic time at Tektronix Labs than by stark necessity.

    (A preference for Satir may be a product of selection bias. Back when I was a pure testing consultant, I - like an awful lot of consultants - got called almost exclusively into companies with problems. There, the Satir model is so often appropriate that it must be easy to see dysfunctional family life everywhere. Now that I'm consulting in Agile, I more often go to companies that are doing perfectly OK and want to do better. That promotes a sunnier view of life.)

But that's not what I mainly wanted to write about. In the communication stances model, the ignoring of Other leads to Blaming behavior. If I model my own behavior that way, I'd say Blaming is not often the result. What I do more is Unstoppable Framing and Advice-Giving. It's figuring out what the problem and its context are, plus throwing out all kinds of potential solutions. That's different than Satir's Super-reasonable behavior, which is "cool, aloof, reasonable, and intellectual". I'm not cool or aloof; I'm usually passionate and determinedly optimistic - "hey, how about this. It would turn the problem on its head and make it a neat opportunity."

That's helpful behavior except when it becomes more about me and less about the Other I'm supposedly helping, when it becomes a way to shift the issue away from what the other person needs to what I'm good at doing: problem-solving, idea generation, and talking. I'm using the Context as a way of making my Self comfortable. The solution (there I go again) is to make sure to let the Other guide the conversation.

I bet Dawn (who's witnessed more of this from me than anyone else has) would describe it as stereotypically male behavior. It probably is statistically more common among males. But I'd be willing to bet it's an occupational hazard for consultants.

So, by Box's criterion that all models are wrong, but some are useful, Satir's model is useful. I don't use it much, though.

## Posted at 18:31 in category /misc

Wed, 05 Oct 2005

Tweak to CalculateFixture

I've become fond of FitLibrary's CalculateFixture. The second table below is an example:

create clinic with 7 stalls
which are sand stalls? 4, 6
which are bull stalls? 1

...

When there's no room for an animal in a normal stall, put it in a sand stall, or in a bull stall as last resort.

how stalls are assigned
special stall? | stalls in use    |  | stall assigned
no             | 2, 3, 5, 7       |  | 4
no             | 2, 3, 4, 5, 6, 7 |  | 1

The columns to the left of the blank column are input values. Those to the right are expected results. The blank column is a nice visual separator.
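As a rough illustration of the business rule this table pins down, here is a minimal Ruby sketch. It is not the code behind the fixture: the Clinic class, its constructor keywords, and assign_stall are hypothetical names, and the sketch only covers rows where "special stall?" is "no". It assumes stalls within each group are filled lowest number first.

```ruby
# Hypothetical model of the clinic described in the setup table.
class Clinic
  def initialize(stall_count:, sand_stalls:, bull_stalls:)
    @all_stalls  = (1..stall_count).to_a
    @sand_stalls = sand_stalls
    @bull_stalls = bull_stalls
  end

  # Picks a stall for an animal that needs no special stall.
  # Normal stalls are tried first, then sand stalls, then bull stalls;
  # within each group the lowest-numbered free stall wins.
  # Returns nil when every stall is in use.
  def assign_stall(stalls_in_use)
    normal_stalls = @all_stalls - @sand_stalls - @bull_stalls
    [normal_stalls, @sand_stalls, @bull_stalls].each do |group|
      free = (group - stalls_in_use).min
      return free if free
    end
    nil
  end
end

clinic = Clinic.new(stall_count: 7, sand_stalls: [4, 6], bull_stalls: [1])
p clinic.assign_stall([2, 3, 5, 7])        # first row of the table expects 4
p clinic.assign_stall([2, 3, 4, 5, 6, 7])  # second row of the table expects 1
```

Each table row maps onto one call: the cells left of the blank column become the argument, and the cell to the right is the value the call should return.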

Each line of a table should be easy to understand. Sometimes that means annotation. I've hacked my copy of CalculateFixture to allow notes after yet another blank column. Like this:

create clinic with 7 stalls
which are sand stalls? 4, 6
which are bull stalls? 1

...

When there's no room for an animal in a normal stall, put it in a sand stall, or in a bull stall as last resort.

how stalls are assigned
special stall? | stalls in use    |  | stall assigned |  | notes
no             | 2, 3, 5, 7       |  | 4              |  | Only sand stalls are free, so use one.
no             | 2, 3, 4, 5, 6, 7 |  | 1              |  | No place to go but bull stall

That's easily done in the 9Feb2005 version. In bind:

for (int i = 0; heads != null; i++, heads = heads.more) {
    String name = heads.text();
    try {
        if (name.equals("")) {
+           if (pastDoubleColumn) break;
//          if (argCount > -1)
//              throw new FitFailureException("Two empty columns");
            argCount = i;
            targets = new MethodTarget[rowLength-i-1];
      

And remove an error check in doRow:

if (row.parts.size() != argCount+methods+1) {
    exception(row.parts,"Row should be "+(argCount+methods+1)+" cells wide");
    return;
}
      

I'd like to see this change become part of FitLibrary. It cannot break existing tables (because of the error check). The error check would no longer catch mistakes in the table, but I can't see such a mistake lasting past the first time someone tried to make the test pass. It would be easy enough to change the error check to take blank columns into account, but I doubt I'd bother.
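For what it's worth, a width check that takes blank columns into account might look something like the following. This is a sketch of the idea in Ruby, not a patch to FitLibrary's Java: split_row and row_width_ok? are hypothetical helpers, and cells are modeled as plain strings with "" standing for a blank cell.

```ruby
# Hypothetical helper: split a row's cells into inputs, expected results,
# and optional notes, using blank cells ("") as the separators.
def split_row(cells)
  first_blank = cells.index("")
  raise ArgumentError, "row has no blank separator column" unless first_blank

  inputs = cells.take(first_blank)
  rest   = cells.drop(first_blank + 1)

  second_blank = rest.index("")
  expected = second_blank ? rest.take(second_blank) : rest
  notes    = second_blank ? rest.drop(second_blank + 1) : []

  { inputs: inputs, expected: expected, notes: notes }
end

# A row is acceptable when it has as many input and expected cells as
# the header row. Notes are free-form, so they are not counted.
def row_width_ok?(header_cells, row_cells)
  header = split_row(header_cells)
  row    = split_row(row_cells)
  header[:inputs].size == row[:inputs].size &&
    header[:expected].size == row[:expected].size
end

header = ["special stall?", "stalls in use", "", "stall assigned", "", "notes"]
row    = ["no", "2, 3, 5, 7", "", "4", "", "Only sand stalls are free, so use one."]
p row_width_ok?(header, row)
```

Under these assumptions, a row with a missing input or result cell is still rejected, while a row with or without a note passes.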

## Posted at 15:31 in category /fit

Sat, 01 Oct 2005

Need project pictures

For my talk at the Indianapolis Quality Enrichment Conference (October 7), I could sure use some pictures of an Agile project in action. I'd like to show a daily standup and a product owner explaining a story, preferably using a whiteboard. If anyone can mail me some, I'd be much obliged and would give credit. Thanks.

I've got other pictures I think I need, but if you can think of events or settings that people really ought to see, send them along. Even if I don't use them this time, this probably won't be my last talk about Agile.

## Posted at 18:01 in category /agile

Fri, 30 Sep 2005

First mover disadvantage

A while back, I stayed at a Marriott hotel. Around US$150 at a reduced conference rate, plus US$10 for wired high-speed internet. Not long before then I'd stayed at a no-name hotel for US$70. It had free wireless.

Why the difference? I suppose it's just whatever it is that makes hoteliers think the more expensive the room, the more expensive should be the bottle of Aquafina water placed in the room.

But I also toyed with the thought that it could be a form of first-mover disadvantage. Marriott no doubt put in high-speed internet before wireless was even an option. The other hotel waited. I suspect it's a lot more expensive to string wires all over the hotel than to put up wireless hubs. Having strung those wires, was Marriott now stuck when wireless came along? Were there financial or emotional (sunk cost fallacy) reasons for sticking with a solution that's been superseded?

Are early adopters of Agile subject to the same? Are they (we?) prone to getting stuck at local maxima? I suppose that's inevitable. It's sometimes said that the fiercest critics of this generation's avant-garde aren't the boring old mainstream; they're last generation's avant-garde.

Bob Martin said at Agile 2005 that industry seems to be converging on a standard Agile approach: Scrum with a subset of the XP practices. That's good, but what's the next step beyond that? I personally think it's about the Product Owner / Customer. People used to say things like "programmers can't test their own code" or "programmers are asocial" or "requirements must be complete, testable, and unambiguous." Turns out we now know (mostly) how to make those things untrue. There are a lot of statements about product owners that are like that. How many of them have been made untrue? Doesn't seem like many. So I bet there's work to do.

One such statement is "To the product owner, the UI is the software." I've heard that as support for Fit ActionFixture over DoFixture. Not because ActionFixture describes the actual UI, but because product owners can grasp that metaphor, whereas DoFixture would be too abstract.

Now, I'm biased, because I think that it's almost always the business logic that delivers the value. The UI is a way to make that value accessible. It's secondary. So I want the product owner to think, um, like me. Or better than me - to come to some product understanding / conceptualization / language that's surprising, that reveals product possibilities in the way that refactoring reveals code possibilities.

The question is: what nitty-gritty techniques can be used for that? It makes no sense to go up to someone and say, "Reconceptualize your domain!" I have a low opinion of teaching people how to think, and a high opinion of teaching them how to do.

## Posted at 19:53 in category /agile

Wed, 28 Sep 2005

Upon the occasion of a school meeting

Our children go to publicly-funded schools, partly because I buy the argument, from The End of Equality, that it's been important for US society that we've had places where citizens of different incomes and classes mix.

My son has some developmental difficulties. Nothing major, nothing romantic - but a need for help with fine motor control, speech, and some social interactions.

I've worked with big organizations and small, old ones and young ones, monopolies and competitors. I have a bias - both instinctive and learned - toward the small, the young, and the competitive. (I was once on the technical board of a small company that had a niche market to itself, and I watched how the appearance of a competitor concentrated their mind on bettering the product. It was a wonderful example of how competition is a burden placed on organizations for the benefit of the rest of us.)

So I should be - am - naturally suspicious of the public school system, which is large, set in its ways, bureaucratic, and much like a monopoly. But I'm here to tell you that the people - teachers, administrators, school therapists - have, almost without exception, been wonderful. I've been around. I know the difference between people just doing a job and people motivated to do a good job. These people would be a credit to any organization, even the smallest, youngest, and leanest.

It's almost as if employees can be motivated by something other than money and status.

So the next time you hear a politician speaking scornfully of the teachers' union or the school system, just remember there are a lot of good people in those organizations. Not only are they working hard with resources so limited they make your organization look like the US Congress funding public works in Alaska, they're doing it while enduring constant insults.

## Posted at 07:53 in category /misc

Being wrong

Hardly anyone thinks the software industries do a satisfactory job of getting the requirements / architecture / design right up front. The reaction to that, for many many years, has been that we should work smarter and harder at being right. Agile flips that around: we should work smarter and harder at being wrong. We should get so good at being wrong that our predictive failings do no harm to us, our project, our employer, or our users. In fact, we should strive to make mistakes---the need to redo---a constructive resource.

## Posted at 07:53 in category /agile

Fri, 23 Sep 2005

Overcorrection

Long ago, I learned to fly gliders. Since they don't have engines, they're towed up into the air. So you spend the first minutes of your flight at the end of a long rope that's tethered to a tow plane. As a pilot, your job is to keep your glider in a good position relative to it.

As a novice pilot, I had a problem with "porpoising." I might drift up out of the right position, so I'd push the stick forward to descend, but I'd overcorrect so I'd descend too far, so I'd pull the stick back but this time get even higher out of position, so... Eventually, you can oscillate so badly that you become a danger to the towplane.

One time, my instructor gave me some advice. "Don't just do something, sit there," he said. Let your status stabilize and become clear before you correct. And, I extrapolate, make small corrections that stabilize faster so that you know more quickly what you've done.

I think of that slogan every once in a while.

## Posted at 09:45 in category /misc

Wed, 21 Sep 2005

A tour through a Fit episode

On the agile-testing list, someone asked this question:

What I mean is a workflow requirement like:

If A then
  do something (which may be an entire function in itself)
Else if B
  then do something else
Else if C
  Then do nothing

Is it possible to express a requirement of this sort using Fit/Fitnesse?

Here's my answer, which goes afield into business-facing test-driven design.

My inclination would be to test the business rule directly. Here, I'm using Rick Mugridge's CalculateFixture. With it, the inputs and expected results are separated by a blank column.

All Significant Events in the Reactor's Life
condition   operator notified? auto shutdown?
really hot   YES YES