Exploration Through Example

Example-driven development, Agile testing, context-driven testing, Agile programming, Ruby, and other things of interest to Brian Marick


Thu, 29 Jan 2004

Order of tests

Suppose you have a set of tests, A through Z. Suppose you had N teams and had each team implement code that passed the tests, one at a time, but each team received the tests in a different order. How different would the final implementations be? Would some orders lead to less backtracking?

I decided to try a small version of such an experiment at the Master of Fine Arts in Software trial run. Over Christmas vacation, I collaborated with my wife to create five files of FIT tests that begin to describe a veterinary clinic. (She's head of the Food Animal Medicine and Surgery section of the University of Illinois veterinary teaching hospital - that's her in the top middle picture at the bottom of the page.) I implemented each file's worth of tests before we created the next file.

An interesting thing happened. When I got to the fifth file (1D), I had to do a lot of backtracking. One key class emerged, sucking code out from a couple of other classes. I think a class disappeared. After cruising through the first four files, it felt like I'd hit a wall. I'd made some bad decisions with the second file (1A), stuck with them too long, and was only forced to undo them with the fifth file. (Had I been attentive to the small, still voice of conscience in my head, I might have done better. Or maybe not.)

At the trial run, we spent four or five hours implementing. Sadly, only one of the teams finished. They did 1D before 1A. (Their order was 000-1D-1C-1B-1A.) What was interesting was that they thought 1D was uneventful but 1A was where they had to do some serious thinking. I got the feeling that their reaction upon hitting 1A was somehow similar to - though not the same as - my reaction upon hitting 1D. That's interesting.

Here are some choice quotes:

Brian: Am I right in remembering that D was no problem, but that things got interesting at A (which is the opposite of what I observed while taking them in the other order)?

Avi: That's right.

'A' changed some of the "ground rules" that we had been assuming about the system. I think the biggest deal was that, up to that point, all "orders" had been linear transitions from one status to another - from intensive care to normal boarding to dead, for example. Suddenly, there were all different kinds of orders that interacted in complex ways, some of them could be active simultaneously, and they had an effect on far more things than just the daily rate. At this point, both the state of the system and the conditional behavior based on the current state became complex enough that many more things needed to be modelled as classes that previously had gotten away with being simple data types. It was the first time the code was threatening to become anything like the kind of OO design you would have done if you had sat down and drawn UML diagrams from the start.

Chad: It felt to me like that feeling I get when I'm doing something in Excel and I run into a scenario where pivot tables just aren't cutting it. Suddenly, I need a multi-dimensional view of the data, and I realize that the tool I have isn't going to work. So, it was kind of a flat to multi-dimensional transition.

Since we were intentionally avoiding the creation of new classes or abstractions of any kind (as an experiment), we were facing a rewrite to move further.

Given the fact that our brittle code was starting to take the shape of classes that *wanted* to spring into existence, I wonder how much better the code would have been if we would have done classic test-driven development without the forced stupidity. Unfortunately, it's impossible to conduct a valid experiment to test this without a prohibitively large sample size. Who knows--you may have found an example that will generally cause developers to box themselves into a corner.

If Avi and I could forget the exercise completely, it would be fun to go back and try to do TDD while overly abstracting everything to see if we ran into the same issues.

Another pair had an experience slightly similar to mine. They did 000-1C-1B-1A and then started on 1D. One of them says:

The only discontinuity we felt was at D where we realised we needed to have an enhanced accounting mechanism. The rest of the tests exhibit the expected feeling of tension and then release as we added stuff to the fixture and then refactored it out. D felt different to me because unlike the others (in our ordering) D did two things:

It was a significant increment in requirements above and beyond the simple balance model. It was a larger step from a code complexity level than the others.

It broke an assumption that was woven through the accounting code.

What occurred to me at the time was that this is an example of change that you'd like not to happen in a real system. We didn't finish D but it would have been easy to fix. If that had happened in the last iteration before UAT it would have been a lot scarier.

Interestingly, I didn't feel we had made a mistake. We had decided to not look ahead and to do the trivialest thing; we had just learnt something new and needed to deal with it.

What do I conclude from this? Well, nothing, except that it's a topic I want to pay attention to. I don't think we'll ever see a convincing experiment, but perhaps through discussion we'll develop some lore about ways to get smoother sequences of tests.

If anyone wants to play with the tests, you can download them all. You'll also want the FIT jar file; it has a fixture I use in the tests. Warning: you will need to ask clarifying questions of your on-site customer with expertise in running a university large animal clinic. Oh, you haven't got one? Mail me.

## Posted at 11:32 in category /mfa

Sun, 25 Jan 2004

Code-reading practices

My first event in the Master of Fine Arts in Software trial run was a lecture on code-reading in the style of the literary critic Stanley Fish. His "affective stylistics" has one read a poem (say) word-by-word, asking what each word does for the reader. What expectations does it set up? or overturn? What if that word were omitted or moved elsewhere? (I've written on a similar topic earlier, and I drew my examples from that entry.)

I compared idiomatic Lisp code, "C-like" Lisp code, idiomatic C code, and Lisp-like C code to show how expectations and membership in "interpretive communities" influence readability. In the process, I learned something unexpected.

I presented code like this to Dick Gabriel, expecting he would think it an idiomatic recursive implementation of factorial.

      (defun fact (n &optional (so-far 1))
        (if (<= n 1)
            so-far
            (fact (- n 1) (* n so-far))))

Note: because of my presentation's structure, I originally named the function f so as not to give away immediately that it was factorial. I don't think that's germane to this note, so I'm giving it the clearer name here.

He didn't think it was idiomatic, not really. He found it somewhat old-fashioned, preferring an implementation that replaces the optional argument with an internal helper function (introduced by the labels form).

      (defun fact (n)
        (labels ((f (n acc)
                   (if (<= n 1) acc (f (- n 1) (* n acc)))))
            (f n 1)))
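
For anyone who wants to try both versions at a REPL, here is a minimal sketch. The names fact-optional and fact-labels are hypothetical, chosen only so that both definitions can be loaded side by side; they are not from the talk. It also shows one practical difference that is separate from reading style: the &optional version lets a caller supply the accumulator, while the labels version keeps it private.

```lisp
;; Hypothetical names so both styles can coexist in one image.

;; Style 1: accumulator exposed as an optional argument.
(defun fact-optional (n &optional (so-far 1))
  (if (<= n 1)
      so-far
      (fact-optional (- n 1) (* n so-far))))

;; Style 2: accumulator hidden inside a local helper introduced by labels.
(defun fact-labels (n)
  (labels ((f (n acc)
             (if (<= n 1) acc (f (- n 1) (* n acc)))))
    (f n 1)))

;; The two definitions should agree on ordinary calls.
(loop for n from 0 to 10
      do (assert (= (fact-optional n) (fact-labels n))))

;; Only the optional-argument version lets a caller seed the accumulator.
(fact-optional 5 2) ; => 240, i.e. 2 * 5!
```

Whether that exposed argument is a feature or a leak is a separate question from which version reads better.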

Now, I always hated labels. What's the difference between Dick and me? It appears to be reading style. As I understand it from him, truly idiomatic Lisp reading style goes like this:

Look for a key name (fact(n)).
Quickly skip down to the code of maximum density.
```
(if (<= n 1) acc (f (- n 1) (* n acc)))))
```
That's the important code. If that's not clear, find the declarations that clarify it by scanning upward. The most important ones will be nearby.

The labels version of the code fits that. The reading style and writing style are "tuned" to each other. It does not fit my reading style, which is to read linearly through functions (though I do bounce around among functions). So the labels verbiage at the front slows me down. I expect the interior names to be more intention-revealing than they need to be when they're just placeholders to make interesting ideas invokable. Because I don't know the visual cues that say "Pay attention here!", I may do more memorization of facts that turn out to be unimportant.

It's arguable that my reading style is just flat-out worse, but I do think that tuning reading to writing is a more useful way to think about it.

All this may seem small, but it reinforces my idea that attending closely to the act of reading will yield some Aha! moments to improve our practice.

## Posted at 10:41 in category /mfa

About Brian Marick

I consult mainly on Agile software development, with a special focus on how testing fits in.

Contact me here: marick@exampler.com.
