Thu, 29 Jan 2004
Order of tests
Suppose you have a set of tests, A through Z. Suppose you had N teams
and had each team implement code that passed the tests, one at a time,
but each team received the tests in a different order. How
different would the final implementations be? Would some orders lead
to less backtracking?
I decided to try a small version of such an experiment at the
Fine Arts in Software trial run. Over Christmas vacation, I
collaborated with my
wife to create five files of FIT tests that begin to describe a
billing system for a large animal clinic. (She's head of the
Food Animal Medicine
and Surgery section of the University of Illinois veterinary
teaching hospital - that's her in the top middle picture at the bottom of
the page.) I implemented each file's worth of tests before we created
the next file.
An interesting thing happened. When I got to the fifth file (1D), I
had to do a lot of backtracking. One key class emerged, sucking code
out from a couple of other classes. I think a class
disappeared. After cruising through the first four files, it felt
like I'd hit a wall. I'd made some bad decisions with the second file (1A),
stuck with them too long, and was only forced to
undo them with the fifth file. (Had I been attentive to the still,
small voice of conscience in my head, I might have done better. Or
maybe not.)
At the trial run, we spent four or five hours implementing. Sadly,
only one of the teams finished. They did 1D before 1A. (Their order
was 000-1D-1C-1B-1A.) What was interesting was that they
thought 1D was uneventful but 1A was where they had to do some
serious thinking. I got the feeling that their reaction upon
hitting 1A was somehow similar to - though not the same as - my
reaction upon hitting 1D. That's interesting.
Here are some choice quotes:
Brian: Am I right in remembering that D was no problem, but that
things got interesting at A (which is the opposite of what I observed
while taking them in the other order)?
Avi: That's right.
'A' changed some of the "ground rules" that we had been assuming about
the system. I think the biggest deal was that, up to that point, all
"orders" had been linear transitions from one status to another - from
intensive care to normal boarding to dead, for example. Suddenly,
there were all different kinds of orders that interacted in complex
ways, some of them could be active simultaneously, and they had an
effect on far more things than just the daily rate. At this point,
both the state of the system and the conditional behavior based on the
current state, became complex enough that many more things needed to
be modelled as classes that previously had gotten away with being
simple data types. It was the first time the code was threatening to
become anything like the kind of OO design you would have done if you
had sat down and drawn UML diagrams from the start.
Chad: It felt to me like that feeling I get when I'm doing
something in Excel and I run into a scenario where pivot tables just
aren't cutting it. Suddenly, I need a multi-dimensional view of the data,
and I realize that the tool I have isn't going to work. So, it was kind
of a flat to multi-dimensional transition.
Since we were intentionally avoiding the creation of new classes or
abstractions of any kind (as an experiment), we were facing a rewrite to
accommodate the change. Given that our brittle code was starting to take
the shape of classes that *wanted* to spring into existence, I wonder how
much better the code would have been if we had done classic
test-driven development without the forced stupidity. Unfortunately, it's
impossible to conduct a valid experiment to test this without a
prohibitively large sample size. Who knows--you may have found an example
that will generally cause developers to box themselves into a corner.
If Avi and I could forget the exercise completely, it would be fun to go
back and try to do TDD while overly abstracting everything to see if we
ran into the same issues.
Another pair had an experience slightly similar to mine. They did
000-1C-1B-1A and then
started on 1D. One of them says:
The only discontinuity we felt was at D where we realised we needed to
have an enhanced accounting mechanism. The rest of the tests exhibit
the expected feeling of tension and then release as we added stuff to
the fixture and then refactored it out. D felt different to me because
unlike the others (in our ordering) D did two things:
It was a significant increment in requirements above and
beyond the simple balance model. It was a larger step from a code
complexity level than the others.
It broke an assumption that was woven through the accounting code.
What occurred to me at the time was that this is an example of change
that you'd like not to happen in a real system. We didn't finish D but
it would have been easy to fix. If that had happened in the last
iteration before UAT it would have been a lot scarier.
Interestingly, I didn't feel we had made a mistake: we had decided
not to look ahead and to do the trivialest thing, and we had just
learnt something new and needed to deal with it.
What do I conclude from this? Well, nothing, except that it's
a topic I want to pay attention to. I don't think we'll ever see a
convincing experiment, but perhaps through discussion we'll
develop some lore about ways to get smoother progressions of tests.
If anyone wants to play with the tests, you can download
them all. You'll also want the FIT jar file; it has a fixture I
use in the tests. Warning: you will need to ask clarifying questions
of your on-site customer with expertise in running a university
large animal clinic. Oh, you haven't got one? Mail me.
## Posted at 11:32 in category /mfa
Sun, 25 Jan 2004
My first event in the
Master of Fine Arts in Software trial run was a
lecture on code-reading in the style of the literary critic
Stanley Fish. His "affective stylistics" has one read a poem (say)
word-by-word, asking what each word does for the reader. What
expectations does it set up? or overturn? What if that word were
omitted or moved elsewhere? (I've written on a
similar topic earlier,
and I drew my examples from that entry.)
I compared idiomatic Lisp code, "C-like" Lisp code, idiomatic C
code, and Lisp-like C code to show how expectations and membership in
"interpretive communities" influence readability. In the process, I
learned something unexpected.
I presented code like this to Dick Gabriel, expecting he would think it
an idiomatic recursive implementation of factorial.
(defun fact (n &optional (so-far 1))
  (if (<= n 1)
      so-far
      (fact (- n 1) (* n so-far))))
Note: because of my presentation's structure, I originally named
the function f so as not to give away immediately that
it was factorial. I don't think that's germane to this note, so I'm
giving it the clearer name here.
He didn't think it was idiomatic, not really. He found it somewhat old-fashioned,
preferring an implementation that replaces the optional argument
with an internal helper function (introduced by labels):
(defun fact (n)
  (labels ((f (n acc)
             (if (<= n 1) acc (f (- n 1) (* n acc)))))
    (f n 1)))
Now, I always hated
labels. What's the difference
between Dick and me? It appears to be reading style. As I
understand it from him, truly idiomatic Lisp reading style goes like this:
1. Look for a key name (here, fact).
2. Quickly skip down to the code of maximum density:
(if (<= n 1) acc (f (- n 1) (* n acc)))
That's the important code.
3. If that's not clear, find the declarations that clarify it by
scanning upward. The most important ones will be nearby.
The labels version of the code fits that. The reading style and
writing style are "tuned" to each other.
It does not fit my reading
style, which is to read linearly through functions (though I do
bounce around among functions). So the labels verbiage at the front
slows me down. I expect the interior names to be more meaningful
than they need to be when they're just placeholders to make
interesting ideas invokable. Because I don't know the visual cues
that say "Pay attention here!", I may do more memorization of facts that
turn out to be unimportant.
It's arguable that my reading style is just flat-out worse, but I
do think that tuning reading to writing is a more useful way to
think about it.
All this may seem small, but it reinforces my idea that attending
closely to the act of reading will yield some Aha! moments to
improve our practice.
## Posted at 10:41 in category /mfa