Using functional style in a Ruby webapp

Motivation

Consider a Ruby backend that communicates with its frontend via JSON. It sends (and perhaps receives) strings like this:

Let’s suppose it also communicates with a relational database. A simple translation of query results into Ruby looks like this:

(I’m using the Sequel gem to talk to Postgres.)
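To make "dumb hashes and arrays" concrete, here is a minimal sketch of data arriving as JSON, meeting database rows, and leaving as JSON again. The payload and column names are invented for illustration, and an in-memory array stands in for a Sequel query result (Sequel returns rows as hashes keyed by column name).

```ruby
require 'json'

# Hypothetical payload; the field names are invented for illustration.
incoming = '{"animal":"Boombird","procedures":["bandaging","injection"]}'

# Parsing JSON yields nothing but hashes, arrays, strings, and numbers.
request = JSON.parse(incoming, symbolize_names: true)

# Stand-in for a query result: an array of hashes keyed by column name.
rows = [
  { id: 1, name: 'Boombird', kind: 'horse' },
  { id: 2, name: 'Betsy',    kind: 'cow' }
]

# Plain data in, plain data out: no model objects in between.
matches  = rows.select { |row| row[:name] == request[:animal] }
response = JSON.generate(matches)
```

Nothing in the round trip needed a class of its own, which is the observation the rest of the post builds on.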

On the face of it, it seems odd for our code to receive dumb hashes and arrays, laboriously turn them into model objects with rich behavior, fling some messages at them to transform their state, and then convert the resulting object graph back into dumb hashes and arrays. There are strong historical reasons for that choice—see Fowler’s Patterns of Enterprise Application Architecture—but I’m starting to wonder if it’s as clear a default choice as it used to be. Perhaps a functional approach could work well:

  • Functional programs focus on the flow of data through code, rather than on objects with changing state. The former seems more of a match for a typical webapp.

  • It’s common in functional languages to lean toward a few core datatypes—like hashes and arrays—that are operated on by a wealth of functions. We could skip the conversion step into objects. Rather than having to deal with the leaky abstraction of an object-relational mapping layer, we’d embrace the nature of our data.

Seems plausible, I’ve been thinking. However, I’ve never been wildly good at understanding the problems of an approach just by thinking about it. It’s more efficient for me to learn by doing. So I’ve decided to strangle an application whose communication with its database is, um, labored.

I’m going to concentrate on two things:

  • Structuring the code. More than a year of work on Midje has left me still unhappy about the organization of its code, despite my using Kevin Lawrence’s guideline: if you have trouble finding a piece of code, move it to where you first looked. I have some hope that Ruby’s structuring tools (classes, modules, include, etc.) will be useful.

  • Dependencies. As you’ll see, I’ll be writing code with a lot of temporal coupling. Is that and other kinds of coupling dooming me to a deeply intertwingled mess that I can’t change safely or quickly?

This blog post is about where I stand so far, after adding just one new feature.

A path through the app

Critter4us is an app that’s used to reserve teaching animals at the University of Illinois vet school. Reserving animals is like reserving meeting rooms, but with some different business rules. For example, Boombird the horse doesn’t care if students practice bandaging on him every day. However, it would be inhumane to practice giving him injections every day, so that can be done at most twice a week.

The story I’ve been working on is one that makes a copy of an existing reservation but with a new “timeslice” (something like “January 1st through 3rd, in the mornings”). Ideally, the same animals will be assigned to the copy, but that’s not always possible. If someone else has already reserved the animal for an overlapping timeslice, a new animal has to be found. Or if a procedure (like giving injections) would be within an animal’s “blackout period”, a new animal has to be found for it. For historical reasons, the reservation is made without the unusable animals and the user is alerted to edit it to add new ones.

Here’s the heart of the code for the feature:

The rest of this post will explain how that works and how it’s in a functional style. When I mention classes, I’ll link to the source.

FullReservation

The object model for the old program starts with a reservation, which contains information like “Who made the reservation?” and “For when?”. It also contains zero or more groups, each of which contains zero or more uses. A use links an animal to a procedure to be performed on it. I defined the class structure first, then mapped it onto a database schema, deliberately deferring any worries about efficiency:

The object-to-relational mapping library (Sequel) let me work with the data in a way that hid (“unflattened”) the table structure:

The existence of three tables is made somewhat implicit.

I’m replacing that old Reservation object with a new FullReservation. In FullReservation, I chose instead to make the three tables explicit:

That notation is awkward to type and it doesn’t lend itself to the Symbol#to_proc hack, so I follow JavaScript by allowing dot notation as a pun for key lookup:
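A minimal sketch of how such a pun could be implemented, assuming a Hash subclass and `method_missing`. The class name `DotHash` and the sample keys are hypothetical; this is not the post's FunctionalHash.

```ruby
# Hypothetical sketch: an unknown zero-argument message is treated as a
# lookup of the same-named key.
class DotHash < Hash
  def method_missing(name, *args)
    return self[name] if args.empty? && key?(name)
    super
  end

  def respond_to_missing?(name, include_private = false)
    key?(name) || super
  end
end

reservation = DotHash[instructor: 'morin', course: 'vm333']
uses = [DotHash[animal_id: 1], DotHash[animal_id: 7]]

by_bracket = reservation[:instructor]   # ordinary key lookup
by_dot     = reservation.instructor     # the same lookup as a message
animal_ids = uses.map(&:animal_id)      # which lets Symbol#to_proc work
```

One cost of this design: a key that shares a name with an existing Hash method (`size`, `count`, and so on) is reachable only through brackets.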

Namespacing

I’m supposedly doing this in a functional style, and the very first thing I’ve done is make a class? What’s up with that?

I have two reasons. First, I think having the order of function application flow left to right fits the (Western-language-speaker) perception that time flows from left to right and from top to bottom. That makes this:

… easier to read than this:
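The two shapes being compared can be sketched with a pair of invented steps, written once as chainable methods and once as free-standing functions. All names here (`Steps`, `Res`, `with_dates`, `without_id`) are hypothetical.

```ruby
# Free-standing functions over plain hashes.
module Steps
  def self.with_dates(res, first, last)
    res.merge(first_date: first, last_date: last)
  end

  def self.without_id(res)
    res.reject { |key, _| key == :id }
  end
end

# The same steps as methods, so they can be chained.
class Res < Hash
  def with_dates(first, last)
    self.class[merge(first_date: first, last_date: last)]
  end

  def without_id
    self.class[reject { |key, _| key == :id }]
  end
end

original = { id: 5, first_date: '2011-01-01', last_date: '2011-01-03' }

# Method chain: reads left to right, in the order things happen.
chained = Res[original].with_dates('2011-02-01', '2011-02-03').without_id

# Nested calls: the first step to run is buried in the middle.
nested = Steps.without_id(Steps.with_dates(original, '2011-02-01', '2011-02-03'))
```

Both forms compute the same hash; only the reading order differs.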

The second reason is namespacing. You’ll shortly see that everything is built on top of an immutable, lazy FunctionalHash object. A FullReservation just collects those methods that wouldn’t make sense for anything but a FunctionalHash being treated as a reservation. It’s about avoiding name collisions more than about modeling the world.

Inheritance gives me nested namespaces, something I dearly wish I had in Clojure. For example, there are a variety of functions that apply to FunctionalHashes that represent database tables, but are irrelevant to other ones. That code is contained in FullReservation’s superclass, DBHash.

Actually: not quite. The text of the code is found in three different modules that are included into DBHash. I expect to do a lot of mixing-and-matching to create namespaces for particular DBHash classes and even particular objects (via extend).
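A minimal sketch of namespaces built from inheritance, `include`, and `extend`. Only the class names FunctionalHash, DBHash, and FullReservation come from the post; the modules and methods are invented, and two modules stand in for the three mentioned above.

```ruby
class FunctionalHash < Hash; end

# Invented modules holding functions for hashes that mirror tables.
module TableNaming
  def table_for(key)
    :"#{key}s"
  end
end

module RowCounting
  def row_count(key)
    fetch(key).size
  end
end

# Functions that only make sense for database-backed hashes.
class DBHash < FunctionalHash
  include TableNaming
  include RowCounting
end

# Functions that only make sense for a hash treated as a reservation.
class FullReservation < DBHash
  def animal_ids
    self[:uses].map { |use| use[:animal_id] }
  end
end

# A one-off namespace for a single object, via extend.
module Auditing
  def audit_line
    "reservation with #{row_count(:uses)} uses"
  end
end

reservation = FullReservation[uses: [{ animal_id: 1 }, { animal_id: 7 }]]
reservation.extend(Auditing)
```

Method lookup walks from the object's own extensions through FullReservation and DBHash's modules up to FunctionalHash, which is what makes the namespaces behave as nested.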

Extracting a FunctionalTimeslice from a FullReservation

In the reservations table, there are three columns devoted to “when is the reservation for?” They are :first_date, :last_date, and :time_bits. (The first two are Date objects; the last represents the set {morning, afternoon, evening}.) In a proper object-oriented design, you’d expect a FullReservation to contain a Timeslice that in turn contains those three values and some timeslice-specific methods as well. I chose to handle such sub-objects differently. Instead of asking a reservation for its timeslice, you make a timeslice from a reservation using setlike operations.

FunctionalHash has an only method that produces a smaller FunctionalHash containing only the named key-value pairs. So this is a timeslice:

I give the timeslice access to a timeslice-specific namespace by wrapping it in a class:
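A minimal sketch of both steps, assuming `only` is a key filter that keeps the receiver's class. The `days` helper and the string encoding of `:time_bits` are invented; the real FunctionalHash and FunctionalTimeslice may differ.

```ruby
require 'date'

class FunctionalHash < Hash
  # A smaller hash holding just the named pairs, in the receiver's class.
  def only(*keys)
    self.class[select { |key, _| keys.include?(key) }]
  end
end

class FunctionalTimeslice < FunctionalHash
  # Invented helper standing in for timeslice-specific functions.
  def days
    (self[:last_date] - self[:first_date]).to_i + 1
  end
end

reservation = FunctionalHash[
  id: 5,
  instructor: 'morin',
  first_date: Date.new(2011, 1, 1),
  last_date: Date.new(2011, 1, 3),
  time_bits: '100'   # assumed encoding of {morning, afternoon, evening}
]

raw       = reservation.only(:first_date, :last_date, :time_bits)
timeslice = FunctionalTimeslice[raw]   # wrap to gain the timeslice namespace
span      = timeslice.days
```

The timeslice is still just a three-key hash; wrapping it changes which functions are within reach, not what data it holds.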

Adding a new timeslice to a FullReservation

It’s often said that code without mutable state is easier to reason about. I don’t personally find that as big a deal as other people do, but I’ve gotten used to immutability from my Clojure programming. So FunctionalHash disallows messages like this:

The equivalent of assigning a value to a key is done by merging it and creating a new FunctionalHash. The equivalent of deleting a key is done by making a copy of FunctionalHash without the given key. At the moment, this implementation is grossly space-inefficient. Eventually, I’ll port it over to Simon Harris’s Hamster, which implements structure sharing and other optimization techniques. I’m even thinking I might port his code to C.
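A minimal sketch of those two operations, assuming freezing as the way to disallow in-place assignment. The method name `remove` is invented, and like the implementation described above this version copies the whole hash on every change.

```ruby
# Hypothetical sketch: a frozen Hash subclass whose "updates" return copies.
class FunctionalHash < Hash
  def self.[](*args)
    super.freeze
  end

  # Stands in for assignment: a new hash with the extra pairs merged in.
  def merge(other)
    self.class[super]
  end

  # Stands in for deletion: a new hash lacking the given keys.
  def remove(*keys)
    self.class[reject { |key, _| keys.include?(key) }]
  end
end

reservation = FunctionalHash[id: 5, instructor: 'morin']

renamed   = reservation.merge(instructor: 'dawn')
anonymous = reservation.remove(:id)

# reservation[:id] = 6 would raise here, because the hash is frozen.
```

Each call hands back a fresh frozen hash, so `reservation` itself is never altered.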

Here’s the way to change a reservation’s date:

A few notes:

  • change_within is a way of “merging” into a nested hash. It’s the equivalent of this:

  • Remember that, here, timeslice is a three-key hash, not an object.

  • I’m also removing the id to remind myself that the reservation produced here no longer corresponds to one in the database.
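The notes above can be sketched with plain hashes. The signature of `change_within` is an assumption (the post names it but its definition is not shown here), as is the layout in which the reservation's own columns live under a `:data` key.

```ruby
# Hypothetical signature: merge `changes` into the hash nested under `key`,
# returning a new outer hash and leaving the original untouched.
def change_within(hash, key, changes)
  hash.merge(key => hash.fetch(key).merge(changes))
end

reservation = {
  data: {
    id: 5, instructor: 'morin',
    first_date: '2011-01-01', last_date: '2011-01-03', time_bits: '100'
  },
  groups: [],
  uses: []
}

# The timeslice is a three-key hash, not an object.
timeslice = { first_date: '2011-02-01', last_date: '2011-02-03', time_bits: '010' }

moved = change_within(reservation, :data, timeslice)

# Dropping the id: the result no longer matches a row in the database.
copy = moved.merge(data: moved[:data].reject { |key, _| key == :id })
```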

Working with disallowed animals

Any time a reservation is made, it disallows some animals (because they’re now in use) and may disallow some animal/procedure pairs (because of rules about how frequently a procedure can be performed). That information is calculated once and stored in Postgres tables named excluded_because_in_use and excluded_because_of_blackout_period.

The code to look up which animals are in use during a timeslice is factored into three pieces. Inlined, it would look like this:

(Fall turns an array of hashes into an array of FunctionalHashes.)

Because that code doesn’t refer to a reservation at all, it seems reasonable to put it in the FunctionalTimeslice namespace.

The FullReservation can use the list of animal ids to prune out its uses:
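A minimal sketch of the lookup and the pruning, using an in-memory array in place of the excluded_because_in_use table. The column names, the function names, and the decision to ignore `:time_bits` are all assumptions made to keep the example short.

```ruby
require 'date'

# Stand-in for the excluded_because_in_use table; columns are assumed.
EXCLUDED_BECAUSE_IN_USE = [
  { animal_id: 1, first_date: Date.new(2011, 2, 1), last_date: Date.new(2011, 2, 2) },
  { animal_id: 7, first_date: Date.new(2011, 3, 1), last_date: Date.new(2011, 3, 1) }
].freeze

# True when two date ranges share at least one day.
def overlap?(a, b)
  a[:first_date] <= b[:last_date] && b[:first_date] <= a[:last_date]
end

# Ids of animals excluded during the timeslice (time_bits ignored here).
def animal_ids_in_use(timeslice)
  EXCLUDED_BECAUSE_IN_USE
    .select { |row| overlap?(row, timeslice) }
    .map { |row| row[:animal_id] }
    .uniq
end

# A copy of the reservation whose uses omit the given animals.
def without_animals(reservation, animal_ids)
  kept = reservation[:uses].reject { |use| animal_ids.include?(use[:animal_id]) }
  reservation.merge(uses: kept)
end

timeslice   = { first_date: Date.new(2011, 2, 1), last_date: Date.new(2011, 2, 3) }
reservation = { uses: [{ animal_id: 1, procedure_id: 3 }, { animal_id: 4, procedure_id: 3 }] }

pruned = without_animals(reservation, animal_ids_in_use(timeslice))
```

The lookup takes only a timeslice, which matches the reasoning for keeping it in the timeslice's namespace rather than the reservation's.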

That’s all easy enough, but the contract with the user is that she’ll see a list of names of animals that couldn’t be included in the reservation. Getting that list is easy enough, given that we have ready access to the rejected uses. Here’s the code, with changes to the previous version highlighted:

That’s fine, but what do we do with the value named by animals_already_in_use? I’d hate to return it along with the new version of the reservation because its caller would have to look like this:

I’d rather avoid names for intermediate steps in the creation of the reservation copy. I want the various versions to flow anonymously through a chain of functions so that I need only name the original and the final copy. (original and copy would be better names than reservation and new_reservation, it occurs to me, but I’m not going to go back now and change all these gists.)

That suggests slamming the animal list into the next version of the reservation, like this:

That kind of creeps me out, and it exacerbates temporal coupling. Nevertheless, it lets the caller look nice, which might mean something. (If mathematicians can go on about elegance, why can’t I?) I hope my tests will loudly tell me when coupling causes a change to function X to break function Y.
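A minimal sketch of that trade-off, assuming the rejected names ride along under an `:animals_already_in_use` key. The class `Res`, the method names, and the sample animals other than Boombird are invented.

```ruby
# Hypothetical sketch: the rejected animals' names travel inside the next
# version of the reservation rather than as a second return value.
class Res < Hash
  def without_animals_in_use(ids_in_use, names_by_id)
    rejected, kept = self[:uses].partition { |use| ids_in_use.include?(use[:animal_id]) }
    names = rejected.map { |use| names_by_id.fetch(use[:animal_id]) }.uniq
    self.class[merge(uses: kept, animals_already_in_use: names)]
  end

  def only(*keys)
    self.class[select { |key, _| keys.include?(key) }]
  end
end

original = Res[id: 5, uses: [{ animal_id: 1 }, { animal_id: 4 }]]
names    = { 1 => 'Boombird', 4 => 'Betsy' }

# The caller names only the original and the final result.
copy = original
       .without_animals_in_use([1], names)
       .only(:uses, :animals_already_in_use)
```

The temporal coupling is visible in the chain: asking for `:animals_already_in_use` is only meaningful after `without_animals_in_use` has run, and nothing but ordering enforces that.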

as_saved

The last step of creating the copied reservation is oddly named:

as_saved? This is a stylistic affectation that I’m not sure is a good idea. What I’m trying to imply is that the main thing this function does is create a new FullReservation with a bit of extra data, namely data.id, merged in. (The id needs to be sent off to the front end.) The fact that the id is created by changing persistent state somewhere is just an implementation detail. It could just as well be that every possible reservation always already exists somewhere as a big immutable pool, so as_saved just does a lookup to find the matching id.

(Which, it again occurs to me too late, perhaps makes as_saved a name that, strictly, reveals too much about the implementation.)

(Interestingly, I understand that the human immune system works roughly like the silly implementation above: you’re born with some 10 billion different antibodies and the response to infection (mostly) involves finding the useful ones, not creating new ones that match the foreign agent.)

only and the nature of classes

After the as-it-appears-in-the-database FullReservation is created, the pieces that the frontend code cares about are extracted and returned to the controller code, which turns them into JSON:

But something creepy is going on here. What’s the type of the result of only?

How can anyone possibly believe that the two-element hash, containing nothing about the reservation in question (but only about the difference between it and its original) is a FullReservation?

For a time, I considered changing only to produce a FunctionalHash, rather than (as it does) an object of the same class as the receiver of the method. Then I smacked myself and reminded myself that I’m using classes to identify namespaces, not natural kinds. Saying that the result of only “is a” FullReservation would be absurd. But it’s less absurd to say that (1) we started with a hash, (2) the functions in the namespace FullReservation applied to it, (3) we derived a second hash from the first, so (4) it’s probably a good guess that the same namespace will also be useful for the second hash.

That is, it’s all about conservation of work. If I stripped every result of only down to a bare FunctionalHash, I’d sometimes have to add a namespace back. By not stripping it, sure, I may get irrelevant functions in the easily-accessible namespace, but I can just ignore them.

Laziness

I chose the name FullReservation not just because Reservation was already taken. It’s because a FullReservation contains all the values that can possibly be relevant to a reservation. But some HTTP requests only care about the reservation’s id. Some only care about some of the data (like the timeslice). Only a few care about the uses and the groups.

Laziness of the sort implemented in Clojure and Haskell seems a nice match for this. When a FunctionalHash key is assigned a block/lambda, it doesn’t treat that as a value. Rather, the FunctionalHash runs that block to calculate the value when the key is dereferenced. After that, the value is cached (and is immutable, just like any other value).
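A minimal sketch of that lookup rule, assuming any callable value counts as "a block/lambda". The class name `LazyHash` is hypothetical, and the real FunctionalHash may implement this differently.

```ruby
class LazyHash
  def initialize(pairs)
    @pairs = pairs.dup
    @cache = {}
  end

  # A callable stored under a key is run on first lookup and its result
  # cached; any other value is returned as-is.
  def [](key)
    return @cache[key] if @cache.key?(key)
    value = @pairs.fetch(key)
    @cache[key] = value.respond_to?(:call) ? value.call : value
  end
end

loads = 0   # counts how often the expensive step actually runs

reservation = LazyHash.new(
  starting_id: 5,
  uses: lambda {
    loads += 1               # stands in for a database query
    [{ animal_id: 1 }]
  }
)

id     = reservation[:starting_id]   # ordinary value: nothing is run
first  = reservation[:uses]          # runs the lambda
second = reservation[:uses]          # served from the cache
```

After both lookups of `:uses`, the counter shows the lambda ran a single time.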

So consider these two steps from our controller:

We create a new reservation by saving a modified reservation to disk. That gives us a new row in the reservations table, some new rows in the groups table, and some new rows in the uses table. But nothing of the groups or uses is used from then on, so it would be a waste to populate the new_reservation with them. How is that avoided? By creating a FullReservation like this:

The uses and groups and even the row in the :reservations table are only loaded when they’re needed, so it costs little to use a FullReservation for everything. With this structure, I’m trying to gain more control than an object-to-relational mapping library gives me, while still freeing myself from the micromanagement of loading. Time will tell if that works.
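One way such a lazily populated reservation might be built is sketched here. The function `full_reservation`, the key names other than `:starting_id`, and the in-memory tables are assumptions; a query log stands in for real database traffic so the effect of laziness can be observed.

```ruby
# In-memory stand-ins for the three tables; rows and columns are invented.
TABLES = {
  reservations: [{ id: 5, instructor: 'morin' }],
  groups:       [{ id: 20, reservation_id: 5 }],
  uses:         [{ group_id: 20, animal_id: 1, procedure_id: 3 }]
}.freeze
QUERIES = []   # records which tables were read, in order

def rows(table)
  QUERIES << table
  TABLES.fetch(table)
end

# Minimal lazy lookup: callables run once, on first access.
class LazyHash
  def initialize(pairs)
    @pairs = pairs
    @cache = {}
  end

  def [](key)
    return @cache[key] if @cache.key?(key)
    value = @pairs.fetch(key)
    @cache[key] = value.respond_to?(:call) ? value.call : value
  end
end

def full_reservation(id)
  res = LazyHash.new(
    starting_id: id,
    data:   -> { rows(:reservations).find { |row| row[:id] == id } },
    groups: -> { rows(:groups).select { |group| group[:reservation_id] == id } },
    uses:   lambda {
      group_ids = res[:groups].map { |group| group[:id] }
      rows(:uses).select { |use| group_ids.include?(use[:group_id]) }
    }
  )
end

fresh = full_reservation(5)
id    = fresh[:starting_id]   # answered without reading any table
```

A request that needs only the id never touches the tables, while one that asks for `:uses` pulls in `:groups` on the way and still skips the reservations row.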

(Note: I stash the original id that led to the FullReservation in :starting_id. Part of the motivation was to allow a completely fresh FullReservation to return its id without going to the database at all, and another part was to retain the original id even after later changes made it no longer an index into the reservation contents. This dual purpose makes the code confused, I think.)

(Note: Postgres supports the SQL RETURNING extension. So it’d be relatively easy to fully populate a saved FullReservation. I’ve used RETURNING several times, but always later discarded it for one reason or another.)

The grand conclusion

I’ve always hated end-of-talk or end-of-post summations. So I don’t really have one here, except that this approach feels promising, I want to continue trying it, I’d like to hear your comments (sorry about the antiquated blog software), and I’d especially like to hear what happens if you try out this approach.

3 Responses to “Using functional style in a Ruby webapp”

  1. tomm Says:

    A few comments:

    1. Love the idea of Kls#only as a means of filtering attributes. I’ve done partial implementations of that functionality in the past, but I want to play with that as a convention.

    2. The bolds in the code blocks aren’t escaped and look like system calls (e.g., __only__). My WP foo is weak, so I don’t know how to fix.

    3. Thanks for the tip on Hampster; I had never seen that before. Let me know if you are going to go through with the C port, as that looks like something I may be able to put some time behind.

