Biomime: Coding and Biology: Testing [warning: LONG post]

So, today, Remi (of OpenRain) and I made a presentation to the other developers about what we thought would be the best M.O. we could pick for testing our application. Without going into the "whys" behind this testing sea-change, I thought I would recap what I've learned, with Remi's help.

First, we've re-defined what to test. When our testing suite was at its low point, the answer to that question was "everything". ...And that's what made the tests so brittle: if anyone made any change at all, the tests let them know about it. So, coming up with a clear definition of what gets tested was important to me. The answer I chose was twofold:

Those parts of the code (namely, APIs) that really should never change, and
The functionality that we really need to prove works.

The latter point is made particularly clear by the argument for black-box testing. The most pointed part of the case he makes in that video is the assertion that "this really doesn't prove anything", and that was very important to me.

So, with those two core definitions, we decided to take two core approaches:

Unit testing
Full-stack (or nearly-so) functional testing

...That is to say, no view tests: they depend too much on everything else to render properly and are too difficult to test in ways that aren't going to break when you decide to, say, rename a div. Secondly: no controller tests. As Remi put it, controller tests are huge tests, and they prove almost nothing. They are not worth their weight.

I looked into a myriad of tools for writing tests, including shoulda, context, matchy, and zebra, and I took another look at RSpec. In a nutshell:

Shoulda is really cool. The assertion that one line of Rails code should require one line of testing code is... superb. I'd like to work under that assumption, and write our own custom assertions to facilitate this. But shoulda also relies heavily on an application being very much in line with the Rails Way, and our app, because of its highly complex, stand-alone database, is not particularly so. I feel much of the power of shoulda would be lost on us. So, rather than learn this system, we should just pass it by. Fortunately, RSpec has made some changes to their system that make shoulda-macro-like custom assertions and "one-liners" much easier.
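To make the "one line of code, one line of test" idea a bit more concrete, here is a minimal plain-Ruby sketch of a custom assertion. It is not shoulda's or RSpec's API: `Taxon` is a hypothetical stand-in for an ActiveRecord model, and `assert_requires` is a hypothetical helper name. The assertion first checks that the baseline record is valid, so it cannot pass trivially.

```ruby
# Hypothetical stand-in for an ActiveRecord model (plain Ruby, no Rails).
class Taxon
  attr_accessor :scientific_name, :rank

  def initialize(attrs = {})
    attrs.each { |key, value| public_send("#{key}=", value) }
  end

  # The "one line" of model code under test: scientific_name is required.
  def valid?
    !scientific_name.to_s.strip.empty?
  end
end

# Hypothetical custom assertion in the spirit of shoulda macros.
module CustomAssertions
  # Passes if a record built from valid_attrs is valid, but becomes invalid
  # once the given attribute is blanked out; raises otherwise.
  def assert_requires(klass, attribute, valid_attrs)
    raise "baseline #{klass} should be valid" unless klass.new(valid_attrs).valid?

    record = klass.new(valid_attrs.merge(attribute => nil))
    raise "expected #{klass} to require #{attribute}" if record.valid?

    true
  end
end

include CustomAssertions

# One line of test code for one line of model code.
assert_requires(Taxon, :scientific_name, scientific_name: "Panthera leo")
```

Written this way, each new validation in a model costs one more `assert_requires` line in its tests.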

Context and matchy are also neat, in that they give you much of what RSpec and shoulda give you, with much less baggage. I like that concept! But when I actually installed the code and tried it out, a lot of the things that "just work" in RSpec just... didn't. Rather than wrestle with it for more than a day, I decided these tools are... well... too light for our purposes.

Zebra impressed me the most. I think we'll all be writing tests in this style (example: expect { @my_model.to be_invalid() }) in the near future. ...But at the moment, one needs to define too many of those assertions oneself, and thus I don't think it is really mature enough for us to start using without a serious investment of time. ...which we just don't have.

In the end, I decided it was wisest to stick with RSpec. It's got everything one needs to write excellent unit tests... and then some. So, yes, it comes with some overhead and baggage, but on the other hand: the things you need to "just work"... do.

That said, the style of testing we'll be doing needs to change. As I said earlier, we should focus on testing only those parts of the code that really shouldn't change. For example, we expect a taxon_concept to have a common name, which comes from a particular part of the database and defaults to the scientific name when that entry isn't found. These are things we can test, and we can do it without stubbing the tar out of every method that ever gets called to create the end result.
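The common-name rule above is exactly the kind of behaviour worth pinning down. Here is a minimal plain-Ruby sketch of it, assuming a hash (`COMMON_NAMES`, hypothetical) in place of the real table and a simplified, hypothetical `TaxonConcept` class. In the actual app the lookup would go through the database, which is why these tests will hit it rather than stub it.

```ruby
# Hypothetical stand-in for the table that holds vernacular (common) names,
# keyed by taxon concept id. In the real app this would be a database query.
COMMON_NAMES = { 1 => "Lion" }.freeze

class TaxonConcept
  attr_reader :id, :scientific_name

  def initialize(id:, scientific_name:, name_source: COMMON_NAMES)
    @id = id
    @scientific_name = scientific_name
    @name_source = name_source
  end

  # Returns the stored common name, or the scientific name when none exists.
  def common_name
    name = @name_source[id]
    return scientific_name if name.nil? || name.empty?

    name
  end
end

# Behaviour-level checks: no stubs, just the lookup and the fallback.
lion = TaxonConcept.new(id: 1, scientific_name: "Panthera leo")
moss = TaxonConcept.new(id: 2, scientific_name: "Sphagnum palustre")

raise "expected the stored common name" unless lion.common_name == "Lion"
raise "expected a fallback to the scientific name" unless moss.common_name == "Sphagnum palustre"
```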

Consequently, we will be hitting the database to run tests. And this makes me a little sad, because I know how slow testing can be when it's dependent on the DB.

To alleviate some of this, we are going to try and stop using fixtures in favor of factory_girl. This has several advantages:

Fewer models than with fixtures. If you need to test a "special" model, you instantiate one with the special feature. It's not there in every other test that doesn't need it. Hopefully, this will cut down on the time it takes to prepare for any given test.
Easier to define (DRYer) than fixtures. There's (nominally) one factories.rb file sitting somewhere, with all of your models defined in a rather succinct syntax. Compare this to the 50+ YML files sitting in a directory. If you tweak the relationship between two models, you're not doing a search-and-replace on several YML files, you're changing one definition in one place.
More robust than mocks. The problem with mocks is that you need to stub each method that gets called on them, and this can be quite expensive (in developers' time) and non-DRY, if not done carefully.
More coverage of class behaviours. So, when you call that name() method which bounces all over creation to find your common name, you're flexing the muscles of all the pieces involved to make sure they work. Of course, this "feature" comes at the price of less isolation of code. ...and test isolation is one of the hallmarks of RSpec. But I think it suits our project better to take the coverage. Plus, our project makes heavy use of find_by_sql, which may otherwise go untested.
Easier to instantiate than mocks. Our "top-level" model, a taxon_concept, relies on around 20 other models to actually work. With RSpec mocking, I had to create each of those mocks and tie them together. The resulting code was very, very ugly. Yes, I probably could have cleaned it up, but I don't think I could have gotten it nearly as succinct as factory_girl's syntax.
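The core factory idea is small. As a rough illustration only, here is a hand-rolled plain-Ruby version; `MiniFactory` and `Species` are hypothetical names, and this is not factory_girl's API. Defaults live in one definition, and each test overrides only the attributes that make its record "special", so that record exists only where it is needed.

```ruby
# Hypothetical stand-in for a model; a plain Struct, not an ActiveRecord class.
Species = Struct.new(:scientific_name, :rank, :extinct, keyword_init: true)

# Minimal factory registry: one place defines sensible defaults for each model.
module MiniFactory
  DEFINITIONS = {}

  # Register a factory: the block returns a fresh hash of default attributes.
  def self.define(name, klass, &defaults)
    DEFINITIONS[name] = [klass, defaults]
  end

  # Build a new object from the defaults, overriding only what this test needs.
  # Raises KeyError for an unknown factory name.
  def self.build(name, overrides = {})
    klass, defaults = DEFINITIONS.fetch(name)
    klass.new(**defaults.call.merge(overrides))
  end
end

MiniFactory.define(:species, Species) do
  { scientific_name: "Panthera leo", rank: "species", extinct: false }
end

ordinary = MiniFactory.build(:species)
# The "special" record is built only in the test that asks for it.
dodo = MiniFactory.build(:species, scientific_name: "Raphus cucullatus", extinct: true)
```

If the relationship between models changes, only the `define` block changes, which is the DRY advantage over editing many YML fixture files.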

Factory_girl is a rockin' module. I think everyone should be familiar with it. (And, yes, I am aware that there are a number of viable alternatives with the same underlying behaviour. But f_g seems most popular and least cluttered.)

...So we still have the problem of proving that the website works. This was a problem with RSpec. Because of its fantastic isolation of testing, one was never really sure if the whole stack was going to behave properly. I spent the vast majority of my time in the past 1 1/2 weeks trying to solve this problem of proof that the damn site actually does what you want. There are plenty of solutions out there, but personally, I found most of them clunky. ...at best.

Enter webrat. This is a package that makes visiting a site as simple as... well... visit(url). And you can click around, fill in forms, and all of those similar things with other, very simple syntax. Example from webrat's current homepage:

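  class SignupTest < ActionController::IntegrationTest

    # Walk through the sign-up flow the way a user would, through the full stack.
    def test_trial_account_sign_up
      visit home_path                               # load the home page
      click_link "Sign up"                          # follow the "Sign up" link
      fill_in "Email", :with => "good@example.com"  # type into the "Email" field
      select "Free account"                         # choose an option from a select box
      click_button "Register"                       # submit the form
    end

  end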

...Isn't that slick? ...This makes writing good functional tests a piece of cake. And, while we could easily run these kinds of tests with RSpec, I decided that, because we operate on a user-story-centered style of implementing features, we could also adopt cucumber, which is another really slick wrapper around user stories. Basically, you write tests in plain English, using Given / When / Then blocks, and write some Ruby code to match your plain-English assertions and turn them into webrat (or some other) full-stack tests.
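To show roughly how plain-English steps get tied to Ruby, here is a toy step runner in plain Ruby. It is not cucumber's implementation; `StepRunner`, the step patterns, and the hash standing in for a webrat browser session are all hypothetical. Each step is a regular expression paired with a block, and the regex's captured groups become the block's arguments.

```ruby
# Toy step runner: maps plain-English lines to Ruby blocks via regexes.
# NOT cucumber's implementation; all names here are hypothetical.
class StepRunner
  def initialize
    @steps = []
  end

  # Register a step: a regular expression plus the Ruby block it triggers.
  def step(pattern, &block)
    @steps << [pattern, block]
  end

  # Run each line of a scenario: strip the Given/When/Then/And keyword, find
  # the first matching pattern, and call its block with the captured groups.
  def run(scenario)
    scenario.each_line.map(&:strip).reject(&:empty?).each do |line|
      text = line.sub(/\A(Given|When|Then|And)\s+/, "")
      pattern, block = @steps.find { |pat, _| pat.match?(text) }
      raise "Undefined step: #{line}" unless pattern

      block.call(*pattern.match(text).captures)
    end
  end
end

# Hypothetical stand-in for a browser session (webrat would drive the real app).
session = { page: nil, fields: {} }

runner = StepRunner.new
runner.step(/\AI am on the (.+) page\z/) { |page| session[:page] = page }
runner.step(/\AI fill in "([^"]*)" with "([^"]*)"\z/) { |field, value| session[:fields][field] = value }
runner.step(/\AI should be on the (.+) page\z/) do |page|
  raise "expected #{page}, got #{session[:page]}" unless session[:page] == page
end

runner.run(<<~SCENARIO)
  Given I am on the home page
  When I fill in "Email" with "good@example.com"
  Then I should be on the home page
SCENARIO
```

In a real setup, the step blocks would call webrat methods such as `visit` and `fill_in` instead of updating a hash.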

I expect some resistance to cucumber. It feels a little... hokey to write tests in plain English, then parse them in Ruby... but in practice, I have found the technique to be very readable, very usable, and surprisingly minimal (in terms of the amount of code). Assuming there is sufficient buy-in for this, I actually believe it will turn out to be a really cool, really reliable way to, as I said, prove that the site works.

Of course, all of this is academic as of this afternoon. We'll see how things pan out in the next week or two. Some of these ideas may not be well-conceived, or may turn out to be ill-suited to our particular codebase. I'll keep an open mind about it. But I'm also really excited to at least try to get all of this to fit together nicely.

I actually rather enjoy writing tests, and I think these changes will make tests more fun, more useful, and more productive. We'll see.

Friday, January 23, 2009