Petter Måhlén

Just a programmer


Qualities of Quality

[This is a slightly modified cross-post of something I wrote for the internal Spotify blog.]

I’m currently on parental leave, which is something that leaves very little time for any concentrated work effort because your first priority is to be on-call to solve the problems of a baby and you get interrupted all the time. But in between interruptions you can reflect on things and sometimes respond to email threads. I’ve been thinking about one topic in particular, namely quality, over the last week or two, and now my son is asleep, so I’m going to try to write up a blog post about it. In this post, I’m making the claim that

Developer-facing quality is a completely different thing from end-user facing quality, and is usually more important.

In one of the email threads I’ve been responding to at work, I said that “near-perfect quality code is … a meta-feature” meaning it affects and improves all other features, and that “it’s really a requirement to achieve sustainable speed”. I now think I can express that more precisely by considering different kinds of quality. The kind of quality I believe most people think about is what the end user experiences: product quality. Bugs, UI inconsistencies, etc. Product quality is a (non-functional) feature like any other, and can rightly be prioritised relative to other product features, such as performance, an improved design, a better recommendation algorithm and so on. The kind of quality that is a meta-feature is the quality a developer experiences, what I would term implementation quality. Things like readability and understandability of the code, ease of re-use, and bug-free-ness. Implementation quality doesn’t affect the end user experience, but it impacts the productivity of the teams working on improving the end user experience. These two kinds of quality overlap but are not the same:

Kinds of Quality

While a product can be compelling even if it has poor product quality and therefore a tradeoff between product quality and other product features is meaningful, it’s much harder to motivate not paying attention to implementation quality. Poor implementation quality kills your ability to add features and evolve your product and therefore makes poor product quality a much more serious problem. Poor implementation quality becomes an obstacle to changing your product, and changing the product quickly is key to getting it right.

A very closely related thought is Martin Fowler’s Design Stamina Hypothesis, which is an article you really should read and understand. So I’m not going to summarise it here, just go ahead and read it. Seriously, read it. Done? OK, now spend 4.43 minutes watching Ward Cunningham explaining (technical) debt. Even if you’ve watched that presentation before, it’ll be ~5 minutes very well spent.

Cunningham says that it can be a good idea to take on debt if you’re saying “I don’t understand this domain well enough to know that I’m building the right features or that I’m using the right abstractions for the features I’m building”. The first kind of debt is what Eric Ries is addressing in the lean startup movement, and what we’re addressing at Spotify with the Think It, Build It, Ship It, Tweak It mantra. The second kind is something you should take on and need to continually pay back as you get the required understanding of your domain. It’s not OK to take on debt by not following good engineering practices and, for instance, not cleaning up your code once you’ve made it work. It should be easy to understand the abstractions you’ve chosen, even if they’re not the right ones. Paying back debt should be mostly about fixing your design as you better understand what it should have been, not about fixing bugs or spaghetti code.

The two most important points Fowler makes, I think, are the narrowness of the time interval during which it is meaningful to trade off design/implementation quality for speed, and that once you’re past the point where the two curves intersect, continuing to disregard implementation quality just slows you down further. You can only profitably trade off implementation quality for speed in very short-lived projects. Probably, if you’re expecting a system to have a lifespan of more than a couple of weeks, it’s a good idea to pay attention to implementation quality right from the start. I think there may be exceptions, when you desperately need to get some feature out to survive. But most places are not struggling to survive from day to day, they’re doing things like figuring out how to build the best music streaming service in history. That’s hard, so we need to make sure that we can adapt our product quickly and easily as we learn how to do it.

In the figure above, I included simplicity as an aspect of both product and implementation quality. This is primarily due to a couple of thoughts: first, an article by Andres Kutt, where especially the section on what he calls functional architecture is relevant. The point he is making is that due to the huge number of features in the Skype web store, it ended up in a state where it was almost impossible to make changes to it. Feature count makes users confused (hence it’s a product quality issue) and it adds code complexity, making the code base less amenable to change. It’s a mistake to think that a feature is free just because you’re not doing active development on it.

A couple of notes on bugs. I include that in both product and implementation quality. The product perspective is pretty obvious – bugs detract from the user experience – but the implementation aspects may be a little less apparent. The first aspect is of course that it’s easier to build something on top of a solid component than a library. So if services and libraries are bug free, it’s easier to make sure that the end user experience is good. But there’s a second aspect as well. Bugs reduce productivity in many ways – for a longer discussion, see this post. The short of it is that unfixed bugs in your code lead to additional meetings, bug management overhead, duplicate reports of the same bug and context switching. So a lot of the time, the best thing you can do for your own productivity is to just fix pretty much everything that’s ever reported.

One common misconception about quality and its impact on delivery speed is that things like pluggability/extensibility/configurability of some technical solution are quality. Those are things often labelled as over-engineering. To me, over-engineering is engineers adding waste by inventing features that aren’t actually really needed. The notorious 2002 Standish Group report on feature use concluded that 64% of product features are rarely or never used. Considering that features interact in ways that make code less malleable, the best thing you can do for your own and your team’s productivity is to question any feature that goes into the product you’re working on. Especially if you came up with it yourself. At Spotify, product owners get their scope creep tendencies kept in check by Think It, etc., but nobody really checks that we engineers limit the scope of the code we write. Over-engineering is not about creating something with too good implementation quality, it’s creating something with too many features. In nearly two decades of doing professional software development, I don’t think I ever felt a team I worked in overspent seriously on implementation quality, but I’ve definitely many times felt that we’ve wasted hugely on some feature or other.

I’ve also got some opinions about the use of TDD to drive implementation quality, but my son is going to wake up any minute, so that will have to remain a topic for a future blog post. :) For now, a summary of this post in two bullet points – to be able to move fast with product development, you need to:

  • Be ruthless about minimising the feature count, and
  • Always pay very close attention to implementation quality – but feel free to trade off product quality if needed.

, ,


Trickle – asynchronous Java made easier

At Spotify, we’re doing more and more Java (Spotify started out as a Python-mostly shop, but performance requirements are changing that), and we’re doing more and more complex, asynchronous Java stuff. By ‘complex, asynchronous’, I mean things along the veins of:

  1. Call a search engine for a list of albums matching a certain query string.
  2. Call a search engine for a list of tracks matching the same query string.
  3. When the tracks list is available, call a service to find out how many times they  were played.
  4. Combine the results of the three service calls into some data structure and return it.

This is easy to do synchronously, but if you want performance, you don’t want to waste threads on blocking on service calls that take several tens of milliseconds. And if you do it asynchronously, you end up with code like this:

  ListenableFuture<List<Track>> tracks = search.searchForTracks(query);
  final ListenableFuture<List<DecoratedTrack>> decorated =
      new AsyncFunction<List<Track>, List<DecoratedTrack>>() {
        public ListenableFuture<List<DecoratedTrack>> apply(List<Track> tracks) {
          return decorationService.decorate(tracks);
  final ListenableFuture<List<Album>> albums = search.searchForAlbums(query);

  ListenableFuture<List<Object>> allDoneSignal =
    Futures.<Object>allAsList(decorated, albums);
  return Futures.transform(allDoneSignal,
    new Function<List<Object>, SomeResult>() {
      public SomeResult apply(List<Object> dummy) {         
        return new SomeResult(

It’s not exactly clear what’s going on. To me, some of the problems with the above code are:

  1. There’s a lot of noise due to Java syntax; it’s really hard to see which bits of the code do something useful.
  2. There’s a lot of concurrency management trivia in the way of understanding which service calls relate to which. The overall flow of data is very hard to understand.
  3. The pattern of using Futures.allAsList() to get a future that is a signal that ‘all the underlying futures are done’ is non-obvious, adding conceptual weight.
  4. The final transform doesn’t actually transform its inputs. To preserve type safety, we have to reach out to the ‘final’ decoratedTracks and albums futures.
  5. It’s easy for application code to accidentally make the code blocking by doing a ‘get’ on some future that isn’t yet completed, or by forgetting to add a future to a newly added service call to the ‘doneSignal’ list.

We’ve created a project called Trickle to reduce the level of ‘async pain’. With Trickle, the code above could be written like so:

  // constant definitions
  private static final Input<String> QUERY = Input.named("query");

  // one-time setup code, like a constructor or something
  private void wireUpGraph() {
    Func1<String, List<Track>> searchTracksFunc = new Func1<String, List<Track>>() {
      public ListenableFuture<List<Track>> run(String query) {
        return search.searchForTracks(query);
    Func1<List<Track>, List<DecoratedTrack>> decorateTracksFunc =
      new Func1<List<Track>, List<DecoratedTrack>>() {
        public ListenableFuture<List<DecoratedTrack>> run(List<Track> tracks) {
          return decorationService.decorate(tracks);
    Func1<String, List<Album>> searchAlbumsFunc = new Func1<String, List<Album>>() {
      public ListenableFuture<List<Album>> run(String query) {
        return search.searchForAlbums(query);
    Func2<List<DecoratedTrack>, List<Album>, SomeResult> combine =
      new Func2<List<DecoratedTrack>, List<Album>, SomeResult>() {
        public ListenableFuture run(List<DecoratedTrack> decorated, List<Album> albums) {
          return Futures.immediateFuture(new SomeResult(decorated, albums));

    Graph<List<Track>> searchTracks =;
    Graph<List<DecoratedTrack>> decorateTracks =;
    Graph<List<Album>> albums =;
    this.searchGraph =, albums);

  // actual invocation method  
  public ListenableFuture search(String query) {
    return this.searchGraph.bind(QUERY, query).run();

The code is not shorter, but there are some interesting differences:

  • The dependencies between different calls are shown much more clearly.
  • The individual steps are more clearly separated out, making them more easily testable.
  • Each step (node in Trickle lingo) is only invoked when the previous ones are completed, so you never get a Future as an input to business logic. This makes it very hard for application code to block concurrency.
  • It’s forward-compatible with lambdas, meaning it’ll look a lot nicer with Java 8.

For more examples and information about how it works, take a look at the README and wiki, because this post is about why, not how, I think you should use Trickle.

Why Trickle?

A danger that you always run as an engineer is that you fall in love with your own ideas and pursue something because you invented it rather than because it’s actually a good thing. Some of us had created systems for managing asynchronous call graphs at previous jobs, and while they had very large limitations (at least the ones I created did), it was clear that they made things quite a lot easier in the contexts where they were used. But even with that experience, we were not sure Trickle was a good idea. So once we had an API that felt like it was worth trying out, we sent out an email to the backend guild at Spotify, asking for volunteers to compare it with the two best similar frameworks we had been able to find on the interwebs (we ruled out Disruptor and Akka because we felt that introducing them would be an unreasonably large change to our existing ecosystem). The comparison was done by implementing a particular call graph, and we then asked people to fill out a questionnaire measuring a) how easy it was to get started, b) how much the framework got out of the way allowing you to focus on the core business logic, and c) how clean the resulting code was. Nine people took the survey, and the results were pretty interesting (1 is worst, 5 is best):

Technology Getting going Focus on core Cleanness
ListenableFutures 4.0 3.6 2.7
RxJava 2.8 3.7 3.1
Trickle 3.9 3.8 4.4

The most common comment regarding ListenableFutures (5/9 said this) was: “I already knew it, so it was of course easy to get started”. The most common comment about Trickle (6/9) was “no documentation” – three of the people who said that also said “but it was still very easy to get going”. So Trickle, without documentation, was almost as easy to get going with as raw Futures that most of the people already knew, and it was a clear winner in code cleanness. Given that we considered the cleanness of the resulting code to be the most important criterion, it felt like we were onto something.

Since getting that feedback, we’ve iterated a couple of times more on the API, and to ensure that it is production quality, we’re using it in the service discovery system that our group owns (and which can bring down almost the entire Spotify backend if it fails). We’ve also added some documentation, but not too much – we want to make sure that the API is simple enough that you can use it without needing to read any documentation at all. As you can tell from the links above, we also open-sourced it. Finally, we did a micro benchmark to ensure we hadn’t done anything that would introduce crazy performance limitations. All micro benchmarks are liars, but the results look like we’re in a reasonable zone compared to the others:

Benchmark Mode Samples Mean Mean error Units
c.s.t.Benchmark.benchmarkExecutorGuava thrpt 5 68.778 4.066 ops/ms
c.s.t.Benchmark.benchmarkExecutorRx thrpt 5 20.242 0.710 ops/ms
c.s.t.Benchmark.benchmarkExecutorTrickle thrpt 5 52.148 1.776 ops/ms
c.s.t.Benchmark.benchmarkImmediateGuava thrpt 5 890.375 79.594 ops/ms
c.s.t.Benchmark.benchmarkImmediateRx thrpt 5 312.870 8.643 ops/ms
c.s.t.Benchmark.benchmarkImmediateTrickle thrpt 5 168.820 13.991 ops/ms

Trickle is significantly slower than plain ListenableFutures (especially when the futures aren’t real futures because the result is immediately available; this case is very fast in Guava, whereas Trickle doesn’t do any optimisations for that). This is not a surprise, since Trickle is built on top of ListenableFutures and does more stuff. The key result we wanted out of this was that it shouldn’t be orders of magnitude slower than plain ListenableFutures, and it’s not. If a single thread is capable of doing 52k operations/second in Trickle, that’s more than our use case requires, so at least this test didn’t indicate that we had done something very wrong. I’m skeptical about the RxJava performance results; it’s slowness when using real threads may well be due to some mistake I did when writing the RxJava code.

Trickle, while still immature in particular regarding the exact details of the API, is now at a stage where we feel that it’s ready for general use, so if you’re using Futures.transform() and Futures.allAsList(), chances are great that you can make your code easier to read and work with by using Trickle instead. Or, if you’re doing things synchronously and wish that you could improve the performance of your services, perhaps Trickle could help you get a performance boost!

, , ,


Keeping Classes Simple using the SRP

The Single Responsibility Principle – that a class should have one and only one reason to change – is, I think, one of the most important principles to follow in order to make high quality and easily maintainable software. It’s also one of the principles that I see abused the most often (by myself as well as others). This post is cross-post of something I wrote for the internal Spotify blog, and talks about a technique to make it easier to adhere to the SRP and thus create simple classes, in the Rich Hickey sense: ‘easy’ is a subjective term describing the effort it takes some particular person to do something, ‘simple’ is an objective term describing the resulting artefact. Simple software is valuable (and complex software is costly and causes frustration), so it is very useful to have good tools that make it easy to create simple things.

I’ll describe a concrete technique that I think of as composing behaviours; I think it’s close to if not identical to the decorator pattern. I’ll be using some code I recently wrote in a library wrapping DNS lookups at Spotify. I needed to create something that would let us do SRV lookups, and if the results were empty or there was another kind of failure when doing the lookup, it should retain the previous data. This is to reduce the risk of service outages caused only by DNS failures. I also wanted to ensure that we had metrics on the time we spent on DNS lookups, the rate of failures, and how common empty responses were.

The most common reaction when faced with a problem description like that is to create a class, (DnsSrvResolver, let’s say) that implements those features. In a sense, this class does a single thing – DNS SRV lookups that are metered and whose results can be kept in case of problems. But I would argue that that ‘thing’ is too big, and it’s better to create multiple classes doing smaller things. Here’s how.

First, an interface (the code in the post is Java, but of course the technique can be applied to code in any programming language):

public interface DnsSrvResolver {
  List resolve(String fqdn);

Then, an implementation class that does the actual lookup of SRV records using a standard DNS library:

class XBillDnsSrvResolver implements DnsSrvResolver {
  private final LookupFactory lookupFactory;

  XBillDnsSrvResolver(LookupFactory lookupFactory) {
    this.lookupFactory = lookupFactory;

  public List resolve(final String fqdn) {
    Lookup lookup = lookupFactory.forName(fqdn);
    Record[] queryResult =;

    if (lookup.getResult() != Lookup.SUCCESSFUL) {
      throw new DnsException(
        String.format("Lookup of '%s' failed with code: %d - %s ",
           fqdn, lookup.getResult(), lookup.getErrorString()));

    return toHostAndPorts(queryResult);

  private List toHostAndPorts(Record[] queryResult) {
    ImmutableList.Builder builder =

    if (queryResult != null) {
      for (Record record: queryResult) {
        if (record instanceof SRVRecord) {
          SRVRecord srvRecord = (SRVRecord) record;


So far, nothing out of the ordinary. Here comes the trick: now, add the metrics-tracking and result-storing behaviours in new classes that delegate the actual DNS lookup to some other, unknown class. Metrics can be done like this:

class MeteredDnsSrvResolver implements DnsSrvResolver {
  private final DnsSrvResolver delegate;
  private final Timer timer;
  private final Counter failureCounter;
  private final Counter emptyCounter;

  MeteredDnsSrvResolver(DnsSrvResolver delegate,
                        Timer timer,
                        Counter failureCounter,
                        Counter emptyCounter) {
    this.delegate = delegate;
    this.timer = timer;
    this.failureCounter = failureCounter;
    this.emptyCounter = emptyCounter;

  public List<HostAndPort> resolve(String fqdn) {
    final TimerContext context = timer.time();
    boolean success = false;

    try {
      List<HostAndPort> result = delegate.resolve(fqdn);
      if (result.isEmpty()) {;
      success = true;
      return result;
    finally {
      if (!success) {;

And retaining old data like this:

class RetainingDnsSrvResolver implements DnsSrvResolver {
  private final DnsSrvResolver delegate;
  private final Map<String, List<HostAndPort>> cache;

  RetainingDnsSrvResolver(DnsSrvResolver delegate) {
    this.delegate = delegate;
    cache = new ConcurrentHashMap<String, List<HostAndPort>>();

  public List<HostAndPort> resolve(final String fqdn) {
    try {
      List<HostAndPort> nodes = delegate.resolve(fqdn);

      if (nodes.isEmpty()) {
        nodes = firstNonNull(cache.get(fqdn), nodes);
      else {
        cache.put(fqdn, nodes);

      return nodes;
    catch (Exception e) {
      if (cache.containsKey(fqdn)) {
        return cache.get(fqdn);

      throw Throwables.propagate(e);

Note how small and simple the classes are. That makes them very easy to test – only the top level one requires complex setup, due to the nature of the dnsjava library. The others are easily tested like so:

  public void setUp() throws Exception {
    delegate = mock(DnsSrvResolver.class);

    resolver = new RetainingDnsSrvResolver(delegate);

    nodes1 = nodes("noden1", "noden2");
    nodes2 = nodes("noden3", "noden5", "somethingelse");

  public void shouldReturnResultsFromDelegate() throws Exception {

    assertThat(resolver.resolve(FQDN), equalTo(nodes1));

  public void shouldReturnFreshResultsFromDelegate() throws Exception {


    assertThat(resolver.resolve(FQDN), equalTo(nodes2));

  public void shouldRetainDataIfNewResultEmpty() throws Exception {


    assertThat(resolver.resolve(FQDN), equalTo(nodes1));

  public void shouldRetainDataOnFailure() throws Exception {
      .thenThrow(new DnsException("expected"));


    assertThat(resolver.resolve(FQDN), equalTo(nodes1));
  // etc..

The fact that the functionality is implemented using several different classes is an implementation detail that shouldn’t leak through to users of the library. (You may have noticed that the implementation classes are package-private.) So to make it easy for clients, I like to provide a simple fluent API, something like this perhaps:

    DnsSrvResolver resolver =

In summary, the advantages of splitting up behaviours into different classes are:

  1. Each class becomes really simple, which makes it easy to test them exhaustively. Some of the code above does non-trivial things, but testing it is still easy.
  2. Since each behaviour lives in a single place only, it’s trivial to, for instance, change the way metrics are recorded, or to replace the underlying DNS library without affecting the rest of the chain.
  3. Each class becomes more understandable because it is so small. Since behaviours are loosely coupled, the amount of stuff you need to keep ‘in working memory’ to work with a class is minimised.
  4. Varying behaviour in different situations becomes easier as it becomes a matter of composing different classes rather than having conditional logic in a single class. Basically, only the builder implementation in the API example above needs to worry about the value of ‘metered’. The DnsSrvResolver implementation doesn’t need to do a conditional check on ‘if (metered)’.

Doing this is not completely free, though:

  1. With more classes, you get more code to search through if you want to find out how, for instance, data is retained. It may be hard to understand the more dynamic run-time object graph that is behind the DnsSrvResolver you’re using. If the whole picture is there statically in code, it’s easier to find it.
  2. Instead of having complexity within a class, you have (more) complex interactions between instances of different classes. The complexity doesn’t go away entirely, it is moved.

Things that should get you thinking about whether you should split out behaviours into different classes include having too many (> 3-4 or so) collaborators injected into the constructor, or using the word ‘and’ when describing the ‘single’ responsibility of a class. Describing a class as “doing SRV lookups and retaining data and collecting statistics” is a giveaway that you don’t actually think of it as doing one thing.

In my experience, the benefits of using this technique very often outweigh the costs. It is easier to make stable software from a number of small, simple, composable building blocks than to do it using larger, more complex building blocks. Dealing with complexity in how things are put together is easier than trying to compose complex things. Writing code like this doesn’t take longer than writing it in a single class – the only added difficulty is in learning to see when there might be a need to split things out in order to make them simpler.


1 Comment

Fluent APIs in Java

In our inventory management system at Shopzilla we’ve got a reporting tool that we’ve built using Wicket. A couple of months ago (this post has languished in draft status for a while), I was going to add a new page and took a look at a previous one to see what I could copy/paste. What I found was this wall of text for creating a table:

List<IColumn<AggregatedFeedRunSummary, String>> columns = Lists.newArrayList();
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Name"), "merchant.merchantName", "merchant.merchantName"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Merchant Id"), "merchant.merchantId", "merchant.merchantId"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Feed Id"), "feed.feedId", "feed.feedId"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Country"), "merchant.countryCode", "merchant.countryCode"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Account Manager"), "merchant.accountManagerName", "merchant.accountManagerName"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Tier"), "merchant.keyStatus", "merchant.keyStatus"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Revenue Status"), "merchant.revStatus", "merchant.revStatus"));
 columns.add(new IndexablePropertyColumn<AggregatedFeedRunSummary, String>(new Model("Indexable"), "feed.indexable"));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Proteus Last Run"), PROTEUS, "start_time")
    .isDate().isLinkTo(RunDetailsPage.class, FEEDRUN_PARAM));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Feed Changed"), PROTEUS, "feed_changed"));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("FQS Last Run"), FQS, "start_time")
    .isDate().isLinkTo(RunDetailsPage.class, FEEDRUN_PARAM));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Character encoding"), FQS, "encodingUsed"));
 columns.add(new CharacterEncodingSourceColumn(new Model("CE Source"), FQS));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Total Raw Offers"), FQS, "total"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Invalid"), FQS, "invalid"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Duplicate"), FQS, "duplicate"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Valid"), FQS, "valid"));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("DI Last Run"), DI, "start_time")
    .isDate().isLinkTo(RunDetailsPage.class, FEEDRUN_PARAM));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Total"), DI, "total"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Created"), DI, "created"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Updated"), DI, "updated"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Deleted"), DI, "deleted"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Unchanged"), DI, "unchanged"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Resent"), DI, "resent"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Failed"), DI, "Failed"));

 columns.add(new DatePropertyColumn<AggregatedFeedRunSummary, String>(new Model("Last DP Snapshot"), "lastSnapshotDate"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Viable"), "viable", "viable"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Non-viable"), "nonViable", "nonViable"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Deleted"), "deleted", "deleted"));

As an aside, calling it a wall of text should in no way be interpreted as criticism of the people who wrote it – on the contrary, I think creating the wall of text was a great engineering decision. More on that later.

The code above is a clear example of what a lot of people really don’t like about Java (me included, though I do like Java as a language overall). The signal-to-noise ratio is horrible, mostly due to all the type parameters. There’s stuff being done to the language to help reduce that, using diamond notation and things, but I think in this case, the problem is not really the language. Saying Java sucks on the basis of the code above is a case of blaming the tools for a mistake that we made ourselves. Here are some things that are wrong with the code above:

  1. First, there’s a lot of stuff that’s not really relevant. We don’t care about the exact data types that are used as column entries in the Wicket view, but we do care about what the contents of the column are and how they are displayed to the user. So “new AggregatedFeedRunPropertyColumn(new Model(“, etc, is mostly noise except insofar as it tells us where to look for the information we’re after.
  2. A very similar point is that a lot of the information that is there is not very clear. The fact that the type of the column is a ‘PropertyColumn<AggregatedFeedSummary, String> doesn’t convey much in the way of information unless you’re very familiar with Wicket and the specific data types that we’re using.
  3. There’s also missing information: as an example, the string “merchant.merchantName” is repeated twice, and it’s not clear why. There is in fact a reason: the first time, it indicates which property value to display in a particular table cell, and the second time, it indicates which property value to use when sorting based on this column.

Using a fluent API, we can express the code above like so:

return FrvColumns.columnsFor(AggregatedFeedRunSummary.class, String.class)
 .add("Merchant Id").sourceProperty("merchant.merchantId").sortable("merchant.merchantId")
 .add("Feed Id").sourceProperty("feed.feedId").sortable("feed.feedId")
 .add("Account Manager").sourceProperty("merchant.accountManagerName").sortable("merchant.accountManagerName")
 .add("Revenue Status").sourceProperty("merchant.revStatus").sortable("merchant.revStatus")

 .add("Proteus Last Run").sourceProperty("properties.proteus.start_time").epoch()
 .add("Feed Changed").sourceProperty("properties.proteus.feed_changed")

 .add("FQS Last Run").sourceProperty("properties.fqs.start_time").epoch()
 .add("Character encoding").sourceProperty("encodingInfo.encoding")
 .add("CE Source").sourceProperty("encodingInfo.source")
 .add("Total Raw Offers").sourceProperty("")

 .add("DI Last Run").sourceProperty("properties.di.start_time").epoch()

 .add("Last DP Snapshot").sourceProperty("lastSnapshotDate").date()

The key improvements, in my opinion, are:

  • Less text. It’s not exactly super-concise, and there’s a lot of repetition of long words like ‘sourceProperty’, and so on. But there’s a lot less text, meaning a lot less noise. And pretty much everything that’s there carries relevant information.
  • More clarity in what the various strings represent. I think in both the original and updated versions, most people would guess that the first string is the title of the column. The fluent API version is much clearer about why some of the columns have the same property string repeated – it’s because it indicates that the list should be sortable by that column and which property of the list items should be used for sorting.
  • Easier modification – rather than having to figure out what class to use to display a correctly formatted Date or epoch-based timestamp, you just tack on the right transformation method (.date() or .epoch()). Since this is available at your fingertips via command-space, or whatever your favourite IDE uses for completion, it’s very easy to find out what your options are.
  • More transparency. The primary example is the AggregatedFeedRunPropertyColumn, which behind the scenes looks for property values in ‘properties.%s.%s’, and adds ‘component=%s’ as a parameter if the thing is a link. This behaviour is visible if you look in the class, but not if you look at the configuration of the columns. With the fluent API, both those behaviours are explicit.

So, in this case a fluent API both reduces noise and adds signal that was missing before. The cost, of course, is that it takes some effort to implement the fluent API: that’s why I think it was a good engineering decision not to do so immediately. The investment in the API will pay off only if there’s enough volume in terms of pages and column definitions that can use it. What’s more, it’s not just hard to know whether or not you should invest in a fluent API up front, it won’t be clear what the API should be until you have a certain volume of pages that use it.

Other big factors that determine whether an investment in a clearer API is going to pay off are the rate of change of the system in question and whether it’s a core component that everyone working on will know well or a more peripheral one that many developers can be expected to touch with only a vague understanding of how it works. In this case, it’s the latter – reporting is the sort of thing that you need to change often to reflect changes in business logic. Also, since you need to report on all features in the system, it’s normally the case that people will need to be able to modify the report views without being experts in how the tool works. That means that ease of understanding is crucial, and that in turn means it’s crucial that it be obvious for people how to make changes so as to avoid having everyone come up with their own solution for the same type of problem.

The standard pattern I tend to use when building a fluent API is to have a Builder class, or in more complex cases like above, a set of them. So in the example above, this is how it starts:

public static <T, S> ColumnsBuilder<T, S> columnsFor(Class rowClass, Class<S> sortClass) {
   return new ColumnsBuilder<T, S>();

public static class ColumnsBuilder<T, S> implements IClusterable {
   private final List<SingleColumnBuilder<T, S>> builders = Lists.newArrayList();

   public SingleColumnBuilder<T, S> addColumn(IColumn<T, S> column) {
      SingleColumnBuilder<T, S> columnBuilder = new SingleColumnBuilder<T, S>(this, column);
      return columnBuilder;

   public SingleColumnBuilder<T, S> add(String label) {
      SingleColumnBuilder<T, S> columnBuilder = new SingleColumnBuilder<T, S>(this, label);


      return columnBuilder;

   public List<IColumn<T, S>> build() {
      return ImmutableList.copyOf(
        Lists.transform(builders, new Function<SingleColumnBuilder<T, S>, IColumn<T, S>>() {
           public IColumn<T, S> apply(SingleColumnBuilder<T, S> input) {
              return input.buildColumn();

In this case, the builder hierarchy matches the resulting data structure: The ColumnsBuilder knows how to add a new column to the table (TableBuilder might have been a better name), the SingleColumnBuilder knows how to build a single column(!), and then further down the hiearchy, there are LinkBuilders, ParamBuilders, etc.

I sometimes let a single Builder class implement multiple interfaces, if I want to enforce a particular ‘grammar’ to the API. So, for instance, if I want code to look like this:

 ActionChain actions = Actions.first().jump().onto(platform).then().climb().onto(couch).build();

I might go with something like this:

public interface NeedsVerb {
  public NeedsTarget jump();
  public NeedsTarget climb();

public interface NeedsTarget {
  public CanBeBuilt onto(Target target);

public interface CanBeBuilt {
  public NeedsVerb then();
  public ActionChain build();

public class Actions {
  public NeedsVerb first() {
    return new Builder();

  static class Builder implements NeedsVerb, NeedsTarget, CanBeBuilt {
    private final List actions = new LinkedList();
    private Verb verb = null;

    public NeedsTarget jump() {
       verb = Verb.JUMP;
       return this;

    // .. same for climb()

    public CanBeBuilt onto(Target target) {
       actions.add(new Action(verb, target));
       return this;

    public NeedsVerb then() {
       verb = null; // whatever, this is just an example ;)
       return this;

    public ActionChain build() {
       return new ActionChain(actions);

Some points to note:

  • I usually don’t bother too much with things like immutability and thread-safety in the Builders. They’re “internal”, and generally only used in single-threaded, initialisation-type contexts. I would worry very much about those things in the Action and ActionChain classes.
  • I usually spend more time on consistency checks and error-reporting than in the example above. The point of building a fluent API is usually to make life easier for somebody who is not an expert (and shouldn’t have to be) in the internal workings of the exposed functionality. So making it very easy to set things up correctly and to understand what is wrong when things have not been set up correctly are key.

In summary, fluent APIs are, in my opinion, often a great choice to improve code readability – and this is of course language independent, though particularly noticeable in a verbose language like Java. The main drawback is the up-front implementation overhead, but since developers spend much more time trying to understand old code than write new, that implementation cost is often insignificant compared to the improvement in reading time. This is particularly true when a) the code calling the API changes very often, b) many developers need to use the API, and c) the code behind the API is outside of developers’ main focus. As always, one of the best sources of great code is Guava, so for some first-class fluent API implementations, take a look at for instance FluentIterable and CacheBuilder. I think fluent APIs are underused in ‘normal business logic’; they don’t need to be solely the domain of snazzy library developers.

, ,

Leave a comment

VoltDB Evaluation Notes

I’m currently working on an interesting project that has some, from my perspective, pretty large-scale storage requirements – hundreds of millions of entities, around 1TB of storage volume, and needing to concurrently support thousands of writes and tens of thousands of reads per second. Reads are fine, but the number of writes is beyond what a conventional RDBMS can comfortably deal with. Some people are apparently able to do ~1000-1500 writes/second or even more, but in our current ecosystem, the biggest number we’ve been able to get to is about 400/second. Plus writes tend to get slower as table sizes grow, and we’re talking hundreds of millions of entries. So in the beginning of the project, we were looking at alternatives.

Our first stop, of course, was NoSQL databases. We checked out a fair few of them over a period of a month or two: HBase, MongoDB, Riak, etc. We spent some time getting comfortable with solutions that would permit us the guarantees we wanted in terms of not losing data or transactions in a world of eventual consistency. We came up with algorithms that we believed would work, but they made us unhappy because they were so complicated. If we could stick to ACID transactions our algorithms would be simpler and our lives would be so much easier. So we started to look for technologies that would help us do that, and found options like Gemfire and various sharding technologies for MySQL. We considered rolling our own sharding solution. And eventually, via a detour at Clustrix, we came across VoltDB, which is the solution we’re currently in production with (for those not familiar with VoltDB and too lazy to click on the link: it’s a clusterable in-memory datastore that optionally provides ACID transactions). This post describes what we found during our evaluation and what we’ve found in the months since when we’ve been using it in development and production.

The Good

I’ll admit it; I quickly fell in love with VoltDB. The first thing that caught my attention was an aspect of the architecture. Based on research involving the founder Michael Stonebraker, they concluded that traditional RDBMSes spend (or waste) most of their time on things that are due to disk IO and concurrency. So they removed disk IO and concurrency from the critical path. There’s obviously still disk IO in VoltDB, and they deal with a lot of aspects of concurrency. But the IO that gives durability happens before the transactions and is done in a way that is append-only rather than random-seek, which makes it way faster than mid-transaction IO (this is what Martin Fowler calls a memory image). And since the transactions themselves require no disk IO, they execute quickly, meaning that serialising them simplifies the execution logic without incurring a significant penalty due to lack of concurrency. It’s a solution that feels fundamentally right to me.

The second really good thing is the management interface/s. By that I mean that any administrative task I have done has felt simple and intuitive, whether it was done through their management tool (the one that comes with the Enterprise Edition) or through the command line. So installing, upgrading, setting up the domain model, monitoring the data store, everything has been possible to do with very little need to refer to documentation. That promises a low cost of operation and shows good understanding of how to engineer a data store that works in practical use.

Third, another thing the people at VoltDB have got right is developer feedback. The best example of how not to do error messages is Oracle, which excels in giving cryptic error messages without sufficient detail to understand where you went wrong. VoltDB, in contrast, will tell you as early as possible what you’ve done wrong. So, for instance, if your stored procedure tries to access a table that doesn’t exist, you get a compile-time error with all the information you need:

[java] 2011-11-25 13:01:24,124 INFO [] [UpsertAtom.class]: Compiling Statement: UPDATE atom SET label = ? WHERE atom_id = ?
[java] 2011-11-25 13:01:24,199 INFO [] [UpsertAtom.class]: Compiling Statement: INSERT INTO atom VALUES (?, ?)
[java] 2011-11-25 13:01:24,219 INFO [] [UpsertAtom.class]: Compiling Statement: SELECT * FROM atoms WHERE atom_id = ?
[java] 2011-11-25 13:01:24,219 ERROR [] [UpsertAtom.class]: Failed to plan for statement type(SELECT) SELECT * FROM atoms WHERE atom_id = ? Error: "user lacks privilege or object not found: ATOMS"
[java] 2011-11-25 13:01:24,219 ERROR [] Failed to compile XML
[java] 2011-11-25 13:01:24,219 ERROR [] Failed to plan for statement type(SELECT) SELECT * FROM atoms WHERE atom_id = ? Error: "user lacks privilege or object not found: ATOMS"
[java] 2011-11-25 13:01:24,219 ERROR [] Catalog compilation failed.

Fourth, the performance is amazing. A single-node VoltDB installation on my machine does about 85k writes/second, and the write speed doesn’t decrease noticeably with increasing data sizes. And that’s while the same machine is generating the load. I’m using an iMac with 8GB RAM and an i7 processor (quad core), so it’s not exactly low-end but certainly nowhere near top of the line. On our 3-node evaluation cluster, with a configuration that does synchronous command logging (needed for full durability, which we require) and storing one replica of each object (k-safety = 1), we were doing about 60k writes/second most of the time. Those machines are Dual Quad-Core AMD Opteron 2.1GHz with 32GB of RAM each. The bottleneck is probably the fsyncing to disks not optimised for fast fsync, or possibly the network configuration which is far from ideal – the machines should all be on the same switch, but they are in fact on two different ones, and I’m not even sure how many hops there are between the switches. One thing that slows the average throughput down to much less than 60k/second is the fact that we’re using the same physical disk for taking snapshots as for writing command logs and this means that when snapshots are taken, write performance goes down to about 2k/second:

Snapshotting slowing VoltDB down With this cluster, we’ve been able to insert about 230M rows in mixed tables in about 2.5 hours, which is way faster than I had expected.

Finally, in all our interactions with VoltDB, they’ve been both very pleasant, professional and exceptionally fast. Most of our issues were resolved very quickly – usually within hours, always within days, and I can’t think of a single one taking more than a week. This includes a couple of issues that were weird (in the sense of “hard to troubleshoot”) mistakes entirely on our side.

The Bad

The first thing I noticed that I didn’t like about VoltDB was that they used an abstract class with some core methods being flagged as final as a crucial client class, VoltTableRow. This meant I couldn’t write this code to unit test something I wrote myself:

 // mock and when are static imports from Mockito
 VoltTableRow row = mock(VoltTableRow.class);

 when(row.getString(1)).thenReturn("the atom");

           equalTo(new Atom(new Id<Atom>(1),
                            "the atom")));

In fact, I haven’t yet found a reasonable way of mocking or subclassing VoltTableRow to be able to use it in unit tests. Also, when I looked closer at the source code, I felt it coupled clients a little too closely to implementation details in the VoltDB client code. When I questioned this on the forums, I quickly got a response from VoltDB suggesting that maybe interfaces would have been a better choice, and since I agreed, I forgot about this issue. But it did leave a little aftertaste.

However, I got back to work, and once we had concluded our first bigger test where we inserted the 230M entries, I was back in love with VoltDB. In fact, that successful experiment made me feel we were about done with the evaluation, and that we didn’t need to continue looking at the other technologies we had slotted for a deeper evaluation. And of course, that’s when we hit our first serious problem. The cluster went down about 15 minutes after concluding the test (while I was on a conference call bragging about the results), and on trying to bring it back up, the recovery of the database failed. It turned out that the cluster went down due to incorrect operating system configuration on our part (we had failed to do everything mentioned in their release notes). But the recovery failure was more serious and happened due to a bug (which was fixed within a day or two). In our use case, a data store that goes down occasionally (rarely) is something we can deal with, as long as it doesn’t stay down for long, as long as performance in the general case is good enough, and above all, as long as we don’t lose data or updates. Luckily, after some hard work, VoltDB were able to help us restore the data – it was only test data, so the data wasn’t important for us. But it was important for us to prove that when things go wrong, we don’t lose data.

We also had one earlier incident when the system went down due to a bug that had been fixed in version 2.1.1 (we started out with 2.1), something I didn’t pay much attention to at the time. However, these incidents made me take a deeper look inside some of the core classes in VoltDB. I found that very much of the code looks like it’s been written by expert programmers – only experts in C/C++ as opposed to experts in idiomatic Java. Large classes (2500+ lines in some cases – and of course, the large classes are the key ones), large methods with huge cyclomatic complexity, exceptions being swallowed, examples of inappropriate use of inheritance, and a fair amount of oddity in terms of dealing with concurrency. There was nothing that actually struck me as a bug, but where there’s smoke, there’s usually a fire. And we had indications in the form of a few noticeable bugs even within an evaluation period that had only lasted a couple of weeks.

It’s an interesting aspect of VoltDB’s choice to open-source their product that we know the state of its insides as opposed to just its outsides in the form of APIs, tools and documentation. I guess open-sourcing your code is a double-edged sword: it’s great because it gives people a way to learn your product, find answers and suggest solutions and improvements. But it also means you can’t polish your product surface and pretend to the world that the inside too is shiny. Another thing I hadn’t thought of before is that if you open-source your product, your code becomes part of your advertising. You should make it shine, and this is something that VoltDB can improve. I think improvements there could lead to both better code quality and a product that is an easier sell.

I know I am picky about code and it’s very easy to find faults in any code you read – including your own. What’s more, some of that is a matter of taste rather than an objective truth, and it’s hard to know what you don’t like about some code is down to your taste, and what is actually objectively good or bad. I know for a fact that a lot of the code we have internally looks no prettier than the VoltDB code, and though I don’t like that fact or that code, the code basically works. And as Mik, a colleague reflected, “better the devil you know”. We have no idea what, for instance, the code inside Oracle looks like.

The Verdict

Despite the stability issue we ran into, and despite not quite liking the state of the Java code, we decided to go ahead with VoltDB, for the following reasons:

  1. The great support we’ve been getting from them.
  2. The fact that we can live with the database going down and the system being unavailable for short and rare periods, so if the ‘smoke’ of the Java code leads to fires, we should be able to deal with that.
  3. The fact that we can design our system in such a way that even if we would lose data, we can recreate it based on inputs and data stored in slower, but more reliable stores. This type of recovery is enabled by the blazing speed of VoltDB, which even on suboptimal hardware allows us to insert volumes corresponding roughly to our production requirements in a couple of hours.
  4. The fact that we don’t believe that the other options out there are better. All the technologies that we’ve been looking at as potential candidates are about as immature as products. Either, the internet abounds with reports of them losing data or big companies abandoning them as technologies, or they (like VoltDB) are young and lack significant numbers of public, high-volume clients. Also, our in-house experience of somewhat similar technologies such as HBase and Coherence has been that they far less pleasant working with in terms of ease of use, etc, than VoltDB.
  5. We’ve not yet lost data, and although it is easy to bring a VoltDB cluster or at least node down through what amounts to human error (using a non-deterministic ad-hoc SQL query has been our favourite way of doing that), we’ve not seen any issue that hasn’t been possible to recover from. The scare we had in the early stages actually turned out to be a positive thing in the end. It’s better to have seen evidence of an actual failure that led to a successful recovery than to have no experience of whether a recovery might work or not.

We’ve been in production – for a fraction of our customers – for a couple of months now, and things are working very smoothly. An interesting observation is that the sheer speed of VoltDB has enabled us to simplify our architecture pretty drastically in some places. Where we had anticipated having to do some kind of complicated caching or pre-loading of data, we can simply just read it straight out of the underlying store.  I don’t think we’ve seen the end of that sort of simplification yet. Some of the reflexes you get about protecting your database/s are no longer valid when the data store is capable of doing hundreds of thousands of transactions per second.

Overall, we’re very pleased with our choice of VoltDB for the high transaction volume parts of our storage needs. The product is very well-engineered and easy to work with, and the support we’ve been getting has been fantastic.

, ,


Product Development vs Productivity Development

A couple of years ago, the team I was in started to use a concept we called gradual refactoring, whereby we would allocate a fixed percentage of our capacity to doing refactoring in order to keep technical debt at bay. That worked pretty well, and has been refined over the years into something we now tend to call the Productivity Backlog, in contrast to the normal Product Backlog. The way we use this idea boils down to reserving a more-or-less fixed percentage of overall capacity to improving that capacity. This is typically refactoring or cleaning up old, hard-to-change code or building or improving tools to help with something that slows us down. Normally those tools help us automate some process, making it less error-prone and less labour-intensive.

This works really well. Initially, I wanted to keep productivity efforts as part of the normal backlog, because I wanted a global prioritisation order. “All we need is to get really good at weighing medium to long-term benefits of productivity improvements vs the shorter term benefits from the product improvements”. That has turned out to be even harder in practice than I had thought. It’s just easier to keep things separate, for the following reasons:

  • It’s an easier sell to the non-technical parts of/people in the organisation. “Continuous improvement” is something almost all organisations would like to do, and keeping a Productivity Backlog is an easy way to formalise that at a team level.
  • It means that, as a technical person, you don’t have to try to sell the benefits of intangible things like “refactoring” or “eliminating technical debt” each time you need to untangle something particularly troublesome in your code base.
  • It means that, as a business person, you don’t need to try to understand the reasons why some code is poorly suitable to some task or other, freeing you up to focus more clearly on figuring out what features and changes will make the product more successful in the future.
  • It simplifies prioritisation both on the product and the productivity side, because a) you are comparing apples with apples (or at least fruits with fruits, if you know what I mean) on both backlogs and b) you can have different owners who can structure and order their own backlog without a need to coordinate needs and requirements from the other one.
  • It provides an easy knob to turn as needs change – if you urgently need to get some product feature out, you can easily turn the productivity backlog down to 10% or even 5%. And if you’re suffering acutely from wasted time due to inefficient deployment procedures, you can turn it up to 20-25% until you’ve solved that crisis. If you do things on a case-by-case basis, you have to weigh the benefits of saving 3 minutes each time somebody needs to deploy a particular build of something somewhere against trying out a new front page to improve conversion, something that is hard and requires people with different types of expertise to understand each other’s problems and agree to a compromise.

Some of the issues we’ve had in using this concept have been around figuring out what types of changes should go on it. Initially, we called it the ‘tech backlog’, because it was owned by ‘tech’ and dealt very much with issues in implementation. That led to attempts at adding things like performance problems/measurement, or bug fixes driven more by technology than by the business side of things (like things getting logged that made real problems hard to find even though there was no obvious end-user impact). But those things are product features – quality and performance are clearly features in the product. Productivity is about the team’s ability to deliver more product features over time. That means it can be about fixing bugs, if those bugs, for instance, mean that deployment occasionally fail. But most of the time, it’s about removing obstacles and frustrations that keep us from delivering the product improvements that are what we get our job satisfaction from.

Productivity backlogs, compared to product backlogs, are reactive more than proactive. You feel the pain of some faulty process or code, and when it grows unbearable, you add fixing it near the top of your productivity backlog. There’s very little point in having long term goals for a productivity backlog.

A separate productivity backlog makes life simpler for both the business and the delivery side of things, is a great way to formalise continuous improvement and helps improve job satisfaction in delivery teams partly by removing obstacles to productivity and partly by providing knowledge (or hope at least) that the muck you’re having to wade through at the moment will at some point in the future be cleared out. Everybody should have one!

, ,


BDD using Lettuce vs Cucumber-JVM

I’ve now been working for more than 8 months on a project where we’ve been using BDD and creating high-level tests written in Gherkin before actually going on to implement the features themselves. It’s the first time I’m using BDD, so it’s been very interesting. The main conclusion is that it’s a very useful thing to do – writing a high level test really helps me determine how things should work, and I think it makes the implementation cleaner. I can’t say I have conclusive evidence that we’ve got fewer regression bugs as a result of our BDD tests, but at least we’ve got comprehensive regression/acceptance tests that are fully automated. Another BDD reflection is that I think it’ll take me some more time before I really feel I understand it; I’m still not sure I get things like how best to structure the testing code from the perspective of making tests very robust to changes in features. That may be a topic for a future blog post, if and when I start feeling I have some insight into that.

So, BDD is a good thing, as far as I am concerned. This post is not about that; it’s about our experiences of using two different frameworks to execute the Gherkin tests: a Python-based one called Lettuce and Cucumber-JVM, using Java for the step definitions. We evaluated both during our project initiation, and made the decision to go with Lettuce. The primary reason was that we’re using Python for a lot of our automated testing company-wide, and it was felt that it would be easier to move from project to project if we kept to the same language. Cucumber-JVM wasn’t actually released yet at the time, but it felt stable (and the code is really well written) and was close to a release, so that didn’t factor heavily in the decision. For me, being essentially a Java-only developer, I felt that was an OK decision. I felt I would personally be more productive using Java than Python, and that Cucumber-JVM integrated better with our Maven-centric builds, but that it would be interesting to have a good excuse learn a new language.

Now, a few months down the line, we’ve made the decision to stop developing new tests using Lettuce and instead use Cucumber-JVM for new features. (More precisely, we will keep doing our big end-to-end tests using Lettuce, since we’ve got some really nice infrastructure code written in Python that helps us manage deployments and stuff, but the single-service tests will be Cucumber-JVM from now on).

The reasons for this decision are as outlined below – note that it’s hard for me having done about 20 times more Java development than Python to be completely objective about the languages as languages. I’ve tried to be explicit about those points that are clearly subjective.

  1. In my personal opinion, Python as a language lends itself better to ‘small, quick’ types of problems. Throughout the last few months, I’ve repeatedly been impressed by how often ‘just trying something’ in Python leads to the behaviour I was looking for. Also, since it is interpreted, experimenting can be really quick. However, in the case of running BDD tests, you’re actually not working on a ‘small, quick’ type of problem. Frequently, running the tests takes minutes because they require starting/restarting services, setting up databases, etc. That means that mistakes that the Java compiler/IDE would have told you about as you typed them can take a minute or so to discover, and that’s not great at all. I think dynamic languages require quick turnaround to give you any benefits from the reduced amount of typing you have to do to write code in them, and BDD doesn’t feel like a problem that lends itself to quick turnaround times.
  2. It’s hard to troubleshoot tests that are run via Lettuce. You can’t debug Python, and something (nose or Lettuce itself, I’m not sure) takes over the standard out, which means it’s hard to print out debug output. In our current implementation, we’re getting assertion errors divorced from the debug output, which makes it quite awkward to figure out what was the actual issue.
  3. We’ve had to spend a lot of effort on getting the necessary infrastructure together that allows us to do things like share code between BDD tests for different services, ensure that different environments run the same libraries, that we don’t have to install too many dependencies locally before being able to run a build on a given machine, etc. So we’re using things like pip to download and install dependencies, virtualised buildouts to ensure that we actually have the right versions of stuff without conflicting with the machine-global Python installation, and an internal cheeseshop for distributing our internal test utilities code. This is stuff that basically comes for free and in a better implementation when using Maven. When I say ‘better implementation’, I’m talking particularly about the dependency management, where Maven’s versioning of libraries feels like it is more mature and works better than Python eggs do.
  4. Since it’s possible to run Cucumber-JVM tests via JUnit, they slot directly into all the existing infrastructure around build tools and processes. This means that you can immediately view test results from the BDD tests in the same way as your unit tests, which is very useful (although there are still issues). For our Lettuce tests, on the other hand, we’ve had to hand-craft tools to plug in the test execution and result notification into Maven. And while it should be possible, we haven’t gotten around to actually integrating the output of those tools fully into Jenkins, Sonar, etc. This integration again just comes for free with Cucumber-JVM.
  5. Another advantage of having the tests runnable via JUnit is that it is trivial to run them from inside the IDE, which means you can use a debugger to do troubleshooting, that you can easily select which ones to run, etc.
  6. Finally, although we have no data to confirm this, it feels like running tests with Cucumber-JVM is faster than running via Lettuce. And that’s important as it makes it less painful to run the BDD tests more frequently. (EDIT: see comment below for some actual data to support this point)

For me, as an experienced Java hand and far from a polyglot programmer, it’s been very interesting to spend a few months with another language. I think I can safely summarise most of the advantages of Cucumber-JVM over Lettuce as coming from “Java vs Python as a platform” rather than “Java vs Python as a language” – with some implementation choices thrown in there and the fact that BDD as a problem doesn’t lend itself to quick turnaround, which makes static analysis more valuable.

In general, BDD has been a great experience so far, as has Cucumber-JVM and to a lesser degree Python. But from my perspective, Lettuce isn’t a great tool for writing BDD tests for Maven-based projects.

, , ,


Exceptional iterators using Guava

Some years ago, I came across a little gem of a blog post about using a class called AbstractIterator that I hadn’t heard of before. I immediately loved it, and started using it, and in general, the APIs defined in the JDK collections classes a lot more frequently. I’m still doing that today, but for the first time, I came across a situation where AbstractIterator (nowadays in Guava, not Google Collections, of course) let me down.

In the system I’m currently working on, we need to process tens of thousands of feeds that are generated ‘a little randomly’, often by people without deep technical skills. This means we cannot know for sure that entries in the feeds will be consistent; we may be able to parse the first four entries, but not the fifth, and so on. The quantity of feeds, combined with the fact that we need to be forgiving of less-than-perfect input means that we need to be able to continue ingesting data even if a few entries are bad.

Ideally, we wanted to write code similar to this:

Iterator iterator = feedParser.iterator();

while (iterator.hasNext()) {
  try {
    FeedEntry entry =;

    // ... process the entry
  catch (BadEntryException e) {
    // .. track the error and continue processing

Now, we also wanted to use AbstractIterator as the base class for the parser to use, maybe something like:

class FeedEntryIterator extends AbstractIterator<FeedEntry> {
  protected FeedEntry computeNext() {
     if (noMoreInput()) {
        return endOfData();

     try {
        return parseInput();
     catch (ParseException e) {
        throw new BadEntryException(e);

This, however, doesn’t work for two reasons:

  1. The BadEntryException will be thrown during the execution of hasNext(), since the AbstractIterator class calls the computeNext() method to check if any more data is available, storing the result in an intermediate member field until the next() method is called.
  2. If the computeNext() method throws an exception, the AbstractIterator is left in a state FAILED, from which it cannot recover.

One option to work around this that I thought of was delegation. Something like this works for many scenarios:

public class ForwardingTransformingIterator<F, T> implements Iterator<T> {
    private final Iterator<F> source;
    private final Function<F, T> transformer;

    public ForwardingThrowingIterator(Iterator<F> source, Function<V, T> transformer) {
        this.delegate = delegate;
        this.transformer = transformer;

    public boolean hasNext() {
        return delegate.hasNext();

    public T next() {
        return transformer.apply(;

The transformation function could then safely throw exceptions for entries it fails to parse. There’s even a shorthand for the above code in Guava: Iterators.transform(). The only issue is that the delegate iterator needs to return chunks of data that are the same size as what the parser/transform needs. So if the parser needs to be, for instance, something like a SAX parser, you’re in trouble. When I had gotten this far, I figured the only solution was to modify the AbstractIterator class so that it can deal with exceptions directly. The source is a little too long to include in an already source-heavy post, but there’s a gist here. The essence of the change is to make the base class store exceptions of a particular type that are thrown by computeNext(), and re-throw them instead of returning a value when next() is called. It kind of works (we’re using this implementation right now), but it would be nice to find a better solution.

I’ve created a Guava issue for this, so if you think this would be useful, go ahead and vote for it. And if you have suggestions for how to improve the code, feel free to make them here or at!

, ,

Leave a comment

Hardcode Behaviour!

Some years ago, when I was learning Git, I watched a presentation by Linus Torvalds, and in passing, he made one of those points that just fits with stuff you’ve been thinking but haven’t yet verbalised and so don’t fully understand. He was talking about how he had obsessed over the performance of some operation in Git (merges if I remember right), because with gradual improvements in performance, there’s a quantum leap where the pain of doing something goes away. And when it’s not painful, you can do it often and that can completely change the way you work, opening up avenues that used to be closed.

This thought can be generalised to a lot of areas – gradual improvements that suddenly lead to a change in what you can do. One such scenario that I’ve been thinking about a little lately is the pattern of using databases (as in something external to the code – XML files, properties files, whatever) for configuration data. The rationale for that is to make change easier and quicker, and the pattern comes out of a situation where the next release is some months away, but a database change can be done in minutes. These days, though, that situation should no longer apply when building web services. If you do things right, it should be possible to do the next release within minutes or at least hours, and this means that you can hardcode your some of your configuration data instead of using a database.

Hardcoding configurable options gives the following benefits:

  • Consistency across environments – with databases, there’s a risk (or a guarantee, more or less) that environments will differ. This will lead to surprises and/or wasted effort when behaviour changes from one environment to another.
  • Better testability – you can more easily prove that your application does what it should do if its behaviour is entirely defined by the code rather than by some external data.
  • Simpler ‘physical form’ of the system – a single deployable unit rather than one code unit and a database unit. Among other things, this leads to easier deployments – no, or at least less frequent, need for database updates.

Of course, this idea doesn’t apply to all kinds of configuration options. It’s useful primarily for those that change the system behaviour – feature toggles, business rules for data normalisation, URL rewrite rules, that sort of thing. Data such as the addresses of downstream services, databases (!), etc, of course needs to be configurable on a per-environment basis rather than hardwired into the build.

This is yet another (though pretty minor) reason to work towards making frequent releases easy and painless: the possibility of a change in architecture and process that will allow you to spend less time doing regression testing and also helps speed up the deployments themselves.

, , ,

Leave a comment

Don’t use @Test(expected = …)

The one feature of JUnit that I really don’t like is the ability to say that a test is expected to throw a particular exception. Here’s an example:

public class MyClass {

   private final Something something;

   public MyClass(Something something) {
      this.something = something;

   public void doWhatever(String aString) {


public class MyClassTest {

  @Test(expected = NullPointerException.class)
  public void shouldThrowNullPointerExceptionWhenStringIsNull() throws Exception {
    MyClass myClass = new MyClass(null);

This example is not as bad as they can be, but it illustrates the point. The method call will indeed throw a NullPointerException, but the reason won’t be that the string parameter is null – the NPE is thrown because the ‘something’ member is null. The general problem with the ‘expected’ parameter to the @Test annotation is that you don’t know which particular case led to the exception, and general exceptions such as NullPointerException (or, worse RuntimeException, or perhaps some generic exception you use for wrapping all kinds of checked exception in some code) can be thrown for many reasons.
I very much prefer code along these lines:

  public void throwsExceptionWithMeaningfulInformation() throws Exception {
     IOException expected = new IOException("expected");

     when(collaborator).method("some parameter").thenThrow(expected);

     try {
        classUnderTest.methodThatCallsCollaborator("some parameter");
        fail("expected exception");
     catch (WrappingException e) {
        assertThat(e.getCause(), equalTo((Throwable) expected));
        assertThat(e.getMessage(), containsString("some parameter"));
        assertThat(e.getMessage(), containsString("something else the code should report"));

EDIT: based on Dominic’s comment, I now normally prefer the idiom below – the rest of the post still applies, but using @Rule and ExpectedException is more concise and readable. The only thing missing is an easily accessible Matcher<Throwable> that would allow you to match the cause as well. I’ll probably create one, for internal use, at least.

EDIT: JUnit 4.11-SNAPSHOT now has support for cause matching using ExpectedException. :)

  public ExpectedException thrown = ExpectedException.none();

  public void throwsNPEIfSecondParameterIsNull() throws Exception {

     new ClassThatDisallowsNullForTheSecondParameter("hi", null);

Both these idioms give you the ability to verify that you’ve in fact triggered the error you wanted to trigger, and it also gives you the ability to ensure that the code is reporting the problem in a good way by checking the message and cause of the thrown exception. Sure, it requires a little more typing than just using the @Test annotation, but you get that time back the first time that exception is actually thrown and somebody needs to figure out what went wrong. Both of the 2 minutes it took to do the extra typing!

, ,