Fluent APIs in Java

In our inventory management system at Shopzilla we’ve got a reporting tool that we’ve built using Wicket. A couple of months ago (this post has languished in draft status for a while), I was going to add a new page and took a look at a previous one to see what I could copy/paste. What I found was this wall of text for creating a table:

List<IColumn<AggregatedFeedRunSummary, String>> columns = Lists.newArrayList();
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Name"), "merchant.merchantName", "merchant.merchantName"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Merchant Id"), "merchant.merchantId", "merchant.merchantId"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Feed Id"), "feed.feedId", "feed.feedId"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Country"), "merchant.countryCode", "merchant.countryCode"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Account Manager"), "merchant.accountManagerName", "merchant.accountManagerName"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Tier"), "merchant.keyStatus", "merchant.keyStatus"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Revenue Status"), "merchant.revStatus", "merchant.revStatus"));
 columns.add(new IndexablePropertyColumn<AggregatedFeedRunSummary, String>(new Model("Indexable"), "feed.indexable"));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Proteus Last Run"), PROTEUS, "start_time")
    .isDate().isLinkTo(RunDetailsPage.class, FEEDRUN_PARAM));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Feed Changed"), PROTEUS, "feed_changed"));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("FQS Last Run"), FQS, "start_time")
    .isDate().isLinkTo(RunDetailsPage.class, FEEDRUN_PARAM));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Character encoding"), FQS, "encodingUsed"));
 columns.add(new CharacterEncodingSourceColumn(new Model("CE Source"), FQS));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Total Raw Offers"), FQS, "total"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Invalid"), FQS, "invalid"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Duplicate"), FQS, "duplicate"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Valid"), FQS, "valid"));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("DI Last Run"), DI, "start_time")
    .isDate().isLinkTo(RunDetailsPage.class, FEEDRUN_PARAM));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Total"), DI, "total"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Created"), DI, "created"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Updated"), DI, "updated"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Deleted"), DI, "deleted"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Unchanged"), DI, "unchanged"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Resent"), DI, "resent"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Failed"), DI, "Failed"));

 columns.add(new DatePropertyColumn<AggregatedFeedRunSummary, String>(new Model("Last DP Snapshot"), "lastSnapshotDate"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Viable"), "viable", "viable"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Non-viable"), "nonViable", "nonViable"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Deleted"), "deleted", "deleted"));

As an aside, calling it a wall of text should in no way be interpreted as criticism of the people who wrote it – on the contrary, I think creating the wall of text was a great engineering decision. More on that later.

The code above is a clear example of what a lot of people really don’t like about Java (me included, though I do like Java as a language overall). The signal-to-noise ratio is horrible, mostly due to all the type parameters. The language is evolving to reduce that, for instance with the diamond operator introduced in Java 7, but I think in this case, the problem is not really the language. Saying Java sucks on the basis of the code above is a case of blaming the tools for a mistake that we made ourselves. Here are some things that are wrong with the code above:

First, there’s a lot of stuff that’s not really relevant. We don’t care about the exact data types that are used as column entries in the Wicket view, but we do care about what the contents of the column are and how they are displayed to the user. So ‘new AggregatedFeedRunPropertyColumn(new Model(’, etc., is mostly noise except insofar as it tells us where to look for the information we’re after.
A very similar point is that a lot of the information that is there is not very clear. The fact that the type of the column is a ‘PropertyColumn<AggregatedFeedRunSummary, String>’ doesn’t convey much in the way of information unless you’re very familiar with Wicket and the specific data types that we’re using.
There’s also missing information: as an example, the string “merchant.merchantName” appears twice in the same column definition, and it’s not clear why. There is in fact a reason: the first time, it indicates which property value to display in a particular table cell, and the second time, it indicates which property value to use when sorting based on this column.

Using a fluent API, we can express the code above like so:

return FrvColumns.columnsFor(AggregatedFeedRunSummary.class, String.class)
 .add("Name").sourceProperty("merchant.merchantName").sortable("merchant.merchantName")
 .add("Merchant Id").sourceProperty("merchant.merchantId").sortable("merchant.merchantId")
 .add("Feed Id").sourceProperty("feed.feedId").sortable("feed.feedId")
      .link(FeedRunHistoryPage.class)
        .param("merchantId").sourceProperty("merchant.merchantId")
        .param("feedId").sourceProperty("feed.feedId")
 .add("Country").sourceProperty("merchant.countryCode").sortable("merchant.countryCode")
 .add("Account Manager").sourceProperty("merchant.accountManagerName").sortable("merchant.accountManagerName")
 .add("Tier").sourceProperty("merchant.keyStatus").sortable("merchant.keyStatus")
 .add("Revenue Status").sourceProperty("merchant.revStatus").sortable("merchant.revStatus")
 .add("Indexable").sourceProperty("feed.indexableAsString")

 .add("Proteus Last Run").sourceProperty("properties.proteus.start_time").epoch()
    .link(RunDetailsPage.class)
       .param(FEEDRUN_PARAM).sourceProperty("feedRunId.id")
       .param("component").value(PROTEUS.toString())
 .add("Feed Changed").sourceProperty("properties.proteus.feed_changed")

 .add("FQS Last Run").sourceProperty("properties.fqs.start_time").epoch()
    .link(RunDetailsPage.class)
       .param(FEEDRUN_PARAM).sourceProperty("feedRunId.id")
       .param("component").value(FQS.toString())
 .add("Character encoding").sourceProperty("encodingInfo.encoding")
 .add("CE Source").sourceProperty("encodingInfo.source")
 .add("Total Raw Offers").sourceProperty("properties.fqs.total")
 .add("Invalid").sourceProperty("properties.fqs.invalid")
 .add("Duplicate").sourceProperty("properties.fqs.duplicate")
 .add("Valid").sourceProperty("properties.fqs.valid")

 .add("DI Last Run").sourceProperty("properties.di.start_time").epoch()
    .link(RunDetailsPage.class)
       .param(FEEDRUN_PARAM).sourceProperty("feedRunId.id")
       .param("component").value(DI.toString())
 .add("Total").sourceProperty("properties.di.total")
 .add("Created").sourceProperty("properties.di.created")
 .add("Updated").sourceProperty("properties.di.updated")
 .add("Deleted").sourceProperty("properties.di.deleted")
 .add("Unchanged").sourceProperty("properties.di.unchanged")
 .add("Resent").sourceProperty("properties.di.resent")
 .add("Failed").sourceProperty("properties.di.failed")

 .add("Last DP Snapshot").sourceProperty("lastSnapshotDate").date()
 .add("Viable").sourceProperty("viable").sortable("viable")
 .add("Non-viable").sourceProperty("nonViable").sortable("nonViable")
 .add("Deleted").sourceProperty("deleted").sortable("deleted")
 .build();

The key improvements, in my opinion, are:

Less text. It’s not exactly super-concise, and there’s a lot of repetition of long words like ‘sourceProperty’, and so on. But there’s a lot less text, meaning a lot less noise. And pretty much everything that’s there carries relevant information.
More clarity in what the various strings represent. I think in both the original and updated versions, most people would guess that the first string is the title of the column. The fluent API version is much clearer about why some of the columns have the same property string repeated – it’s because it indicates that the list should be sortable by that column and which property of the list items should be used for sorting.
Easier modification – rather than having to figure out what class to use to display a correctly formatted Date or epoch-based timestamp, you just tack on the right transformation method (.date() or .epoch()). Since this is available at your fingertips via command-space, or whatever your favourite IDE uses for completion, it’s very easy to find out what your options are.
More transparency. The primary example is the AggregatedFeedRunPropertyColumn, which behind the scenes looks for property values in ‘properties.%s.%s’, and adds ‘component=%s’ as a parameter if the thing is a link. This behaviour is visible if you look in the class, but not if you look at the configuration of the columns. With the fluent API, both those behaviours are explicit.
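
To make the point about tacking on a transformation method a bit more concrete, here is a minimal sketch of how methods like .date() and .epoch() could work: each one simply swaps the formatter that the builder hands to the finished column. This is not our actual implementation. ColumnSpec and ColumnSpecBuilder are hypothetical, Wicket-free names made up for illustration, the date pattern is arbitrary, and I'm assuming Java 8 lambdas and an epoch value in milliseconds.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.function.Function;

// Hypothetical stand-in for a finished column definition (not a Wicket class).
final class ColumnSpec {
    final String label;
    final String sourceProperty;
    private final Function<Object, String> formatter;

    ColumnSpec(String label, String sourceProperty, Function<Object, String> formatter) {
        this.label = label;
        this.sourceProperty = sourceProperty;
        this.formatter = formatter;
    }

    // Turns the raw property value into the text shown in the table cell.
    String render(Object rawValue) {
        return formatter.apply(rawValue);
    }
}

// Hypothetical builder: every transformation method just replaces the formatter.
final class ColumnSpecBuilder {
    private final String label;
    private String sourceProperty;
    private Function<Object, String> formatter = String::valueOf; // default: plain toString

    ColumnSpecBuilder(String label) {
        this.label = label;
    }

    ColumnSpecBuilder sourceProperty(String property) {
        this.sourceProperty = property;
        return this;
    }

    // Assumption: the raw value is a java.util.Date.
    ColumnSpecBuilder date() {
        this.formatter = value -> format((Date) value);
        return this;
    }

    // Assumption: the raw value is a Number holding milliseconds since the epoch.
    ColumnSpecBuilder epoch() {
        this.formatter = value -> format(new Date(((Number) value).longValue()));
        return this;
    }

    ColumnSpec build() {
        return new ColumnSpec(label, sourceProperty, formatter);
    }

    private static String format(Date date) {
        // A new SimpleDateFormat per call, since the class is not thread-safe.
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm");
        dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
        return dateFormat.format(date);
    }
}

final class ColumnSpecDemo {
    public static void main(String[] args) {
        ColumnSpec lastRun = new ColumnSpecBuilder("FQS Last Run")
            .sourceProperty("properties.fqs.start_time").epoch()
            .build();
        System.out.println(lastRun.render(0L)); // prints "1970-01-01 00:00"
    }
}
```

Because the options are ordinary methods on the builder, IDE completion lists them right where you need them, which is the whole point.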

So, in this case a fluent API both reduces noise and adds signal that was missing before. The cost, of course, is that it takes some effort to implement the fluent API: that’s why I think it was a good engineering decision not to do so immediately. The investment in the API will pay off only if there’s enough volume in terms of pages and column definitions that can use it. What’s more, it’s not just hard to know up front whether you should invest in a fluent API; it also won’t be clear what the API should look like until you have a certain volume of pages that could use it.

Other big factors that determine whether an investment in a clearer API is going to pay off are the rate of change of the system in question and whether it’s a core component that everyone working on the system will know well or a more peripheral one that many developers can be expected to touch with only a vague understanding of how it works. In this case, it’s the latter: reporting is the sort of thing that you need to change often to reflect changes in business logic. Also, since you need to report on all features in the system, people will normally need to be able to modify the report views without being experts in how the tool works. That means that ease of understanding is crucial, and that in turn means it must be obvious to people how to make changes, so that not everyone comes up with their own solution to the same type of problem.

The standard pattern I tend to use when building a fluent API is to have a Builder class, or in more complex cases like the one above, a set of them. So in the example above, this is how it starts:


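// The Class arguments are not used in the body; they only let the compiler infer T and S at the call site.
public static <T, S> ColumnsBuilder<T, S> columnsFor(Class<T> rowClass, Class<S> sortClass) {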
   return new ColumnsBuilder<T, S>();
}

public static class ColumnsBuilder<T, S> implements IClusterable {
   private final List<SingleColumnBuilder<T, S>> builders = Lists.newArrayList();

   public SingleColumnBuilder<T, S> addColumn(IColumn<T, S> column) {
      SingleColumnBuilder<T, S> columnBuilder = new SingleColumnBuilder<T, S>(this, column);
      builders.add(columnBuilder);
      return columnBuilder;
   }

   public SingleColumnBuilder<T, S> add(String label) {
      SingleColumnBuilder<T, S> columnBuilder = new SingleColumnBuilder<T, S>(this, label);

      builders.add(columnBuilder);

      return columnBuilder;
   }

   public List<IColumn<T, S>> build() {
      return ImmutableList.copyOf(
        Lists.transform(builders, new Function<SingleColumnBuilder<T, S>, IColumn<T, S>>() {
           @Override
           public IColumn<T, S> apply(SingleColumnBuilder<T, S> input) {
              return input.buildColumn();
           }
      }));
   }
}

In this case, the builder hierarchy matches the resulting data structure: the ColumnsBuilder knows how to add a new column to the table (TableBuilder might have been a better name), the SingleColumnBuilder knows how to build a single column(!), and then further down the hierarchy, there are LinkBuilders, ParamBuilders, etc.
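
What the snippet above doesn't show is how a call chain can go from a column back up to the table, as in .sortable(...).add(...). One way to do that, sketched below, is to let the child builder keep a reference to its parent and delegate the parent-level methods to it. This is a stripped-down illustration under my own assumptions, not the real SingleColumnBuilder: SketchColumn, SketchTableBuilder and SketchColumnBuilder are hypothetical names with no Wicket dependencies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, Wicket-free stand-in for a finished column.
final class SketchColumn {
    final String label;
    final String sourceProperty;
    final String sortProperty; // null means "not sortable"

    SketchColumn(String label, String sourceProperty, String sortProperty) {
        this.label = label;
        this.sourceProperty = sourceProperty;
        this.sortProperty = sortProperty;
    }
}

// Parent builder: knows how to add columns and how to build the whole list.
final class SketchTableBuilder {
    private final List<SketchColumnBuilder> builders = new ArrayList<>();

    SketchColumnBuilder add(String label) {
        SketchColumnBuilder builder = new SketchColumnBuilder(this, label);
        builders.add(builder);
        return builder;
    }

    List<SketchColumn> build() {
        List<SketchColumn> columns = new ArrayList<>();
        for (SketchColumnBuilder builder : builders) {
            columns.add(builder.buildColumn());
        }
        return Collections.unmodifiableList(columns);
    }
}

// Child builder: configures one column and hands control back to the parent.
final class SketchColumnBuilder {
    private final SketchTableBuilder parent;
    private final String label;
    private String sourceProperty;
    private String sortProperty;

    SketchColumnBuilder(SketchTableBuilder parent, String label) {
        this.parent = parent;
        this.label = label;
    }

    SketchColumnBuilder sourceProperty(String property) {
        this.sourceProperty = property;
        return this;
    }

    SketchColumnBuilder sortable(String property) {
        this.sortProperty = property;
        return this;
    }

    // Delegating to the parent lets the chain continue with the next column...
    SketchColumnBuilder add(String nextLabel) {
        return parent.add(nextLabel);
    }

    // ...or finish the whole table from wherever the chain currently is.
    List<SketchColumn> build() {
        return parent.build();
    }

    SketchColumn buildColumn() {
        return new SketchColumn(label, sourceProperty, sortProperty);
    }
}

final class NestedBuilderDemo {
    public static void main(String[] args) {
        List<SketchColumn> columns = new SketchTableBuilder()
            .add("Name").sourceProperty("merchant.merchantName").sortable("merchant.merchantName")
            .add("Indexable").sourceProperty("feed.indexableAsString")
            .build();
        System.out.println(columns.size()); // prints 2
    }
}
```

The delegation methods are pure boilerplate, but they are what makes the calling code read as one uninterrupted chain.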

I sometimes let a single Builder class implement multiple interfaces, if I want to enforce a particular ‘grammar’ to the API. So, for instance, if I want code to look like this:

 ActionChain actions = Actions.first().jump().onto(platform).then().climb().onto(couch).build();

I might go with something like this:


public interface NeedsVerb {
  public NeedsTarget jump();
  public NeedsTarget climb();
}

public interface NeedsTarget {
  public CanBeBuilt onto(Target target);
}

public interface CanBeBuilt {
  public NeedsVerb then();
  public ActionChain build();
}

public class Actions {
  // Must be static so that callers can write Actions.first()
  public static NeedsVerb first() {
    return new Builder();
  }

  static class Builder implements NeedsVerb, NeedsTarget, CanBeBuilt {
    private final List<Action> actions = new LinkedList<Action>();
    private Verb verb = null;

    @Override
    public NeedsTarget jump() {
       verb = Verb.JUMP;
       return this;
    }

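    @Override
    public NeedsTarget climb() {
       verb = Verb.CLIMB; // same as jump(), assuming Verb has a CLIMB constant
       return this;
    }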

    @Override
    public CanBeBuilt onto(Target target) {
       actions.add(new Action(verb, target));
       return this;
    }

    @Override
    public NeedsVerb then() {
       verb = null; // whatever, this is just an example ;)
       return this;
    }

    @Override
    public ActionChain build() {
       return new ActionChain(actions);
    }
  }
}

Some points to note:

I usually don’t bother too much with things like immutability and thread-safety in the Builders. They’re “internal”, and generally only used in single-threaded, initialisation-type contexts. I would worry very much about those things in the Action and ActionChain classes.
I usually spend more time on consistency checks and error-reporting than in the example above. The point of building a fluent API is usually to make life easier for somebody who is not an expert (and shouldn’t have to be) in the internal workings of the exposed functionality. So making it very easy to set things up correctly and to understand what is wrong when things have not been set up correctly are key.
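
As a rough idea of what I mean by consistency checks and error reporting, here is a small sketch of a builder that fails fast and names the offending column in its messages. It is an illustration only: CheckedColumn and CheckedColumnBuilder are hypothetical names, and the specific rules (a label is required, sourceProperty must be set exactly once) are assumptions I picked for the example.

```java
import java.util.Objects;

// Hypothetical stand-in for a finished column.
final class CheckedColumn {
    final String label;
    final String sourceProperty;

    CheckedColumn(String label, String sourceProperty) {
        this.label = label;
        this.sourceProperty = sourceProperty;
    }
}

final class CheckedColumnBuilder {
    private final String label;
    private String sourceProperty;

    CheckedColumnBuilder(String label) {
        // Fail at the call that made the mistake, not later in build().
        if (label == null || label.trim().isEmpty()) {
            throw new IllegalArgumentException("column label must not be null or blank");
        }
        this.label = label;
    }

    CheckedColumnBuilder sourceProperty(String property) {
        Objects.requireNonNull(property, "sourceProperty for column '" + label + "' must not be null");
        if (sourceProperty != null) {
            // Setting the property twice is almost certainly a copy/paste error.
            throw new IllegalStateException("column '" + label + "': sourceProperty already set to '"
                + sourceProperty + "', cannot set it again to '" + property + "'");
        }
        this.sourceProperty = property;
        return this;
    }

    CheckedColumn build() {
        if (sourceProperty == null) {
            // Say which column is broken and how to fix it.
            throw new IllegalStateException("column '" + label
                + "' has no sourceProperty; call .sourceProperty(...) before build()");
        }
        return new CheckedColumn(label, sourceProperty);
    }
}

final class CheckedBuilderDemo {
    public static void main(String[] args) {
        try {
            new CheckedColumnBuilder("Total").build();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

An error message that mentions the column label saves the non-expert reader from having to step through the builder to find out which of forty lines is wrong.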

In summary, fluent APIs are, in my opinion, often a great choice for improving code readability. This is of course language-independent, though particularly noticeable in a verbose language like Java. The main drawback is the up-front implementation overhead, but since developers spend much more time trying to understand old code than writing new code, that implementation cost is often insignificant compared to the improvement in reading time. This is particularly true when a) the code calling the API changes very often, b) many developers need to use the API, and c) the code behind the API is outside of developers’ main focus. As always, one of the best sources of great code is Guava, so for some first-class fluent API implementations, take a look at, for instance, FluentIterable and CacheBuilder. I think fluent APIs are underused in ‘normal business logic’; they don’t need to be solely the domain of snazzy library developers.

fluent api, Java, Refactoring

This entry was posted on January 15, 2013, 21:30 and is filed under Java, Software Development.

Petter Måhlén's Blog