
Keeping Classes Simple using the SRP

The Single Responsibility Principle – that a class should have one and only one reason to change – is, I think, one of the most important principles to follow in order to make high-quality and easily maintainable software. It’s also one of the principles that I see abused the most often (by myself as well as others). This post is a cross-post of something I wrote for the internal Spotify blog, and talks about a technique that makes it easier to adhere to the SRP and thus create simple classes, in the Rich Hickey sense: ‘easy’ is a subjective term describing the effort it takes some particular person to do something, while ‘simple’ is an objective term describing the resulting artefact. Simple software is valuable (and complex software is costly and causes frustration), so it is very useful to have good tools that make it easy to create simple things.

I’ll describe a concrete technique that I think of as composing behaviours; I think it’s close to if not identical to the decorator pattern. I’ll be using some code I recently wrote in a library wrapping DNS lookups at Spotify. I needed to create something that would let us do SRV lookups, and if the results were empty or there was another kind of failure when doing the lookup, it should retain the previous data. This is to reduce the risk of service outages caused only by DNS failures. I also wanted to ensure that we had metrics on the time we spent on DNS lookups, the rate of failures, and how common empty responses were.

The most common reaction when faced with a problem description like that is to create a class, (DnsSrvResolver, let’s say) that implements those features. In a sense, this class does a single thing – DNS SRV lookups that are metered and whose results can be kept in case of problems. But I would argue that that ‘thing’ is too big, and it’s better to create multiple classes doing smaller things. Here’s how.

First, an interface (the code in the post is Java, but of course the technique can be applied to code in any programming language):

public interface DnsSrvResolver {
  List<HostAndPort> resolve(String fqdn);
}

Then, an implementation class that does the actual lookup of SRV records using a standard DNS library:

class XBillDnsSrvResolver implements DnsSrvResolver {
  private final LookupFactory lookupFactory;

  XBillDnsSrvResolver(LookupFactory lookupFactory) {
    this.lookupFactory = lookupFactory;
  }

  @Override
  public List<HostAndPort> resolve(final String fqdn) {
    Lookup lookup = lookupFactory.forName(fqdn);
    Record[] queryResult = lookup.run();

    if (lookup.getResult() != Lookup.SUCCESSFUL) {
      throw new DnsException(
        String.format("Lookup of '%s' failed with code: %d - %s ",
           fqdn, lookup.getResult(), lookup.getErrorString()));
    }

    return toHostAndPorts(queryResult);
  }

  private List<HostAndPort> toHostAndPorts(Record[] queryResult) {
    ImmutableList.Builder<HostAndPort> builder =
      ImmutableList.builder();

    if (queryResult != null) {
      for (Record record: queryResult) {
        if (record instanceof SRVRecord) {
          SRVRecord srvRecord = (SRVRecord) record;
          builder.add(HostAndPort.fromParts(
                srvRecord.getTarget().toString(),
                srvRecord.getPort())
          );
        }
      }
    }

    return builder.build();
  }
}

So far, nothing out of the ordinary. Here comes the trick: now, add the metrics-tracking and result-storing behaviours in new classes that delegate the actual DNS lookup to some other, unknown class. Metrics can be done like this:

class MeteredDnsSrvResolver implements DnsSrvResolver {
  private final DnsSrvResolver delegate;
  private final Timer timer;
  private final Counter failureCounter;
  private final Counter emptyCounter;

  MeteredDnsSrvResolver(DnsSrvResolver delegate,
                        Timer timer,
                        Counter failureCounter,
                        Counter emptyCounter) {
    this.delegate = delegate;
    this.timer = timer;
    this.failureCounter = failureCounter;
    this.emptyCounter = emptyCounter;
  }

  @Override
  public List<HostAndPort> resolve(String fqdn) {
    final TimerContext context = timer.time();
    boolean success = false;

    try {
      List<HostAndPort> result = delegate.resolve(fqdn);
      if (result.isEmpty()) {
        emptyCounter.inc();
      }
      success = true;
      return result;
    }
    finally {
      context.stop();
      if (!success) {
        failureCounter.inc();
      }
    }
  }
}

And retaining old data like this:

class RetainingDnsSrvResolver implements DnsSrvResolver {
  private final DnsSrvResolver delegate;
  private final Map<String, List<HostAndPort>> cache;

  RetainingDnsSrvResolver(DnsSrvResolver delegate) {
    this.delegate = delegate;
    cache = new ConcurrentHashMap<String, List<HostAndPort>>();
  }

  @Override
  public List<HostAndPort> resolve(final String fqdn) {
    try {
      List<HostAndPort> nodes = delegate.resolve(fqdn);

      if (nodes.isEmpty()) {
        nodes = firstNonNull(cache.get(fqdn), nodes);
      }
      else {
        cache.put(fqdn, nodes);
      }

      return nodes;
    }
    catch (Exception e) {
      if (cache.containsKey(fqdn)) {
        return cache.get(fqdn);
      }

      throw Throwables.propagate(e);
    }
  }
}

Note how small and simple the classes are. That makes them very easy to test – only the top level one requires complex setup, due to the nature of the dnsjava library. The others are easily tested like so:

  @Before
  public void setUp() throws Exception {
    delegate = mock(DnsSrvResolver.class);

    resolver = new RetainingDnsSrvResolver(delegate);

    nodes1 = nodes("noden1", "noden2");
    nodes2 = nodes("noden3", "noden5", "somethingelse");
  }

  @Test
  public void shouldReturnResultsFromDelegate() throws Exception {
    when(delegate.resolve(FQDN)).thenReturn(nodes1);

    assertThat(resolver.resolve(FQDN), equalTo(nodes1));
  }

  @Test
  public void shouldReturnFreshResultsFromDelegate() throws Exception {
    when(delegate.resolve(FQDN))
      .thenReturn(nodes1)
      .thenReturn(nodes2);

    resolver.resolve(FQDN);

    assertThat(resolver.resolve(FQDN), equalTo(nodes2));
  }

  @Test
  public void shouldRetainDataIfNewResultEmpty() throws Exception {
    when(delegate.resolve(FQDN))
      .thenReturn(nodes1)
      .thenReturn(nodes());

    resolver.resolve(FQDN);

    assertThat(resolver.resolve(FQDN), equalTo(nodes1));
  }

  @Test
  public void shouldRetainDataOnFailure() throws Exception {
    when(delegate.resolve(FQDN))
      .thenReturn(nodes1)
      .thenThrow(new DnsException("expected"));

    resolver.resolve(FQDN);

    assertThat(resolver.resolve(FQDN), equalTo(nodes1));
  }
  // etc..

The fact that the functionality is implemented using several different classes is an implementation detail that shouldn’t leak through to users of the library. (You may have noticed that the implementation classes are package-private.) So to make it easy for clients, I like to provide a simple fluent API, something like this perhaps:

    DnsSrvResolver resolver =
        DnsSrvResolvers.newBuilder()
          .cachingLookups(true)
          .metered(true)
          .retainingDataOnFailures(true)
          .build();

In summary, the advantages of splitting up behaviours into different classes are:

  1. Each class becomes really simple, which makes it easy to test them exhaustively. Some of the code above does non-trivial things, but testing it is still easy.
  2. Since each behaviour lives in a single place only, it’s trivial to, for instance, change the way metrics are recorded, or to replace the underlying DNS library without affecting the rest of the chain.
  3. Each class becomes more understandable because it is so small. Since behaviours are loosely coupled, the amount of stuff you need to keep ‘in working memory’ to work with a class is minimised.
  4. Varying behaviour in different situations becomes easier, as it becomes a matter of composing different classes rather than having conditional logic in a single class. Basically, only the builder implementation in the API example above needs to worry about the value of ‘metered’; the DnsSrvResolver implementations themselves never need an ‘if (metered)’ check. (There’s a sketch of what that builder might look like just below.)
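
To make point 4 concrete, here is a rough sketch of what the conditional composition inside such a builder could look like. This is not the real library’s builder: SimpleLookupFactory is just an assumed name, the caching option is left out, and the metrics objects are simply passed in to keep the example short.

// A sketch only - the real builder also handles caching and creates its metrics differently.
public final class DnsSrvResolvers {

  public static Builder newBuilder(Timer timer, Counter failureCounter, Counter emptyCounter) {
    return new Builder(timer, failureCounter, emptyCounter);
  }

  public static final class Builder {
    private final Timer timer;
    private final Counter failureCounter;
    private final Counter emptyCounter;
    private boolean metered;
    private boolean retainingDataOnFailures;

    private Builder(Timer timer, Counter failureCounter, Counter emptyCounter) {
      this.timer = timer;
      this.failureCounter = failureCounter;
      this.emptyCounter = emptyCounter;
    }

    public Builder metered(boolean metered) {
      this.metered = metered;
      return this;
    }

    public Builder retainingDataOnFailures(boolean retain) {
      this.retainingDataOnFailures = retain;
      return this;
    }

    public DnsSrvResolver build() {
      // innermost: the class that does the real DNS lookups
      DnsSrvResolver resolver = new XBillDnsSrvResolver(new SimpleLookupFactory());

      // wrap it in whichever behaviours were asked for
      if (retainingDataOnFailures) {
        resolver = new RetainingDnsSrvResolver(resolver);
      }
      if (metered) {
        resolver = new MeteredDnsSrvResolver(resolver, timer, failureCounter, emptyCounter);
      }
      return resolver;
    }
  }
}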

Doing this is not completely free, though:

  1. With more classes, you get more code to search through if you want to find out how, for instance, data is retained. It may be hard to understand the more dynamic run-time object graph that is behind the DnsSrvResolver you’re using. If the whole picture is there statically in code, it’s easier to find it.
  2. Instead of having complexity within a class, you have (more) complex interactions between instances of different classes. The complexity doesn’t go away entirely, it is moved.

Things that should get you thinking about whether you should split out behaviours into different classes include having too many (> 3-4 or so) collaborators injected into the constructor, or using the word ‘and’ when describing the ‘single’ responsibility of a class. Describing a class as “doing SRV lookups and retaining data and collecting statistics” is a giveaway that you don’t actually think of it as doing one thing.

In my experience, the benefits of using this technique very often outweigh the costs. It is easier to make stable software from a number of small, simple, composable building blocks than to do it using larger, more complex building blocks. Dealing with complexity in how things are put together is easier than trying to compose complex things. Writing code like this doesn’t take longer than writing it in a single class – the only added difficulty is in learning to see when there might be a need to split things out in order to make them simpler.


Fluent APIs in Java

In our inventory management system at Shopzilla we’ve got a reporting tool that we’ve built using Wicket. A couple of months ago (this post has languished in draft status for a while), I was going to add a new page and took a look at a previous one to see what I could copy/paste. What I found was this wall of text for creating a table:

List<IColumn<AggregatedFeedRunSummary, String>> columns = Lists.newArrayList();
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Name"), "merchant.merchantName", "merchant.merchantName"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Merchant Id"), "merchant.merchantId", "merchant.merchantId"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Feed Id"), "feed.feedId", "feed.feedId"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Country"), "merchant.countryCode", "merchant.countryCode"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Account Manager"), "merchant.accountManagerName", "merchant.accountManagerName"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Tier"), "merchant.keyStatus", "merchant.keyStatus"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Revenue Status"), "merchant.revStatus", "merchant.revStatus"));
 columns.add(new IndexablePropertyColumn<AggregatedFeedRunSummary, String>(new Model("Indexable"), "feed.indexable"));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Proteus Last Run"), PROTEUS, "start_time")
    .isDate().isLinkTo(RunDetailsPage.class, FEEDRUN_PARAM));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Feed Changed"), PROTEUS, "feed_changed"));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("FQS Last Run"), FQS, "start_time")
    .isDate().isLinkTo(RunDetailsPage.class, FEEDRUN_PARAM));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Character encoding"), FQS, "encodingUsed"));
 columns.add(new CharacterEncodingSourceColumn(new Model("CE Source"), FQS));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Total Raw Offers"), FQS, "total"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Invalid"), FQS, "invalid"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Duplicate"), FQS, "duplicate"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Valid"), FQS, "valid"));

 columns.add(new AggregatedFeedRunPropertyColumn(new Model("DI Last Run"), DI, "start_time")
    .isDate().isLinkTo(RunDetailsPage.class, FEEDRUN_PARAM));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Total"), DI, "total"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Created"), DI, "created"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Updated"), DI, "updated"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Deleted"), DI, "deleted"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Unchanged"), DI, "unchanged"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Resent"), DI, "resent"));
 columns.add(new AggregatedFeedRunPropertyColumn(new Model("Failed"), DI, "Failed"));

 columns.add(new DatePropertyColumn<AggregatedFeedRunSummary, String>(new Model("Last DP Snapshot"), "lastSnapshotDate"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Viable"), "viable", "viable"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Non-viable"), "nonViable", "nonViable"));
 columns.add(new PropertyColumn<AggregatedFeedRunSummary, String>(new Model("Deleted"), "deleted", "deleted"));

As an aside, calling it a wall of text should in no way be interpreted as criticism of the people who wrote it – on the contrary, I think creating the wall of text was a great engineering decision. More on that later.

The code above is a clear example of what a lot of people really don’t like about Java (me included, though I do like Java as a language overall). The signal-to-noise ratio is horrible, mostly due to all the type parameters. There’s work being done on the language to reduce that, such as the diamond operator, but I think in this case the problem is not really the language. Saying Java sucks on the basis of the code above is a case of blaming the tools for a mistake that we made ourselves. Here are some things that are wrong with the code above:

  1. First, there’s a lot of stuff that’s not really relevant. We don’t care about the exact data types that are used as column entries in the Wicket view, but we do care about what the contents of the column are and how they are displayed to the user. So “new AggregatedFeedRunPropertyColumn(new Model(”, etc., is mostly noise except insofar as it tells us where to look for the information we’re after.
  2. A very similar point is that a lot of the information that is there is not very clear. The fact that the type of the column is a PropertyColumn<AggregatedFeedRunSummary, String> doesn’t convey much in the way of information unless you’re very familiar with Wicket and the specific data types that we’re using.
  3. There’s also missing information: as an example, the string “merchant.merchantName” appears twice, and it’s not clear why. There is in fact a reason: the first time, it indicates which property value to display in a particular table cell, and the second time, it indicates which property value to use when sorting on that column.

Using a fluent API, we can express the code above like so:

return FrvColumns.columnsFor(AggregatedFeedRunSummary.class, String.class)
 .add("Name").sourceProperty("merchant.merchantName").sortable("merchant.merchantName")
 .add("Merchant Id").sourceProperty("merchant.merchantId").sortable("merchant.merchantId")
 .add("Feed Id").sourceProperty("feed.feedId").sortable("feed.feedId")
      .link(FeedRunHistoryPage.class)
        .param("merchantId").sourceProperty("merchant.merchantId")
        .param("feedId").sourceProperty("feed.feedId")
 .add("Country").sourceProperty("merchant.countryCode").sortable("merchant.countryCode")
 .add("Account Manager").sourceProperty("merchant.accountManagerName").sortable("merchant.accountManagerName")
 .add("Tier").sourceProperty("merchant.keyStatus").sortable("merchant.keyStatus")
 .add("Revenue Status").sourceProperty("merchant.revStatus").sortable("merchant.revStatus")
 .add("Indexable").sourceProperty("feed.indexableAsString")

 .add("Proteus Last Run").sourceProperty("properties.proteus.start_time").epoch()
    .link(RunDetailsPage.class)
       .param(FEEDRUN_PARAM).sourceProperty("feedRunId.id")
       .param("component").value(PROTEUS.toString())
 .add("Feed Changed").sourceProperty("properties.proteus.feed_changed")

 .add("FQS Last Run").sourceProperty("properties.fqs.start_time").epoch()
    .link(RunDetailsPage.class)
       .param(FEEDRUN_PARAM).sourceProperty("feedRunId.id")
       .param("component").value(FQS.toString())
 .add("Character encoding").sourceProperty("encodingInfo.encoding")
 .add("CE Source").sourceProperty("encodingInfo.source")
 .add("Total Raw Offers").sourceProperty("properties.fqs.total")
 .add("Invalid").sourceProperty("properties.fqs.invalid")
 .add("Duplicate").sourceProperty("properties.fqs.duplicate")
 .add("Valid").sourceProperty("properties.fqs.valid")

 .add("DI Last Run").sourceProperty("properties.di.start_time").epoch()
    .link(RunDetailsPage.class)
       .param(FEEDRUN_PARAM).sourceProperty("feedRunId.id")
       .param("component").value(DI.toString())
 .add("Total").sourceProperty("properties.di.total")
 .add("Created").sourceProperty("properties.di.created")
 .add("Updated").sourceProperty("properties.di.updated")
 .add("Deleted").sourceProperty("properties.di.deleted")
 .add("Unchanged").sourceProperty("properties.di.unchanged")
 .add("Resent").sourceProperty("properties.di.resent")
 .add("Failed").sourceProperty("properties.di.failed")

 .add("Last DP Snapshot").sourceProperty("lastSnapshotDate").date()
 .add("Viable").sourceProperty("viable").sortable("viable")
 .add("Non-viable").sourceProperty("nonViable").sortable("nonViable")
 .add("Deleted").sourceProperty("deleted").sortable("deleted")
 .build();

The key improvements, in my opinion, are:

  • Less text. It’s not exactly super-concise, and there’s a lot of repetition of long words like ‘sourceProperty’, and so on. But there’s a lot less text, meaning a lot less noise. And pretty much everything that’s there carries relevant information.
  • More clarity in what the various strings represent. I think in both the original and updated versions, most people would guess that the first string is the title of the column. The fluent API version is much clearer about why some of the columns have the same property string repeated – it’s because it indicates that the list should be sortable by that column and which property of the list items should be used for sorting.
  • Easier modification – rather than having to figure out what class to use to display a correctly formatted Date or epoch-based timestamp, you just tack on the right transformation method (.date() or .epoch()). Since this is available at your fingertips via command-space, or whatever your favourite IDE uses for completion, it’s very easy to find out what your options are.
  • More transparency. The primary example is the AggregatedFeedRunPropertyColumn, which behind the scenes looks for property values in ‘properties.%s.%s’, and adds ‘component=%s’ as a parameter if the thing is a link. This behaviour is visible if you look in the class, but not if you look at the configuration of the columns. With the fluent API, both those behaviours are explicit.

So, in this case a fluent API both reduces noise and adds signal that was missing before. The cost, of course, is that it takes some effort to implement the fluent API: that’s why I think it was a good engineering decision not to do so immediately. The investment in the API will pay off only if there’s enough volume in terms of pages and column definitions that can use it. What’s more, it’s not just hard to know whether or not you should invest in a fluent API up front, it won’t be clear what the API should be until you have a certain volume of pages that use it.

Other big factors that determine whether an investment in a clearer API is going to pay off are the rate of change of the system in question, and whether it’s a core component that everyone working on the system will know well or a more peripheral one that many developers can be expected to touch with only a vague understanding of how it works. In this case, it’s the latter – reporting is the sort of thing that you need to change often to reflect changes in business logic. Also, since you need to report on all features in the system, it’s normally the case that people will need to be able to modify the report views without being experts in how the tool works. That means that ease of understanding is crucial, and in turn that it’s crucial that it be obvious how to make changes, so as to avoid having everyone come up with their own solution for the same type of problem.

The standard pattern I tend to use when building a fluent API is to have a Builder class, or in more complex cases like above, a set of them. So in the example above, this is how it starts:


public static <T, S> ColumnsBuilder<T, S> columnsFor(Class<T> rowClass, Class<S> sortClass) {
   return new ColumnsBuilder<T, S>();
}

public static class ColumnsBuilder<T, S> implements IClusterable {
   private final List<SingleColumnBuilder<T, S>> builders = Lists.newArrayList();

   public SingleColumnBuilder<T, S> addColumn(IColumn<T, S> column) {
      SingleColumnBuilder<T, S> columnBuilder = new SingleColumnBuilder<T, S>(this, column);
      builders.add(columnBuilder);
      return columnBuilder;
   }

   public SingleColumnBuilder<T, S> add(String label) {
      SingleColumnBuilder<T, S> columnBuilder = new SingleColumnBuilder<T, S>(this, label);

      builders.add(columnBuilder);

      return columnBuilder;
   }

   public List<IColumn<T, S>> build() {
      return ImmutableList.copyOf(
        Lists.transform(builders, new Function<SingleColumnBuilder<T, S>, IColumn<T, S>>() {
           @Override
           public IColumn<T, S> apply(SingleColumnBuilder<T, S> input) {
              return input.buildColumn();
           }
      }));
   }
}

In this case, the builder hierarchy matches the resulting data structure: the ColumnsBuilder knows how to add a new column to the table (TableBuilder might have been a better name), the SingleColumnBuilder knows how to build a single column(!), and then further down the hierarchy there are LinkBuilders, ParamBuilders, etc.
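
To give a feel for the next level down, here is a simplified sketch of what a SingleColumnBuilder in this scheme might look like. The real class also handles links, parameters and date/epoch formatting; this version only covers the label, source property and sortability used in the examples above.

public static class SingleColumnBuilder<T, S> implements IClusterable {
   private final ColumnsBuilder<T, S> parent;
   private final String label;
   private String sourceProperty;
   private S sortProperty;

   SingleColumnBuilder(ColumnsBuilder<T, S> parent, String label) {
      this.parent = parent;
      this.label = label;
   }

   public SingleColumnBuilder<T, S> sourceProperty(String property) {
      this.sourceProperty = property;
      return this;
   }

   public SingleColumnBuilder<T, S> sortable(S sortProperty) {
      this.sortProperty = sortProperty;
      return this;
   }

   // delegating back to the parent is what lets calls like .add("Name")...add("Merchant Id") chain
   public SingleColumnBuilder<T, S> add(String nextLabel) {
      return parent.add(nextLabel);
   }

   public List<IColumn<T, S>> build() {
      return parent.build();
   }

   IColumn<T, S> buildColumn() {
      return sortProperty != null
          ? new PropertyColumn<T, S>(new Model<String>(label), sortProperty, sourceProperty)
          : new PropertyColumn<T, S>(new Model<String>(label), sourceProperty);
   }
}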

I sometimes let a single Builder class implement multiple interfaces, if I want to enforce a particular ‘grammar’ to the API. So, for instance, if I want code to look like this:

 ActionChain actions = Actions.first().jump().onto(platform).then().climb().onto(couch).build();

I might go with something like this:


public interface NeedsVerb {
  public NeedsTarget jump();
  public NeedsTarget climb();
}

public interface NeedsTarget {
  public CanBeBuilt onto(Target target);
}

public interface CanBeBuilt {
  public NeedsVerb then();
  public ActionChain build();
}

public class Actions {
  public static NeedsVerb first() {
    return new Builder();
  }

  static class Builder implements NeedsVerb, NeedsTarget, CanBeBuilt {
    private final List<Action> actions = new LinkedList<Action>();
    private Verb verb = null;

    @Override
    public NeedsTarget jump() {
       verb = Verb.JUMP;
       return this;
    }

    // .. same for climb()

    @Override
    public CanBeBuilt onto(Target target) {
       actions.add(new Action(verb, target));
       return this;
    }

    @Override
    public NeedsVerb then() {
       verb = null; // whatever, this is just an example ;)
       return this;
    }

    @Override
    public ActionChain build() {
       return new ActionChain(actions);
    }
  }
}

Some points to note:

  • I usually don’t bother too much with things like immutability and thread-safety in the Builders. They’re “internal”, and generally only used in single-threaded, initialisation-type contexts. I would worry very much about those things in the Action and ActionChain classes.
  • I usually spend more time on consistency checks and error-reporting than in the example above. The point of building a fluent API is usually to make life easier for somebody who is not an expert (and shouldn’t have to be) in the internal workings of the exposed functionality. So making it very easy to set things up correctly, and to understand what is wrong when things haven’t been set up correctly, is key – there’s a small sketch of what that kind of check might look like just below.
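
As an example of what I mean by that, buildColumn() in the column builder sketched above is a natural place for a hypothetical fail-fast check whose message tells the user which column is misconfigured and how to fix it:

IColumn<T, S> buildColumn() {
   // hypothetical consistency check: fail fast, and say which column is wrong and what to do
   if (sourceProperty == null) {
      throw new IllegalStateException(
          String.format("Column '%s' has no source property - call sourceProperty(..) after add(\"%s\")",
                        label, label));
   }

   return sortProperty != null
       ? new PropertyColumn<T, S>(new Model<String>(label), sortProperty, sourceProperty)
       : new PropertyColumn<T, S>(new Model<String>(label), sourceProperty);
}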

In summary, fluent APIs are, in my opinion, often a great choice for improving code readability – and this is of course language independent, though it’s particularly noticeable in a verbose language like Java. The main drawback is the up-front implementation overhead, but since developers spend much more time trying to understand old code than writing new code, that implementation cost is often insignificant compared to the improvement in reading time. This is particularly true when a) the code calling the API changes very often, b) many developers need to use the API, and c) the code behind the API is outside of developers’ main focus. As always, one of the best sources of great code is Guava, so for some first-class fluent API implementations, take a look at for instance FluentIterable and CacheBuilder. I think fluent APIs are underused in ‘normal business logic’; they don’t need to be solely the domain of snazzy library developers.


Bygg – Executing the Build

This is the third post about Bygg – I’ve been fortunate enough to find some time to do some actual implementation, and I now have a version of the tool that can do the following:

First, read a configuration of plugins to use during the build:

public class PluginConfiguration {
  public static Plugins plugins() {
    return new Plugins() {
      public List<ArtifactVersion> plugins() {
        return Arrays.asList(
          new ArtifactVersion(ProjectArtifacts.BYGG_TEST_PLUGIN, "1.0-SNAPSHOT"),
          new ArtifactVersion(ProjectArtifacts.GUICE, "2.0"));
      }
    };
  }
}

Second, use that set of plugins to compile a project configuration:

public class Configuration {
  public static ByggConfiguration configuration() {
    return new ByggConfiguration() {
      public TargetDAG getTargetDAG() {
        return TargetDAG.DEFAULT
           .add("plugin")                  // defines the target name when executing
           .executor(new ByggTestPlugin()) // indicates what executing the target means
           .requires("test")               // means it won't be run until after "test"
           .build();
      }
    };
  }
}

Third, actually execute that build – although in the current version, none of the target executors have an actual implementation, so all they do is create a file with their name and the current time stamp under the target directory. The default build graph that is implemented contains targets that pretend to assemble the classpaths (placeholder for downloading any necessary dependencies) for the main code and the test code, targets that compile the main and test code, a target that runs the tests, and a target that pretends to create a package. As the sample code above hints, I’ve got three projects: Bygg itself, a dummy plugin, and a test project whose build requires the test plugin to be compiled and executed.

Fourth – with no example – cleaning up the target directory. This is the only feature that is fully implemented, being of course a trivial one. On my machine, running a clean in a tiny test project is 4-5 times faster using Bygg than Maven (taking 0.4 to 0.5 seconds of user time as compared to more than 2 for Maven), so thus far, I’m on target with regard to performance improvements. A little side note on cleaning is that I’ve come to the conclusion that clean isn’t a target. You’re never ever going to want to deploy a ‘clean’, or use it for anything. It’s an optional step that might be run before any actual target. To clarify that distinction, you specify targets using their names as command line arguments, but cleaning using -c or --clean:


bygg.sh -c compile plugin

As is immediately obvious, there’s a lot of rough edges here. The ones I know I will want to fix are:

  • Using annotations (instead of naming conventions) and generics (for type safety) in the configuration classes – I’m compiling and loading the configuration files using a library called Janino, which has an API that I think is great, but which by default only supports Java 4. There’s a way around it, but it seems a little awkward, so I’m planning on stealing the API design and putting in a front for JavaCompiler instead.
  • Updating the returned classes (Plugins and ByggConfiguration), as today they only contain a single element. Either they should be removed, or perhaps they will need to become a little more powerful.
  • Changing the names of stuff – TargetDAG especially is not great.
  • There’s a lot of noise, much of which is due to Java as a language, but some of which can probably be removed. The Plugins example above is 11 lines long, but only 2 lines contain useful information – and that’s not counting import statements, etc. Of course, since the number of ‘noise lines’ is pretty much constant, with realistically large builds, the signal to noise ratio will improve. Even so, I’d like it to be better.
  • I’m pretty sure it’s a good idea to move away from saying “this target requires target X” to define the order of execution towards something more like “this target requires the compiled main sources”. But there may well be reasons why you would want to keep something like “requires” or “before” in there – for instance, you might want to generate a properties file with information collected from version control and CI system before packaging your artifact. Rather than changing the predefined ‘package’ target, you might want to just say ‘run this before package’ and leave the file sitting in the right place in the target directory. I’m not quite sure how best to deal with that case yet – there’s a bit more discussion of this a little later.

Anyway, all that should be done in the light of some better understanding of what is actually needed to build something. So before I continue with improving the API, I want to take a few more steps on the path of execution.

As I’ve mentioned in a previous post, a build configuration in Bygg is a DAG (Directed Acyclic Graph). A nice thing about that is that it opens up the possibility of executing independent paths on the DAG concurrently. Tim pointed out to me that that kind of concurrent execution is an established idea called Dataflow Concurrency. In Java terms, Dataflow Concurrency essentially boils down to communicating all shared mutable state via Futures (returned by Callables executing the various tasks). What’s interesting about the Bygg version of Dataflow Concurrency is that the ‘Dataflow Variables’ can and will be parameters of the Callables executing tasks, rather than being hard-coded as is typical in the Dataflow Concurrency examples I’ve seen. So the graph will exist as a data entity as opposed to being hardwired in the code. This means that deadlock detection is as simple as detecting cycles in the graph – and since there is a requirement that the build configuration must be a DAG, builds will be deadlock free. In general, I think the ability to easily visualise the exact shape of the DAG of a build is a very desirable thing in terms of making builds easily understandable, so that should probably be a priority when continuing to improve the build configuration API.
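
To make that a little more concrete, here is a rough sketch, not actual Bygg code, of how each target could run as a Callable that waits on the Futures of its predecessors; Target, TargetResult, topologicallySorted and the accessor names are all invented for the example:

// Rough sketch: each target becomes a Callable that blocks on the Futures of the targets it
// depends on, so independent paths through the DAG can run in parallel on an ExecutorService.
ExecutorService executor = Executors.newFixedThreadPool(4);
Map<String, Future<TargetResult>> results = new HashMap<String, Future<TargetResult>>();

for (final Target target : topologicallySorted(dag)) {
  final List<Future<TargetResult>> inputs = new ArrayList<Future<TargetResult>>();
  for (String dependency : target.requires()) {
    inputs.add(results.get(dependency));  // guaranteed to exist thanks to the topological order
  }

  results.put(target.name(), executor.submit(new Callable<TargetResult>() {
    @Override
    public TargetResult call() throws Exception {
      List<TargetResult> dependencyResults = new ArrayList<TargetResult>();
      for (Future<TargetResult> input : inputs) {
        dependencyResults.add(input.get());  // blocks until the predecessor is done
      }
      return target.execute(dependencyResults);
    }
  }));
}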

Another idea I had from the reference to dataflow programming is that the canonical example of dataflow programming is a spreadsheet, where an update in one cell trickles through into updates of other cells that contain formulas that refer to the first one. That example made me change my mind about how different targets should communicate their results to each other. Initially, I had been thinking that most of the data that needs to be passed on from one target to the next should be implicitly located in a well-defined location on disk. So the test compiler would leave the test classes in a known place where the test runner knows to look for them. But that means loading source files into memory to compile them, then writing the class files to disk, then loading them into memory again. That’s a lot of I/O, and I have the feeling that I/O is often one of the things that slows builds down the most. What if there were a dataflow variable holding the compiled classes instead?

I haven’t yet looked in detail at the JavaFileManager interface, but it seems to me that it would make it possible to add an in-memory layer in front of the file system (in fact, I think that kind of optimisation is a large part of the reason why it exists). So it could be a nice optimisation to make the compiler store class files in memory for test runners, packagers, etc., to pick up without having to do I/O. There would probably have to be a target (or something else, maybe) that writes the class files to disk in parallel with the rest of the execution, since the class files are nice to have as an optimisation for the next build – only recompiling what is necessary. But that write doesn’t necessarily have to slow down the test execution. All that is of course optimisation, so the first version will just use a plain file system-based JavaFileManager implementation. Still, I think it is a good idea to only have a very limited number of targets that directly access the file system, in order to open up for that kind of optimisation. The remainder of the targets should not be aware of the exact structure of the target directory, or of what data is stored there.
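
As a first, untested sketch of that in-memory layer: a ForwardingJavaFileManager from the standard javax.tools API that captures compiler output in byte arrays instead of writing class files to disk might look roughly like this:

// Untested sketch: compiled classes end up in the map instead of in the target directory.
class InMemoryJavaFileManager extends ForwardingJavaFileManager<StandardJavaFileManager> {
  final Map<String, ByteArrayOutputStream> compiledClasses =
      new HashMap<String, ByteArrayOutputStream>();

  InMemoryJavaFileManager(StandardJavaFileManager delegate) {
    super(delegate);
  }

  @Override
  public JavaFileObject getJavaFileForOutput(Location location, String className,
                                             JavaFileObject.Kind kind, FileObject sibling)
      throws IOException {
    final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    compiledClasses.put(className, bytes);

    return new SimpleJavaFileObject(
        URI.create("mem:///" + className.replace('.', '/') + kind.extension), kind) {
      @Override
      public OutputStream openOutputStream() {
        return bytes;  // the compiler writes the class bytes straight into memory
      }
    };
  }
}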

I’m hoping to soon be able to find some more time to try these ideas out in code. It’ll be interesting to see how hard it is to figure out a good way to combine abstracting away the ‘target’ directory with full freedom for plugins to add stuff to the build package and dataflow concurrency variables.


Bygg – Better Dependency Management

I’ve said before that the one thing that Maven does amazingly well is dependency management. This post is about how to do it better.

Declaring and Defining Dependencies

In Maven, you can declare dependencies using a <dependencyManagement /> section in either the current POM or some parent POM, and then use them – include them into the current build – using a <dependencies /> section. This is very useful because it allows you to define in a single place which version of some dependency should be used for a set of related modules. It looks something like:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.shopzilla.site2.service</groupId>
      <artifactId>common</artifactId>
      <version>${service.common.version}</version>
    </dependency>
    <dependency>
      <groupId>com.shopzilla.site2.service</groupId>
      <artifactId>common-web</artifactId>
      <version>${service.common.version}</version>
    </dependency>
    <dependency>
      <groupId>com.shopzilla.site2.core</groupId>
      <artifactId>core</artifactId>
      <version>${core.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.shopzilla.site2.service</groupId>
    <artifactId>common</artifactId>
  </dependency>
  <dependency>
    <groupId>com.shopzilla.site2.core</groupId>
    <artifactId>core</artifactId>
  </dependency>
</dependencies>

To me, there are a couple of problems here:

  1. Declaring a dependency looks pretty much identical to using one – the only difference is that the declaration is enclosed inside a <dependencyManagement /> section. This makes it hard to know what it is you’re looking at if you have a large number of dependencies – is this declaring a dependency or actually using it?
  2. It’s perfectly legal to add a <version/> tag in the plain <dependencies /> section, which will happen unless all the developers touching the POM a) understand the distinction between the two <dependencies /> sections, and b) are disciplined enough to maintain it.

For large POMs and POM hierarchies in particular, the way to define shared dependencies becomes not only overly verbose but also hard to keep track of. I think it could be made much easier and nicer in Bygg, something along these lines:


// Core API defined in Bygg proper
public interface Artifact {
   String getGroupId();
   String getArtifactId();
}

// ----------------------------------------------

// This enum is in some shared code somewhere - note that the pattern of declaring artifacts
// using an enum could be used within a module as well if desired. You could use different enums
// to define, for instance, sets of artifacts to use when talking to databases, or when developing
// a web user interface, or whatever.
public enum MyArtifacts implements Artifact {
    SERVICE_COMMON("com.shopzilla.site2.service", "common"),
    CORE("com.shopzilla.site2.core", "core"),
    JUNIT("junit", "junit");

    // (constructor storing the two strings and the getGroupId()/getArtifactId()
    // implementations omitted for brevity)
}

// ----------------------------------------------

// This can be declared in some shared code as well, probably but not
// necessarily in the same place as the enum. Note that the type of
// the collection is Collection, indicating that it isn't ordered.
public static final Collection<ArtifactVersion> ARTIFACT_VERSIONS = ImmutableList.of(
        new ArtifactVersion(SERVICE_COMMON, properties.get("service.common.version")),
        new ArtifactVersion(CORE, properties.get("core.version")),
        new ArtifactVersion(JUNIT, "4.8.1"));

// ----------------------------------------------

// In the module to be built, define how the declared dependencies are used.
// Here, ordering might be significant (it indicates the order of artifacts on the
// classpath - the reason for the 'might' is I'm not sure if this ordering can carry
// over into a packaged artifact like a WAR).
List<Dependency> moduleDependencies = ImmutableList.of(
    new Dependency(SERVICE_COMMON, EnumSet.of(MAIN, PACKAGE)),
    new Dependency(CORE, EnumSet.of(MAIN, PACKAGE)),
    new Dependency(JUNIT, EnumSet.of(TEST)));

The combination of a Collection of ArtifactVersions and a List of Dependency:s is then used by the classpath assembler target to produce an actual classpath for use in compiling, running, etc. Although the example code shows the Dependency:s as a plain List, I kind of think there may be value in keeping the actual dependencies exactly that simple. Wrapping the list in an intelligent object that gives you filtering options, etc., could possibly be useful, but it’s a bit premature to decide about that until there’s code that makes it concrete.

The main ideas in the example above are:

  1. Using enums for the artifact identifiers (groupId + artifactId) gives a more succinct and harder-to-misspell way to refer to artifacts in the rest of the build configuration. Since you’re editing this code in the IDE, finding out exactly what the artifact identifier means (groupId + artifactId) is as easy as clicking on it while pressing the right key.
  2. If the build configuration is done using regular Java code, shared configuration items can trivially be made available as Maven artifacts. That makes it easy to for instance have different predefined groups of related artifacts, and opens up for composed rather than inherited shared configurations. Very nice!
  3. In the last bit, where the artifacts are actually used in the build, there is an outline of something I think might be useful. I’ve used Maven heavily for years, and something I’ve never quite learned is how the scopes work. A scope is basically one of a list of strings (compile, test, provided, runtime) that describe where a certain artifact should be used. So ‘compile’ means that the artifact in question will be used when compiling the main source code, when running tests, when executing the artifact being built and, if it is a WAR, when creating the final package. I think it would be far simpler to have a set of flags indicating which classpaths the artifact should be included in. So MAIN means ‘when compiling and running the main source’, TEST is ditto for the test source, PACKAGE is ‘include in the final package’, and so on. No need to memorise what some scope means, you can just look at the set of flags – there’s a sketch of how a classpath assembler could use them right after this list.
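
Here is a hypothetical sketch of how a classpath-assembling step could use those flags; ClasspathFlag and the accessor names are inventions for the example:

// Hypothetical: pick out the artifacts that belong on a particular classpath.
List<Artifact> artifactsFor(ClasspathFlag flag, List<Dependency> moduleDependencies) {
  List<Artifact> result = new ArrayList<Artifact>();
  for (Dependency dependency : moduleDependencies) {
    if (dependency.flags().contains(flag)) {   // MAIN, TEST, PACKAGE, ...
      result.add(dependency.artifact());
    }
  }
  return result;
}

// so the test-compilation step would ask for artifactsFor(TEST, moduleDependencies),
// and the packaging step for artifactsFor(PACKAGE, moduleDependencies)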

Another idea that I think would be useful is adding an optional Repository setting for an Artifact. With Maven, you can add repositories in addition to the default one (Maven Central at Ibiblio). You can have as many as you like, which is great for some artifacts that aren’t included in Maven Central. However, adding repositories means slowing down your build by a wide margin, as Maven will check each repository defined in the build for updates to each snapshot version of an artifact. Whenever I add a repository, I do that to get access to a specific artifact. Utilising that fact by having two kinds of repositories – global and artifact-specific, maybe – should be simple and represent a performance improvement.
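
In configuration terms, that could be as simple as something like this hypothetical API (the artifact and URL are made up):

// Hypothetical: this artifact is only ever looked for in the one repository that hosts it,
// so snapshot checks against all the other repositories can be skipped.
new ArtifactVersion(SOME_INTERNAL_ARTIFACT, "2.1-SNAPSHOT")
    .repository("https://repo.example.com/internal-snapshots");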

Better Transitive Conflict Resolution

Maven allows you to not have to worry about transitive dependencies required by libraries you include. This is, as I’ve argued before, an incredibly powerful feature. But the thing is, sometimes you do need to worry about those transitive dependencies: when they introduce binary incompatibilities. A typical example is something I ran into the other day, where a top-level project used version 3.0.3 of some Spring jars (such as ‘spring-web’), while some shared libraries eventually included another version of Spring (the ‘spring’ artifact, version 2.5.6). Both of these jars contain a definition of org.springframework.web.context.ConfigurableWebApplicationContext, and they are incompatible. This leads to runtime problems (in other words, the problem is visible too late; it should be detected at build time), and the only way to figure that out is to recognise the symptoms of the problem as a “likely binary version mismatch”, then use mvn dependency:analyze to figure out possible candidates and add exclude rules like this to your POM:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.shopzilla.site2.service</groupId>
      <artifactId>common</artifactId>
      <version>${service.common.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.springframework</groupId>
          <artifactId>spring</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.google.collections</groupId>
          <artifactId>google-collections</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>com.shopzilla.site2.core</groupId>
      <artifactId>core</artifactId>
      <version>${core.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.springframework</groupId>
          <artifactId>spring</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.google.collections</groupId>
          <artifactId>google-collections</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</dependencyManagement>

As you can tell from the example (just a small part of the POM) that I pasted in, I had a similar problem with google-collections. The top level project uses Guava, so binary incompatible versions included by the dependencies needed to be excluded. The problems here are:

  1. It’s painful to figure out what libraries cause the conflicts – sometimes, you know or can easily guess (like the fact that different versions of Spring packages can clash), but other times you need to know something a little less obvious (like the fact that Guava has superseded google-collections, something not immediately clear from the names). The tool could just tell you that you have binary incompatibilities on your classpath (I actually submitted a patch to the Maven dependency plugin to fix that, but it’s been stalled for 6 months).
  2. Once you’ve figured out what causes the problem, it’s a pain to get rid of all the places it comes from. The main tool at hand is the dependency plugin, and the way to figure out where dependencies come from is mvn dependency:tree. This lets you know a single source of a particular dependency. So for me, I wanted to find out where the spring jar came from – that meant running mvn dependency:tree, adding an exclude, running it again to find where else the spring jar was included, adding another exclude, and so on. This could be so much easier. And since it could be easier, it should be.
  3. What’s more, the problems are sometimes environment-dependent, so you’re not guaranteed that they will show up on your development machine. I’m not sure about the exact reasons, but I believe that there are differences in the order in which different class loaders load classes in a WAR. This might mean that the only place you can test if a particular problem is solved or not is your CI server, or some other environment, which again adds pain to the process.
  4. The configuration is rather verbose and you need to introduce duplicates, which makes your build files harder to understand at a glance.

Apart from considering binary incompatibilities to be errors (and reporting on exactly where they are found), here’s how I think exclusions should work in Bygg:


 dependencies.exclude().group("com.google.collections").artifact("google-collections")
          .exclude().group("org.springframework").artifact("spring.*").version(except("3.0.3"));

Key points above are:

  1. Making excludes a global thing, not a per-dependency thing. As soon as I’ve identified that I don’t want spring.jar version 2.5.6 in my project, I know I don’t want it from anywhere at all. I don’t care where it comes from, I just don’t want it there! I suppose there is a case for saying “I trust the core library to include google-collections for me, but not the common-web one”, so maybe global excludes aren’t enough. But they would certainly have helped me tremendously in most of the cases where I’ve had to use Maven exclusions, and I can’t think of a case where I’ve actually wanted an artifact-specific exclusion.
  2. Defining exclusion filters using a fluent API that includes regular expressions. With Spring in particular, you want to make sure that all your jars have the same version, and it would be great to be able to say that you don’t want anything other than that. There’s a sketch of how such filters might be applied just below.
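
Here is a sketch of how such global filters might then be applied when the classpath is assembled; ResolvedArtifact and ExclusionFilter are invented names, and the filters would be the ones built by the fluent API above:

// Hypothetical: every artifact in the resolved dependency graph is checked against the
// global exclusion filters before it is allowed onto any classpath.
boolean excluded(ResolvedArtifact artifact, List<ExclusionFilter> filters) {
  for (ExclusionFilter filter : filters) {
    if (filter.matches(artifact.groupId(), artifact.artifactId(), artifact.version())) {
      return true;
    }
  }
  return false;
}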

Build Java with Java?!

I’ve gone through different phases when thinking about using Java to configure builds rather than XML. First, I thought “it’s great, because it allows you to more easily use the standard debugger for the builds and thereby resolve Maven’s huge documentation problem”. But then I realised that the debugging is enabled by ensuring that the IDE has access to the source code of the build tool and plugins that execute the build, and that how you configure it is irrelevant. So then I thought that using Java to configure is pretty good anyway, because it means developers won’t need to learn a new language (as with Buildr or Raven), and that IDE integration is a lot easier. The IDE you use for your regular Java programming wouldn’t need to be taught anything specific to deal with some more Java code. I’ve now come to the conclusion that DSL-style configuration APIs, and even more, using the standard engineering principles for sharing and reusing code for build configurations is another powerful argument in favour of using Java in the build configuration. So I’ve gone from “Java configuration is key”, to “Java configuration is OK, but not important” to “Java configuration is powerful”.


Bygg – Ideas for a Better Maven

In a previous post, I outlined some objectives for a better Maven. In this post, I want to talk about some ideas for how to achieve those objectives.

Build Java with Java

The first idea is that the default modus operandi should be that builds are runnable as a regular Java application from inside the IDE, just like the application that is being built. The IDE integration of the build tool should ensure that source code for the build tool and its core plugins is attached so that the developer can navigate through the build tool source. A not quite necessary part of this is that build configuration should be done using Java rather than XML, Ruby, or something else. This gives the following benefits:

  • Troubleshooting the build is as easy as just starting it in the debugger and setting break points. To understand how to use a plugin, you can navigate to the source implementing it from your build configuration. I think this is potentially incredibly valuable. If it is possible to integrate with IDEs to such an extent that the full source code of the libraries and plugins that are used in the build is available, that means that stepping through and debugging builds is as easy as for any library used in the product being built. And harnessing the highly trained skills that most Java developers have developed for understanding how a third-party library works is something that should make the build system much more accessible compared to having to rely on documentation.
  • It can safely be assumed that Java developers are or at least want to be very proficient in Java. Using a real programming language for build configurations is very advantageous compared to using XML, especially for those occasions when you have to customise your build a little extra. (This should be the exception rather than the rule as it increases the risk of feature creep and reduces standardisation.)
  • The IDEs that Java developers use can be expected to be great tools for writing Java code. Code completion, Javadoc popups, instant navigation to the source that implements a feature, etc. This should reduce the complexity of IDE integration.
  • It opens up for very readable DSL-style APIs, which should reduce the build script complexity and increase succinctness. Also, using Java means you could instantly see which configuration options are available for the thing you’re tweaking at the moment (through method completion, checking values of enums, etc., etc.).

There are some drawbacks, such as having to figure out a way to bootstrap the build (the build configuration needs to be compiled before a build can be run), and the fact that you have to provide a command-line tool anyway for continuous integration, etc. But I don’t think those problems are hard enough to be significant.

My first thought was that using Java for the build configuration files themselves was the important idea. But as I’ve been thinking about it, I’ve concluded that the exact format of the configuration files is less important than making sure that developers can navigate through the build source code in exactly the same way as through any third-party library. That is something I’ve wanted to do many times when having problems with Maven builds, but it’s practically impossible today. There’s no reason why it should be harder to troubleshoot your build than your application.

Interlude: Aspects of Build Configurations

Identifying and separating things that need to be distinct and treated differently is one of the hard things to do when designing any program, and one of the things that has truly great effect if you get it right. So far, I’ve come up with a few different aspects of a build, namely:

  • The project or artifact properties – this includes things such as the artifact id, group id, version etc., and can also include useful information such as a project description, SCM links, etc.
  • The project’s dependencies – the third-party libraries that are needed in the build; either at compile time, test time or package time.
  • The build properties – any variables that you have in your build. A lot of times, you want to be able to have environment-specific variables that you use, or to refer to build-specific variables such as the output directory for compiled classes. The distinction between variables that may be overridden on a per-installation basis from variables that are determined during the build may mean that there are more than one kind of properties.
  • The steps that are taken to complete the build – compilations, copying, zipping, running tests, and so on.
  • The things that execute the various steps – the compiler, the test executor, a plugin that helps generate code from some type of resources, etc.

The point of separating these aspects of the build configuration is that it is likely that you’ll want to treat them differently. In Maven, almost everything is done in a single POM file, which grows large and hard to get an overview of, and/or in a settings.xml file that adds obscurity and reduces build portability. An example of a problem with bunching all the build configuration data into a single file is IntelliJ’s excellent Maven plugin, which wants to reimport the pom.xml with every change you make. Reimporting takes a lot of time on my chronically overloaded Macbook (well, 5-15 seconds, I guess – far too much time). It’s necessary because if I’ve changed the dependencies, IntelliJ needs to update its classpath.  The thing is, I think more than 50% or even 70% of the changes I make to pom files don’t affect the dependencies. If the dependencies section were separate from the rest of the build configuration, reimporting could be done only when actually needed.

I don’t think this analysis has landed yet, it feels like some pieces or nuances are still missing. But it helps as background for the rest of the ideas outlined in this post. The steps taken during the build, and the order in which they should be taken, are separate from the things that do something during each step.

Non-linear Build Execution

The second idea is an old one: abandoning Maven’s linear build lifecycle and instead getting back to the way that make does things (which is also Ant’s way). So rather than having loads of predefined steps in the build, it’s much better to be able to specify a DAG of dependencies between steps that defines the order of execution. This is better for at least two reasons: first, it’s much easier to understand the order of execution if you say “do A before B” or “do B after A” in your build configuration than if you say “do A during the process-classes phase and B during the generate-test-sources phase”. And second, it opens up the door to do independent tasks in parallel, which in turn creates opportunities for performance improvements. So for instance, it could be possible to download dependencies for the test code in parallel with the compilation of the production code, and it should be possible to zip up the source code JAR file at the same time as JavaDocs are generated.

What this means in concrete terms is that you would write something like this in your build configuration file:


  buildSteps.add(step("copyPropertiesTemplate")
         .executor(new CopyFile("src/main/template/properties.template",
                                "${OUTPUT_DIR}/properties.txt"))
         .before("package"));

Selecting what to build would be done via the build step names – so if all you wanted to do was to copy the properties template file, you would pass “copyPropertiesTemplate” to the build. The tool would look through the build configuration and in this case probably realise that nothing needs to be run before that step, so the “copyPropertiesTemplate” step would be all that was executed. If, on the other hand, the user stated that the “package” step should be executed, the build tool would discover that lots of things have to be done before – not only “copyPropertiesTemplate” but also “performCoverageChecks”, which in turn requires “executeTests”, and so on.

As the example shows, I would like to add a feature to the make/Ant version: specifying that a certain build step should happen before another one. The reason is that I think that the build tool should come with a default set of build steps that allow the most common build tasks to be run with zero configuration (see below for more on that). So you should always be able to say “compile” or “test”, and that should just work as long as you stick to the conventions for where you store your source code, your tests, etc. This makes it awkward for a user to define a build step like the one above in isolation, and then after that have to modify the pre-existing “package” step to depend on the “copyPropertiesTemplate” one.

In design terms, there would have to be some sort of BuildStep entity that has a unique name (for the user interface), a set of predecessors and successors, and something that should be executed. There will also have to be a scheduler that can figure out a good order to execute steps in. I’ve made some skeleton implementations of this, and it feels like a good solution that is reasonably easy to get right. One thing I’ve not made up my mind about is the name of this entity – the two main candidates are BuildStep and Target. Build step explains well what it is from the perspective of the build tool, while Target reminds you of Ant and Make and focuses the attention on the fact that it is something a user can request the tool to do. I’ll use build step for the remainder of this post, but I’m kind of thinking that target might be a better name.
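To make that a bit more concrete, here is a rough sketch of what such an entity and scheduler could look like – the names (BuildStep, StepExecutor, BuildScheduler) and the details are mine, not a finished design, and there is no error handling or cycle detection:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

interface StepExecutor {
  void execute();
}

final class BuildStep {
  final String name;
  final Set<String> predecessors; // names of steps that must have run before this one
  final StepExecutor executor;

  BuildStep(String name, Set<String> predecessors, StepExecutor executor) {
    this.name = name;
    this.predecessors = predecessors;
    this.executor = executor;
  }
}

class BuildScheduler {
  // Returns the requested step and its transitive predecessors in a valid execution
  // order (a depth-first topological sort of the DAG).
  List<BuildStep> executionOrderFor(String requested, Map<String, BuildStep> steps) {
    List<BuildStep> order = new ArrayList<BuildStep>();
    visit(requested, steps, new HashSet<String>(), order);
    return order;
  }

  private void visit(String name, Map<String, BuildStep> steps,
                     Set<String> visited, List<BuildStep> order) {
    if (!visited.add(name)) {
      return; // already scheduled
    }
    for (String predecessor : steps.get(name).predecessors) {
      visit(predecessor, steps, visited, order);
    }
    order.add(steps.get(name));
  }
}

Executing a build would then just be a matter of walking that list in order – or, later on, of running independent branches of the DAG in parallel.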

Gradual Buildup of DI Scope

Build steps will need to communicate results to one another. Some of this will of necessity be done via the file system – for instance, the compilation step will leave class files in a well-defined place for the test execution and packaging steps to pick up. Other results should be communicated in-memory, such as the current set of build properties and values, or the exact classpath to be used during compilation. The latter should be the output of some assembleClassPath step that checks or downloads dependencies and provides an exact list of JAR files and directories to be used by the compiler. You don’t want to store that sort of thing on the file system.

In-memory parameters should be injected into the executors of subsequent build steps that need them. This means that the build step executors will be gradually adding stuff to the set of objects that can be injected into later executors. A concrete implementation I have been experimenting with uses hierarchical Guice injectors to track this. That means that each step of the build returns a (possibly empty) Guice module, which is then used to create an injector that inherits all the previous bindings from preceding steps. I think that works reasonably well in a linear context, but merging injectors in a more complex build scenario is harder. A possibly better solution is to use the full set of modules used and created by previous steps to create a completely new injector at the start of each build step. Merging is then simply taking the union of the sets of modules used by the two merging paths through the DAG.
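As a minimal sketch of the linear variant of this idea – the BuildStepExecutor, step classes and ClassPath type below are made up for illustration, but createChildInjector is real Guice API:

import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Inject;
import com.google.inject.Injector;
import com.google.inject.Module;

interface BuildStepExecutor {
  // each step returns a module with bindings for the results it wants to expose
  Module execute();
}

class ClassPath {}

class AssembleClassPathStep implements BuildStepExecutor {
  public Module execute() {
    final ClassPath classPath = new ClassPath(); // check/download dependencies here

    return new AbstractModule() {
      protected void configure() {
        bind(ClassPath.class).toInstance(classPath);
      }
    };
  }
}

class CompileStep implements BuildStepExecutor {
  private final ClassPath classPath;

  @Inject
  CompileStep(ClassPath classPath) { // injected from the classpath step's bindings
    this.classPath = classPath;
  }

  public Module execute() {
    // invoke the compiler using classPath; nothing new to expose
    return new AbstractModule() {
      protected void configure() {}
    };
  }
}

class LinearBuildRunner {
  void run(Iterable<Class<? extends BuildStepExecutor>> stepClasses) {
    Injector injector = Guice.createInjector();

    for (Class<? extends BuildStepExecutor> stepClass : stepClasses) {
      // the executor gets results from earlier steps injected into it...
      BuildStepExecutor executor = injector.getInstance(stepClass);
      // ...and its own results become injectable for all later steps
      injector = injector.createChildInjector(executor.execute());
    }
  }
}

Running AssembleClassPathStep before CompileStep then lets the compiler step have the ClassPath injected without the two steps knowing anything about each other.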

This idea bears some resemblance to the concept of dynamic dependency injection, but it is different in that there are parallel contexts (one for each concurrently executing path) that are mutating as each build step is executed.

I much prefer using DI to inject dependencies into plugins or build steps over exposing the entire project configuration + state to plugins, for all the usual reasons. It’s a bit hard to get right from a framework perspective, but I think it should help simplify the plugins and keep them isolated from one another.

Core Build Components Provided

One thing that Maven does really well is to support simple builds. The minimal pom is really very tiny. I think this is great both in terms of usability and as an indication of powerful build standardisation/strong conventions. In the design outlined in this post, the things that would have to come pre-configured with useful defaults are a set of build steps with correct dependencies and associated executors. So there would be a step that assembles the classpath for the main compilation, another one that assembles the classpath for the test compilation, yet another one that assembles the classpath for the test execution and probably even one that assembles the classpath to be used in the final package. These steps would probably all make use of a shared entity that knows how to assemble classpaths, and that is configured to know about a set of repositories from which it can download dependencies.

By default, the classpath assembler would know about one or a few core repositories (ibiblio.org for sure). Most commercial users will hopefully have their own internal Maven repositories, so it needs to be possible to tell the classpath assembler about these. Similarly, the compiler should have some useful defaults for source version, file encodings, etc., but they should be possible to override in a single place and then apply to all steps that use them.

Of course, the executors (classpath assembler, compiler, etc.) would be shared by build steps by default, but they shouldn’t necessarily be singletons – if one wanted to compile the test code using a different set of compiler flags, one could configure the build to have two compiler instances with different parameters.
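In the imaginary configuration style from the earlier example, that could look something like this – the JavaCompiler executor and its methods are made up purely for illustration:

  // two compiler executors with different parameters, shared by the steps that need them
  JavaCompiler mainCompiler = new JavaCompiler().sourceVersion("1.6");
  JavaCompiler testCompiler = new JavaCompiler().sourceVersion("1.6").flag("-Xlint:all");

  buildSteps.add(step("compile").executor(mainCompiler));
  buildSteps.add(step("compileTests").executor(testCompiler).after("compile"));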

The core set of build steps and executors should at a minimum allow you to build, test and deploy (in the Maven sense) a JAR library. Probably, there should be more stuff that is considered to be core than just that.

Naming the Tool

The final idea is a name – Bygg. Bygg is a Swedish word that means “build” as in “build X!”, not as in “a build” or “to build” (a verb in imperative form, in other words). It’s probably one letter too long, but at least it’s not too hard to type the second ‘g’ when you’ve already typed the first. It’s got the right number of syllables and it means the right thing. It’s even relatively easy to pronounce if you know English (make it sound like “big” and you’ll be fine), although of course you have to have a Scandinavian background to really pronounce the “y” right.

That’s more than enough word count for this post. I have some more ideas about dependency management, APIs and flows in the tool, but I’ll have to save them for another time. Feel free to comment on these ideas, particularly about areas where I am missing the mark!


Composition vs Inheritance

One of the questions I often ask when interviewing candidate programmers is “what is the difference between inheritance and composition”. Some people don’t know, and they’re obviously out right then and there. Most people explain that “inheritance represents an is-a relationship and composition a has-a”, which is correct but insufficient. Very few people have a good answer to the question that I’m really asking: what impact does the choice of one or the other have on your software design? Over the last few years there seems to have been a trend towards feeling that inheritance is over-used and composition correspondingly under-used. I agree, and will try to show some concrete reasons why that is the case. (As a side note, the main trigger for this blog post is that about a year ago, I introduced a class hierarchy as part of a refactoring story, and it was immediately obvious that it was an awkward solution. However, it (and the other changes) represented a big improvement over what went before, and there wasn’t time to fix it back then. It’s bugged me ever since, and thanks to our gradual refactoring, we’re getting closer to the time when we can put something better in place, so I started thinking about it again.)

One guy I interviewed the other day made a pretty good attempt at describing when one should choose one or the other: it depends on which is closest to the real-world situation that you’re trying to model. That has the ring of truth to it, but I’m not sure it is enough. As an example, take the classically hierarchical model of the animal kingdom: Humans and Chimpanzees are both Primates, and, skipping a bunch of steps, they are Animals just like Kangaroos and Tigers. This is clearly a set of is-a relationships, so we could model it like so:


public class Animal {}

public class Primate extends Animal {}

public class Human extends Primate {}

public class Chimpanzee extends Primate {}

public class Kangaroo extends Animal {}

public class Tiger extends Animal {}

Now, let’s add some information about their various body types and how they walk. Both Humans and Chimpanzees have two arms and two legs, and Tigers have four legs. So we could add two legs and two arms at the Primate level, and four legs at the Tiger level. But then it turns out that Kangaroos also have two legs and two arms, so this solution doesn’t work without duplication. Perhaps a more general concept involving limbs is needed at the animal level? There would still be duplication of the ‘legs + arms’ classification in both Kangaroo and Primate, not to mention what would happen if we introduce a RattleSnake into our Eden (or an EarthWorm if you prefer an animal that has never had limbs at any stage of its evolutionary history).

A similar problem is shown by introducing a walk() method. Humans do a bipedal walk, Chimpanzees and Tigers walk on all fours, and Kangaroos hop using their rear feet and tail. Where do we locate the logic for movement? All the animals in our sample hierarchy can walk on all fours (although humans tend to do so only when young or inebriated), so having a quadrupedWalk() method at the Animal level that is called by the walk() methods of Tiger and Chimpanzee makes sense. At least until we add a Chicken or Halibut, because they will now have access to a type of locomotion they shouldn’t.
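To make the problem concrete, this is roughly where that design ends up (a sketch, not code from any real system):

public class Animal {
  // shared here so that the walk() methods of Tiger and Chimpanzee can reuse it...
  protected void quadrupedWalk() {}
}

public class Tiger extends Animal {
  public void walk() {
    quadrupedWalk();
  }
}

public class Halibut extends Animal {
  // ...but a Halibut now also inherits quadrupedWalk(), which makes no sense
}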

The root problem here is of course that the model is bad even though it seems like a natural one. If you think about it, body types and locomotion are not defined by the various creatures’ positions in the taxonomy – there are both mammals and fish that swim, and fish, lizards and snakes can all be legless. I think it is natural for us to think in terms of templates and properties inherited from a more general class (Douglas Hofstadter said something very similar, I think in Gödel, Escher, Bach). If that kind of thinking is a human tendency, that could explain why it seems so common to initially think that a hierarchical model is the best one. It is often a mistake to assume that the presence of an is-a relationship in the real world means that inheritance is right for your object model.

But even if, unlike the example above, a hierarchical model is a suitable one to describe a certain real-world scenario, there are issues. A big one, in my opinion, is that the classes that make up a hierarchical model are very tightly coupled. You cannot instantiate a class without also instantiating all of its superclasses. This makes them harder to test – you can never ‘mock’ a superclass, so each test of a leaf in the hierarchy requires you to do all the setup for all the classes above. In the example above, you might have to set up the limbs of your RattleSnake and the data needed for the quadrupedWalk() method of the Halibut before you could test them.

With inheritance, the smallest testable unit is larger than it is when composition is used, and you get more duplicated code in your tests as a result. In this way, inheritance is similar to the use of static methods – it removes the option to inject different behaviour when testing. The more code you try to reuse from the superclass/es, the more redundant test code you will typically get, and all that redundancy will slow you down when you modify the code in the future. Or, worse, it will make you not test your code at all because it’s such a pain.

There is an alternative model, where each of the beings above has-a BodyType and possibly also has-a Locomotion that knows how to move that body. Probably, each of the animals could be differently configured instances of the same Animal class.


public interface BodyType {}

public class TwoArmsTwoLegs implements BodyType {}

public class FourLegs implements BodyType {}

public class NoLimbs implements BodyType {}

public interface Locomotion<B extends BodyType> {
  void walk(B body);
}

public class BipedWalk implements Locomotion<TwoArmsTwoLegs> {
  public void walk(TwoArmsTwoLegs body) {}
}

public class Slither implements Locomotion<NoLimbs> {
  public void walk(NoLimbs body) {}
}

public class Animal {
  private final BodyType body;
  private final Locomotion locomotion;

  public Animal(BodyType body, Locomotion locomotion) {
    this.body = body;
    this.locomotion = locomotion;
  }
}

Animal human = new Animal(new TwoArmsTwoLegs(), new BipedWalk());

This way, you can very easily test the bodies, the locomotions, and the animals in isolation from each other. You may or may not want to do some automated functional testing of the way that you wired up your Human to ensure that it behaves like you want it to. I think composition is almost universally preferable to inheritance for code reuse – I’ve tried to think of cases where the only natural model is a hierarchical one and you can’t use composition instead, but it seems to be beyond me.
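As a sketch of what that isolated testing could look like – this assumes Animal is given a move() method that delegates to its Locomotion, and it uses JUnit and Mockito, neither of which is part of the example above:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import org.junit.Test;

public class AnimalTest {

  @Test
  public void shouldDelegateMovementToItsLocomotion() {
    TwoArmsTwoLegs body = new TwoArmsTwoLegs();
    @SuppressWarnings("unchecked")
    Locomotion<TwoArmsTwoLegs> locomotion = mock(Locomotion.class);

    Animal human = new Animal(body, locomotion);
    human.move(); // assumed to call locomotion.walk(body)

    verify(locomotion).walk(body);
  }
}

No superclass setup is needed; the collaborator is simply replaced with a mock.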

In addition to the arguments here, there’s the well-known fact that inheritance breaks encapsulation by exposing subclasses to implementation details in the superclass. This makes the use of subclassing as a means to integrate with a library a fragile solution, or at least one that limits the future evolution of the library.

So, there it is – that summarises what I think are the differences between composition and inheritance, and why one should tend to prefer composition over inheritance. And if you’re reading this because you googled me before an interview, this is one answer you should get right!


Immutability: a constraint that empowers

In Effective Java (Second Edition, item 15), Josh Bloch mentions that he is of the opinion that Java programmers should have the default mindset that their objects should be immutable, and that mutability should be used only when strictly necessary. The reason for this post is that it seems to me that most Java code I read doesn’t follow this advice, and I have come to the conclusion that immutability is an amazing concept. My theory is that one of the reasons why immutability isn’t as widely used as it should be is that as programmers or architects, we usually want to open up possibilities: a flexible architecture allows for future growth and changes and gives as much freedom as possible to solve coding problems in the most suitable way. Immutability seems to go against that desire for flexibility in that it takes an option away: if your architecture involves immutable objects, you’ve reduced your freedom because those objects can’t change any longer. How can that be good?

The two main motivations for using immutability according to Effective Java are that a) it is easier to share immutable objects between threads and b) immutable objects are fundamentally simple because they cannot change state once instantiated. The former seems to be the most commonly repeated argument in favour of using immutable objects – I think probably because it is far more concrete: make objects immutable and they can easily be shared by threads. The latter, that immutable objects are fundamentally simple, is the more powerful (or at least more ubiquitous) reason in my opinion, but the argument is harder to make. After all, there’s nothing you can do with an immutable object that you cannot also do with a mutable one, so why constrain your freedom? I hope to illustrate a couple of ways in which you both open up macro-level architectural options and improve your code on the micro level by removing your ability to modify things.
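As a reminder of what that fundamental simplicity looks like in code, here is a minimal sketch of an Effective Java-style immutable value class (Money is my own illustrative example, not one from the book): a final class, final fields, no mutators, and “modifications” that return new instances.

public final class Money {
  private final String currency;
  private final long amountInCents;

  public Money(String currency, long amountInCents) {
    this.currency = currency;
    this.amountInCents = amountInCents;
  }

  public String currency() { return currency; }
  public long amountInCents() { return amountInCents; }

  // "changing" a Money creates a new instance; the original can never be observed in another state
  public Money plus(Money other) {
    if (!currency.equals(other.currency)) {
      throw new IllegalArgumentException("Cannot add " + other.currency + " to " + currency);
    }
    return new Money(currency, amountInCents + other.amountInCents);
  }
}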

Macro Level: Distributed Systems

First, I would like to spend some time on the concurrency argument and look at it from a more general perspective than just Java. Let’s take something that is familiar to anybody who has worked on a distributed system: caching. A cache is a (more) local copy of something where the master copy is for some reason expensive to fetch. Caching is extremely powerful and it is used literally everywhere in computing. The fact that the item in the cache could be different from the master copy, due to either one of them having changed, leads to a lot of problems that need solving. So there are algorithms like read-through, refresh-ahead, write-through, write-behind, and so on, that deal with the problem that the data in your cache or caches may be outdated or in a ‘future state’ when compared to the master copy. These problems are well-studied and solved, but even so, having to deal with stale data in caches adds complexity to any system that has to handle it. With immutable objects, these problems just vanish. If you have a copy of a certain object, it is guaranteed to be the current one, because the object never changes. That is extremely powerful in designing a distributed architecture.

I wonder if I can write a blog post that doesn’t mention Git – apparently not this one at least :). Git is a marvellous example of a distributed architecture whose power comes to a large degree from reducing mutability. Most of the entities in Git are immutable – commits, trees and blobs – and all of them are referenced via keys that are SHA-1 hashes of their contents. If a blob (a file) changes, a new entity is created with a new hash code. If a tree (a directory – essentially a list of hash codes and names that identify the trees or blobs that live underneath it) changes through a file being added, removed or modified, a new tree entity with new contents and a new hash code is created. And whenever some changes are committed to a Git repository, a commit entity that describes the change set is created (with a reference to the new top tree entity, some metadata about who made the commit and when, a pointer to the previous commit/s, etc). Since the reference to each object is uniquely defined by its content, two identical objects that are created separately will always have the same key. (If you want to learn more about Git internals, here is a great though not free explanation.) So what makes that brilliant? The answer is that the immutability of objects means your copy is guaranteed to be correct, and content-derived SHA-1 references mean that identical objects are guaranteed to have identical references, which speeds up identity checks – since collisions are extremely unlikely, identical references mean identical objects. Suppose somebody makes a commit that includes a blob (file) with hash code 14 and you fetch the commit to your local Git repository, and suppose further that a blob with hash code 14 is already in your repository. When Git fetches the commit, it’ll come to the tree that references blob 14 and simply note that “I’ve already got blob 14, so no need to fetch it”.
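To make the content-addressing idea concrete, here is a toy sketch (my own, not Git’s actual object model) of a store where immutable content is keyed by the SHA-1 of that content, so identical objects always get the same key and never need to be stored – or fetched – twice:

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class ContentAddressedStore {
  private final Map<String, byte[]> objects = new HashMap<String, byte[]>();

  // stores the content (if not already present) and returns its key
  public String put(byte[] content) {
    String key = sha1Hex(content);
    if (!objects.containsKey(key)) { // "I've already got blob 14, so no need to store it"
      objects.put(key, content.clone());
    }
    return key;
  }

  public byte[] get(String key) {
    byte[] content = objects.get(key);
    return content == null ? null : content.clone();
  }

  private static String sha1Hex(byte[] content) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new RuntimeException("SHA-1 not available", e);
    }
  }
}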

The diagram above shows a simplified picture of a Git repository where a commit is added. The root of the repository is a folder with a file called ‘a.txt’ and a directory called ‘dir’. Under the directory there are a couple more files. In the commit, the a.txt file is changed. This means that three new entities are created: the new version of a.txt (hash code 0xd0), the new version of the root directory (hash code 0x2a), and the commit itself. Note that the new root directory still points to the old version of the ‘dir’ directory and that none of the objects that describe the previous version of the root directory are affected in any way.

That Git uses immutable objects in combination with references derivable from the content of those objects is what opens up possibilities for being very efficient in terms of networking (only transferring necessary objects), local storage (storing identical objects only once), comparisons and merges (when you reach a point where two trees point to the same object, you stop), etc., etc. Note that this usage of immutable objects is quite different from the one you read about in books like Effective Java and Java Concurrency in Practice – in those books, you can easily get the impression that immutability is something that requires taking some pretty special measures with regard to how you specify your classes, inheritance and constructors. I’d say that is only a part of the truth: if you do take those special measures in your Java code, you’ll get the benefit of some specific guarantees made by the Java Memory Model that will ensure you can safely share your immutable objects between threads inside a JVM. But even without the kind of solid guarantees about immutability that the JVM can provide, using immutability can bring huge architectural gains. So while Git doesn’t have any means of enforcing strict immutability – I can just edit the contents of something stored under .git/objects, with probable chaos ensuing – utilising it opens the door to a very powerful architecture.

When you’re writing multi-threaded Java programs, use the strong guarantees you can get from defining your Java classes correctly, but don’t be afraid to make immutability a core architectural idea even if there is no way to actually formally enforce it.

Micro Level: Building Blocks

I guess the clearest point I can make about immutability, like most of the others who have written about it, is still the one about concurrent execution. But I am a believer also in the second point, about the simplicity that immutability gives you even if your system isn’t distributed. For me, the essence of that point is that you can disregard any immutable object as a source of weirdness when you’re trying to figure out how the code in front of you works. Once you have confirmed to yourself that an immutable object is being instantiated correctly, it is guaranteed to remain correct for the rest of the logical flow you’re looking at. You can filter out that object and focus your valuable brainpower on other things. As a former chess player, I see a strong analogy: once you’ve played a bit, you stop seeing three pawns, a rook and a king; you see a king-side castle formation. That is a single unit whose interaction with the rest of the pieces on the board you know. This allows you to simplify your mental model of the board, reducing clutter and giving you a clearer picture of the essentials. A mutable object resists that kind of simplification, so you’ll constantly need to be alert to something that might change it and affect the logic.

On top of the simplicity that the classes themselves gain, by making them immutable, you can communicate your intent as a designer clearly: “this object is read-only, and if you believe you want to change it in order to make some modification to the functionality of this code, you’ve either not yet understood how the design fits together or requirements have changed to such an extent that the design doesn’t support them any longer”. Immutable objects make for extremely solid and easily understood building blocks and give the readers clear signals about their intended usage.

When immutability doesn’t cut the mustard

There are very few things that are purely beneficial, and this is of course true of immutability as well. So what are the problems you run into with immutability? Well, first of all, no system is interesting without mutable state, so you’ll need at least some objects that are mutable – in Git, branches are a prime example of a mutable object, and hence also the one source of conflicts between different repositories. A branch is simply a pointer to the latest commit on that branch, and when you create new commits on a branch, the pointer is updated. In the diagram above, that is reflected by the ‘master’ label having moved to point to the latest commit, but the local copy of where ‘origin/master’ is pointing is unchanged and will remain so until the next push or fetch happens. In the meantime, the actual branch on the origin repository may have changed, so Git needs to handle possible conflicts there. So the branches suffer from the normal cache-related staleness and concurrent update problems, but thanks to most entities being immutable, these problems are confined to a small section of the system.
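A toy sketch of that split between immutable history and a mutable branch pointer (again my own illustration, not Git’s actual data structures):

public final class Commit {
  private final String key;    // e.g. the SHA-1 of the commit's content
  private final Commit parent; // null for the very first commit; the chain is immutable

  public Commit(String key, Commit parent) {
    this.key = key;
    this.parent = parent;
  }

  public String key() { return key; }
  public Commit parent() { return parent; }
}

public class Branch {
  private Commit head; // the only mutable part: a pointer to the latest commit

  public Branch(Commit head) {
    this.head = head;
  }

  public Commit head() { return head; }

  public void advanceTo(Commit newHead) {
    this.head = newHead; // this is where staleness and conflicting updates can appear
  }
}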

Another thing you’ll need, which is provided for free by a JVM but not by systems in general, is garbage collection. You’ll keep creating new objects instead of changing the existing ones, and for most kinds of systems, that means you’ll want to get rid of the outdated stuff. Most of the time, that stuff will simply sit on a disk somewhere, and disk space is cheap, which means you don’t need to get super-sophisticated with your garbage collection algorithms. In general, the memory allocation and garbage collection overhead that you incur from creating new objects instead of updating existing ones is much less of a problem than it seems at first glance. I’ve personally run into one case where it was clearly impossible to make objects immutable: Ardor3D, an open-source Java 3D engine, whose most commonly used class is Vector3, a vector of three doubles. That feels like a great candidate for being immutable, since it is a typical small, simple value object. However, the main task of a 3D engine is to update many thousands of positions at least 50 times per second, so in this case, immutability was simply not an option. Our solution there was to signal ‘immutability intent’ by having the Vector3 class (and some similar classes) implement a ReadOnlyVector3 interface and use that interface when some class needed a vector but wasn’t planning on modifying it.
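The pattern looks roughly like this – a sketch of the idea rather than the actual Ardor3D classes:

public interface ReadOnlyVec3 {
  double getX();
  double getY();
  double getZ();
}

public class Vec3 implements ReadOnlyVec3 {
  private double x, y, z;

  public double getX() { return x; }
  public double getY() { return y; }
  public double getZ() { return z; }

  // mutators exist because creating new instances for many thousands of updates per frame is too costly
  public void set(double x, double y, double z) {
    this.x = x;
    this.y = y;
    this.z = z;
  }
}

public class Camera {
  // the parameter type signals that the camera will only read the vector, never modify it
  public void lookAt(ReadOnlyVec3 target) {
    // read target.getX()/getY()/getZ() here; no mutation is possible through this interface
  }
}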

There’s a lot more solid but semi-concrete advice in Effective Java about why immutability is useful outside of the context of sharing state between threads, so if you’re not convinced about it yet, I would urge you to re-read item 15. Or maybe the whole book, since everything Josh Bloch says is fantastic. To me, the use of immutability in Git shows that removing some aspects of freedom can unlock opportunities in other parts of the system. I also think that you should do everything you can to make your code easier for the reader to understand. Reducing mutability can have a game-changing impact on the macro level of your system, and it is guaranteed to help remove mental clutter and thereby improve your code on the micro level.


if (readable) good();

Last summer, I read the excellent book Clean Code by Robert Martin and was so impressed that I made it mandatory reading for the coders in my team. One of the techniques that he describes is that rather than writing code like:

  if (a != null && a.getLimit() > current) {
    // ...
  }

you should replace the boolean conditions with a method call and make sure the method name is descriptive of what you think should be true to execute the body of the if statement:

  if (withinLimit(a, current)) {
    // ...
  }

  private boolean withinLimit(A a, int current) {
    return a != null && a.getLimit() > current;
  }

I think that is great advice, because I used to always say that unclear logical conditions are one of the things that really require code comments to make them understandable. The scenario where you have an if-statement with 3-4 boolean conditions AND-ed and OR-ed together is

  1. not uncommon, and
  2. frequently a source of bugs, or something that needs changing, and
  3. something that stops you dead in your tracks when you’re trying to puzzle out what the code in front of you does.

So it is important to make your conditions easily understandable. Rather than documenting what you want the condition to mean, simply create a tiny method whose name says exactly that – it makes the flow of the code so much smoother. I also think (but I’m not sure) that it is easier to keep method names in sync with code updates than it is to keep comments up to date.

So that’s advice I have taken onboard when I write code. Yesterday, I found myself writing this:

  public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) {

    // ...

    if (shouldPerformTracking()) {
      executor.submit(new TrackingCallable(request, trackingService, ctrContext));
    }

    // ...
  }

  private boolean shouldPerformTracking() {
    // only do tracking if the controller has added a page node to the context
    return trackingDeterminator.performTracking() && hasPageNode(ctrContext);
  }

  private boolean hasPageNode(CtrContext ctrContext) {
    return !ctrContext.getEventNode().getChildren().isEmpty();
  }

As I was about to check that in, it struck me that even though the name of the method describing the second logical condition (hasPageNode()) was correct, it was odd that I had just written a comment explaining what I actually meant by that condition – namely that tracking should happen if tracking was enabled and the controller had added tracking information to the context. I figured out what the problem was and changed it to this:

  private boolean shouldPerformTracking() {
    return trackingDeterminator.performTracking() && controllerAddedPageInfo(ctrContext);
  }

  private boolean controllerAddedPageInfo(CtrContext ctrContext) {
    return !ctrContext.getEventNode().getChildren().isEmpty();
  }

That was a bit of an eye-opener somehow. The name of the condition-method should describe the intent, not the mechanics of the condition. The first take on the method name (hasPageNode()) described what the condition checked in a more readable way, but it didn’t describe why that was a useful thing to check. When this type of method name explains the why of the condition, I strongly think it increases code readability and therefore represents a great technique.

As always, naming is hard and worth spending a little extra effort on if you want your code to be readable.


Have I learned something about API Design?

Some years ago, somebody pointed me to Joshua Bloch’s excellent presentation about how to design APIs. If you haven’t watched it, go ahead and do that before reading on. To me, that presentation felt and still feels like it was full of those things where you can easily understand the theory but where the practice eludes you unless you – well, practice, I guess. Advice like “When in doubt, leave it out”, or “Keep APIs free of implementation details” is easy to understand as sentences, but harder to really understand in the sense of knowing how to apply it to a specific situation.

I’m currently working on some code changes that turned out to be quite awkward, and awkward code is always an indication of a need to tweak the design a little. As a part of the awkwardness-triggered tweaking, I came up with an updated API that I was quite pleased with, and that made me think that maybe I’m starting to understand some of the points that were made in that presentation. I thought it would be interesting to revisit the presentation and see if I had assimilated some of his advice into actual work practice, so here’s my self-assessment of whether Måhlén understands Bloch.

First, the problem: the API I modified is used to generate the path part of URLs pointing to a specific category of pages on our web sites – Scorching, which means that a search for something to buy has been specific enough to land on a single product (not just any digital camera, but a Sony DSLR-A230L). The original API looks like this:

public interface ScorchingUrlHelper {
  @Deprecated
  String urlForSearchWithSort(Long categoryId, Product product, Integer sort);

  @Deprecated
  String urlForProductCompare(Long categoryId, Product product);

  @Deprecated
  String urlForProductCompare(String categoryAlias, Product product);

  @Deprecated
  String urlForProductCompare(Category category, Product product);

  @Deprecated
  String urlForProductCompare(Long categoryId, String title, Long id, String keyword);

  String urlForProductDetail(Long categoryId, Product product);
  String urlForProductDetail(String categoryAlias, Product product);
  String urlForProductDetail(Category category, Product product);
  String urlForProductDetail(Long categoryId, String title, Long id, String keyword);

  String urlForProductReview(Long categoryId, Product product);
  String urlForProductReview(String categoryAlias, Product product);
  String urlForProductReview(Category category, Product product);
  String urlForProductReview(Long categoryId, String title, Long id);

  String urlForProductReviewWrite(Long categoryId, String title, Long id);

  String urlForWINSReview(String categoryAlias, Product product);
  String urlForWINSReview(Category category, Product product);
  String urlForWINSReview(Long categoryId, String title, Long id, String keyword);
}

Just from looking at the code, it’s clear that the API has evolved over time and has diverged a bit from its original use. There are some deprecated methods, and there are many ways of doing more or less the same things. For instance, you can get a URL part for a combination of product and category based on different kinds of information about the product or category.

The updated API – which feels really obvious in hindsight but actually took a fair amount of work – that I came up with is:

public interface ScorchingTargetHelper {
  Target targetForProductDetail(Product product, CtrPosition position);

  Target targetForProductReview(Product product, CtrPosition position);

  Target targetForProductReviewWrite(Product product, CtrPosition position);

  Target targetForWINSReview(Product product, CtrPosition position);
}

The reason I needed to change the code at all was to add information necessary for tracking click-through rates (CTR) to the generated URLs, so that’s why there is a new parameter in each method. Other than that, I’ve mostly removed things, which was precisely what I was hoping for.

Here are some of the rules from Josh’s presentation that I think I applied:

APIs should be easy to use and hard to misuse.

  • Since click tracking, once introduced, will be a core concern, it shouldn’t be possible to forget about adding it. Hence it was made a parameter in every method. People can probably still pass in garbage there, but at least everyone who wants to generate a scorching URL will have to take CTR into consideration.
  • Rather than @Deprecate-ing the methods that should go away, the new API is backwards-incompatible, making it far harder to misuse. @Deprecated doesn’t really mean anything, in my opinion. I like breaking things properly when I can; otherwise there’s a risk you don’t notice that something is broken.
  • All the information needed to generate one of these URLs is available if you have a properly initialised Product instance – it knows which Category it belongs to, and it knows about its title and id. So I removed the other parameters to make the API more concise and easier to use.

Names matter

  • The old class was called ScorchingUrlHelper, and each of its methods was called urlForSomething(). But it didn’t create URLs; it created the path parts of URLs (no protocol, host or port parts – query parts are not needed/used for these URLs). The new version returns an (existing) internal representation, called a Target, of something for which a URL can be created, and the names of the class and methods reflect that.

When in doubt, leave it out

  • I removed a lot of ways to create Targets based on subsets of the data about products and categories. I’m not sure that strictly means that I followed this piece of advice; it’s probably more a question of reducing code duplication and increasing API terseness. So instead of making the API implementation create Product instances based on three different kinds of inputs, that logic was extracted into a ProductBuilder, and something similar was done for Category objects. And I made use of the fact that a Product should already know which Category it belongs to.

Keep APIs free of implementation details

  • Another piece of advice that I don’t think I followed very successfully, but it wasn’t too bad. An intermediate version of the API took a CtrNode instead of a CtrPosition – the position in this case is a place in a tree hierarchy indicating where on the page the link appears, and a CtrNode object represents a node in that tree. But that is an implementation detail that shouldn’t really be exposed.

Provide programmatic access to all data available in string form

  • Rather than a String, the Target was the more suitable return type. The String return value was actually a fairly large part of the original awkwardness that triggered the refactoring.

Use the right data type for the job

  • I think I get points for this in two ways: using the Product object instead of various combinations of Longs and Strings, and for using the Target as a return type.

There’s a lot of Josh’s advice that I didn’t follow. Much of it relates to his recommendations on how to go about coming up with a good, solid version of the API. I was far less structured in how I went about it. A piece of advice that it could be argued that I went against is:

Don’t make the client do anything the library could do

  • I made the client responsible for instantiating the Product rather than allowing multiple similar calls taking different parameters. In this case, I think that avoiding duplication was the more important thing, and that maybe the library couldn’t/shouldn’t do that for the client. Or perhaps, “the library” shouldn’t be interpreted as just “the class in question”. I did add a new class to the library that helps clients instantiate their Products based on the same information as before.
  • I made the client go from Target to String to actually get at the path part of the URL. This was more of a special case – the old style ScorchingUrlHelper classes were actually instantiated per request, while I wanted something that could be potentially Singleton scoped. This meant that either I had to add request-specific information as a method parameter in the API (the current subdomain), or change to a Target as the return type and do the rest of the URL generation outside. It felt cleaner to leave that outside, leaving the ScorchingTargetHelper a more focused class with fewer responsibilities and collaborators.

So, in conclusion: do I think that I have actually understood the presentation on a deeper level than just the sentences? Well, when I went through the presentation to write this post, I was actually pleasantly surprised at the number of bullet points that I think I followed. I’m still not convinced, though. I think I still have a lot of things to learn, especially in terms of coming up with concise and flexible APIs that are right for a particular purpose. But at least, I think the new API is an improvement over the old one. And what’s more, writing this post by going back to that presentation and thinking about it in direct relationship to something I just did was a useful exercise. Maybe I learned a little more about API design by doing that!
