Bygg – Better Dependency Management

I’ve said before that the one thing that Maven does amazingly well is dependency management. This post is about how to do it better.

Declaring and Defining Dependencies

In Maven, you can declare dependencies using a <dependencyManagement /> section in either the current POM or some parent POM, and then use them – include them into the current build – using a <dependencies /> section. This is very useful because it allows you to define in a single place which version of some dependency should be used for a set of related modules. It looks something like:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.shopzilla.site2.service</groupId>
      <artifactId>common</artifactId>
      <version>${service.common.version}</version>
    </dependency>
    <dependency>
      <groupId>com.shopzilla.site2.service</groupId>
      <artifactId>common-web</artifactId>
      <version>${service.common.version}</version>
    </dependency>
    <dependency>
      <groupId>com.shopzilla.site2.core</groupId>
      <artifactId>core</artifactId>
      <version>${core.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.shopzilla.site2.service</groupId>
    <artifactId>common</artifactId>
  </dependency>
  <dependency>
    <groupId>com.shopzilla.site2.core</groupId>
    <artifactId>core</artifactId>
  </dependency>
</dependencies>

To me, there are a couple of problems here:

  1. Declaring a dependency looks pretty much identical to using one – the only difference is that the declaration is enclosed inside a <dependencyManagement /> section. This makes it hard to know what it is you’re looking at if you have a large number of dependencies – is this declaring a dependency or actually using it?
  2. It’s perfectly legal to add a <version /> tag in the plain <dependencies /> section – and that will happen, unless all the developers touching the POM a) understand the distinction between the two <dependencies /> sections, and b) are disciplined enough to maintain it.

For large POMs and POM hierarchies in particular, the way to define shared dependencies becomes not only overly verbose but also hard to keep track of. I think it could be made much easier and nicer in Bygg, something along these lines:


// Core API defined in Bygg proper
public interface Artifact {
   String getGroupId();
   String getArtifactId();
}

// ----------------------------------------------

// This enum is in some shared code somewhere - note that the pattern of declaring artifacts
// using an enum could be used within a module as well if desired. You could use different enums
// to define, for instance, sets of artifacts to use when talking to databases, or when developing
// a web user interface, or whatever.
public enum MyArtifacts implements Artifact {
    SERVICE_COMMON("com.shopzilla.site2.service", "common"),
    CORE("com.shopzilla.site2.core", "core"),
    JUNIT("junit", "junit");

    // constructor storing the two fields, plus the getGroupId() and
    // getArtifactId() implementations, omitted for brevity
}

// ----------------------------------------------

// This can be declared in some shared code as well, probably but not
// necessarily in the same place as the enum. Note that the type of
// the collection is Collection, indicating that it isn't ordered.
public static final Collection<ArtifactVersion> ARTIFACT_VERSIONS = ImmutableList.of(
        new ArtifactVersion(SERVICE_COMMON, properties.get("service.common.version")),
        new ArtifactVersion(CORE, properties.get("core.version")),
        new ArtifactVersion(JUNIT, "4.8.1"));

// ----------------------------------------------

// In the module to be built, define how the declared dependencies are used.
// Here, ordering might be significant (it indicates the order of artifacts on the
// classpath - the reason for the 'might' is I'm not sure if this ordering can carry
// over into a packaged artifact like a WAR).
List<Dependency> moduleDependencies = ImmutableList.of(
    new Dependency(SERVICE_COMMON, EnumSet.of(MAIN, PACKAGE)),
    new Dependency(CORE, EnumSet.of(MAIN, PACKAGE)),
    new Dependency(JUNIT, EnumSet.of(TEST)));

The combination of a Collection of ArtifactVersions and a List of Dependencies is then used by the classpath assembler target to produce an actual classpath for use in compiling, running, and so on. Although the example code shows the dependencies as a plain List, I think there may be value in making them something richer: wrapping the list in an intelligent object that provides filtering options and the like could be useful, but it’s premature to decide that until there’s code that makes it concrete.
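Such a wrapper might look something along these lines – a sketch with made-up names (Dep, Flag and DependencySet are not real Bygg types), just to show how filtering could compose:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of a richer dependency collection: a thin wrapper
// around the List that adds filtering, instead of exposing the raw list.
public class DependencySet {
    public enum Flag { MAIN, TEST, PACKAGE }

    // Minimal stand-in for the Dependency type from the example above.
    public static final class Dep {
        final String artifactId;
        final EnumSet<Flag> flags;
        public Dep(String artifactId, EnumSet<Flag> flags) {
            this.artifactId = artifactId;
            this.flags = flags;
        }
    }

    private final List<Dep> deps;

    public DependencySet(List<Dep> deps) { this.deps = deps; }

    // Filtering returns a new DependencySet, so filters compose.
    public DependencySet matching(Predicate<Dep> p) {
        List<Dep> kept = new ArrayList<>();
        for (Dep d : deps) if (p.test(d)) kept.add(d);
        return new DependencySet(kept);
    }

    // Convenience filter: everything that belongs on a given classpath.
    public DependencySet withFlag(Flag f) {
        return matching(d -> d.flags.contains(f));
    }

    public List<String> artifactIds() {
        List<String> ids = new ArrayList<>();
        for (Dep d : deps) ids.add(d.artifactId);
        return ids;
    }

    public static void main(String[] args) {
        DependencySet all = new DependencySet(Arrays.asList(
                new Dep("common", EnumSet.of(Flag.MAIN, Flag.PACKAGE)),
                new Dep("core", EnumSet.of(Flag.MAIN, Flag.PACKAGE)),
                new Dep("junit", EnumSet.of(Flag.TEST))));
        System.out.println(all.withFlag(Flag.TEST).artifactIds()); // prints [junit]
    }
}
```

The classpath assembler could then ask the set for, say, everything with the TEST flag, instead of iterating over a bare list.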

The main ideas in the example above are:

  1. Using enums for the artifact identifiers (groupId + artifactId) gives a more succinct and harder-to-misspell way to refer to artifacts in the rest of the build configuration. Since you’re editing this code in the IDE, finding out exactly what the artifact identifier means (groupId + artifactId) is as easy as clicking on it while pressing the right key.
  2. If the build configuration is done using regular Java code, shared configuration items can trivially be made available as Maven artifacts. That makes it easy to for instance have different predefined groups of related artifacts, and opens up for composed rather than inherited shared configurations. Very nice!
  3. In the last bit, where the artifacts are actually used in the build, there is an outline of something I think might be useful. I’ve used Maven heavily for years, and one thing I’ve never quite internalised is how the scopes work. A scope is basically one of a list of strings (compile, test, provided, runtime) that describes where a certain artifact should be used. So ‘compile’ means that the artifact in question will be used when compiling the main source code, when running tests, when executing the artifact being built and, if it is a WAR, when creating the final package. I think it would be far simpler to have a set of flags indicating which classpaths the artifact should be included in: MAIN means ‘when compiling and running the main source’, TEST is ditto for the test source, PACKAGE means ‘include in the final package’, and so on. There’s no need to memorise what some scope means; you can just look at the set of flags.
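To make that last point concrete, here is my reading of Maven’s four main scopes restated as explicit classpath flags (the flag names are hypothetical, mirroring the example earlier in the post – the point is that each entry is self-describing):

```java
import java.util.EnumSet;
import java.util.Map;

// My interpretation of Maven's scope semantics, spelled out as the set of
// classpaths each scope actually puts an artifact on.
public class Scopes {
    public enum Flag { MAIN, TEST, RUNTIME, PACKAGE }

    public static final Map<String, EnumSet<Flag>> SCOPE_AS_FLAGS = Map.of(
            // on every classpath, and included in the final package
            "compile", EnumSet.of(Flag.MAIN, Flag.TEST, Flag.RUNTIME, Flag.PACKAGE),
            // available when compiling and testing; the container provides it at runtime
            "provided", EnumSet.of(Flag.MAIN, Flag.TEST),
            // not needed to compile, but needed to run and must be packaged
            "runtime", EnumSet.of(Flag.TEST, Flag.RUNTIME, Flag.PACKAGE),
            // test classpath only
            "test", EnumSet.of(Flag.TEST));

    public static void main(String[] args) {
        // With flags there is nothing to memorise: whether 'provided' ends up
        // in the package is stated outright rather than implied by the name.
        System.out.println(SCOPE_AS_FLAGS.get("provided").contains(Flag.PACKAGE)); // false
    }
}
```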

Another idea that I think would be useful is adding an optional Repository setting for an Artifact. With Maven, you can add repositories in addition to the default one (Maven Central at Ibiblio). You can have as many as you like, which is great for artifacts that aren’t available in Maven Central. However, each added repository slows down your build by a wide margin, as Maven will check every repository defined in the build for updates to each snapshot version of an artifact. Whenever I add a repository, I do so to get access to a specific artifact. Exploiting that fact by having two kinds of repositories – global and artifact-specific, say – should be simple to implement and would be a real performance improvement.
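A sketch of how that could work (Repository, Artifact and reposToCheck are made-up names, and the single-repository short-circuit is my assumption about the desired semantics):

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: an artifact may carry its own repository, and the
// resolver only consults the global repository list when it doesn't.
public class Repositories {
    public record Repository(String url) {}
    public record Artifact(String groupId, String artifactId,
                           Optional<Repository> repository) {}

    // An artifact-specific repository short-circuits the global list, so
    // snapshot update checks don't hit every repository in the build.
    public static List<Repository> reposToCheck(Artifact a, List<Repository> global) {
        return a.repository().map(List::of).orElse(global);
    }

    public static void main(String[] args) {
        List<Repository> global = List.of(
                new Repository("https://repo1.maven.org/maven2"),
                new Repository("https://repo.example.com/extra"));
        Artifact plain = new Artifact("junit", "junit", Optional.empty());
        Artifact special = new Artifact("com.example", "exotic",
                Optional.of(new Repository("https://repo.example.com/exotic")));
        System.out.println(reposToCheck(plain, global).size());   // 2: all global repos
        System.out.println(reposToCheck(special, global).size()); // 1: only its own repo
    }
}
```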

Better Transitive Conflict Resolution

Maven allows you to not have to worry about transitive dependencies required by libraries you include. This is, as I’ve argued before, an incredibly powerful feature. But the thing is, sometimes you do need to worry about those transitive dependencies: when they introduce binary incompatibilities. A typical example is something I ran into the other day, where a top-level project used version 3.0.3 of some Spring jars (such as ‘spring-web’), while some shared libraries eventually included another version of Spring (the ‘spring’ artifact, version 2.5.6). Both of these jars contain a definition of org.springframework.web.context.ConfigurableWebApplicationContext, and they are incompatible. This leads to runtime problems (in other words, the problem is visible too late; it should be detected at build time), and the only way to figure that out is to recognise the symptoms of the problem as a “likely binary version mismatch”, then use mvn dependency:analyze to figure out possible candidates and add exclude rules like this to your POM:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.shopzilla.site2.service</groupId>
      <artifactId>common</artifactId>
      <version>${service.common.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.springframework</groupId>
          <artifactId>spring</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.google.collections</groupId>
          <artifactId>google-collections</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>com.shopzilla.site2.core</groupId>
      <artifactId>core</artifactId>
      <version>${core.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.springframework</groupId>
          <artifactId>spring</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.google.collections</groupId>
          <artifactId>google-collections</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</dependencyManagement>

As you can tell from the example (just a small part of the POM) that I pasted in, I had a similar problem with google-collections. The top level project uses Guava, so binary incompatible versions included by the dependencies needed to be excluded. The problems here are:

  1. It’s painful to figure out what libraries cause the conflicts – sometimes, you know or can easily guess (like the fact that different versions of Spring packages can clash), but other times you need to know something a little less obvious (like the fact that Guava has superseded google-collections, something not immediately clear from the names). The tool could just tell you that you have binary incompatibilities on your classpath (I actually submitted a patch to the Maven dependency plugin to fix that, but it’s been stalled for 6 months).
  2. Once you’ve figured out what causes the problem, it’s a pain to track down all the places it comes from. The main tool at hand is the dependency plugin, and the way to find out where dependencies come from is mvn dependency:tree – but that only tells you about a single source of a particular dependency at a time. In my case, I wanted to find out where the spring jar came from: that meant running mvn dependency:tree, adding an exclude, running it again to find where else the spring jar was included, adding another exclude, and so on. This could be so much easier. And since it could be easier, it should be.
  3. What’s more, the problems are sometimes environment-dependent, so you’re not guaranteed that they will show up on your development machine. I’m not sure about the exact reasons, but I believe that there are differences in the order in which different class loaders load classes in a WAR. This might mean that the only place you can test if a particular problem is solved or not is your CI server, or some other environment, which again adds pain to the process.
  4. The configuration is rather verbose and you need to introduce duplicates, which makes your build files harder to understand at a glance.
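On the first problem – the tool could just tell you – once the build tool knows which classes each jar contributes, flagging candidates for binary incompatibility is essentially an inverted index. This sketch reports classes that occur in more than one jar on the classpath; reading the entry lists from actual jar files is straightforward with java.util.zip.ZipFile, so the example takes them as a map to stay self-contained:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Build-time duplicate-class detection: given the classes each jar contains,
// report every class that appears in more than one jar -- e.g.
// ConfigurableWebApplicationContext showing up in both spring-2.5.6.jar and
// spring-web-3.0.3.jar.
public class DuplicateClasses {
    public static Map<String, List<String>> duplicates(Map<String, Set<String>> jarToClasses) {
        Map<String, List<String>> classToJars = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : jarToClasses.entrySet())
            for (String cls : e.getValue())
                classToJars.computeIfAbsent(cls, k -> new ArrayList<>()).add(e.getKey());
        // Keep only classes defined by two or more jars.
        classToJars.values().removeIf(jars -> jars.size() < 2);
        return classToJars;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> classpath = new LinkedHashMap<>();
        classpath.put("spring-2.5.6.jar",
                Set.of("org.springframework.web.context.ConfigurableWebApplicationContext"));
        classpath.put("spring-web-3.0.3.jar",
                Set.of("org.springframework.web.context.ConfigurableWebApplicationContext"));
        classpath.put("core.jar", Set.of("com.shopzilla.site2.core.Core"));
        System.out.println(duplicates(classpath).keySet());
        // prints [org.springframework.web.context.ConfigurableWebApplicationContext]
    }
}
```

A duplicate class name alone doesn’t prove incompatibility, so a real check would probably compare the class files (or their hashes) before failing the build – but even the naive version would have pointed straight at the spring/spring-web clash.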

Apart from considering binary incompatibilities to be errors (and reporting on exactly where they are found), here’s how I think exclusions should work in Bygg:


dependencies.exclude().group("com.google.collections").artifact("google-collections")
            .exclude().group("org.springframework").artifact("spring.*").version(except("3.0.3"));

Key points above are:

  1. Making excludes a global thing, not a per-dependency thing. As soon as I’ve identified that I don’t want spring.jar version 2.5.6 in my project, I know I don’t want it from anywhere at all – I don’t care where it comes from, I just don’t want it there! I suppose there is a case for saying “I trust the core library to include google-collections for me, but not the common-web one”, so maybe global excludes aren’t sufficient on their own. But they would have helped me tremendously in most of the cases where I’ve had to use Maven exclusions, and I can’t think of a single case where I’ve actually wanted an artifact-specific exclusion.
  2. Defining exclusion filters using a fluent API that supports regular expressions. With Spring in particular, you want to make sure that all your jars have the same version, and it would be great to be able to say that you don’t want any Spring jar with a version other than 3.0.3.
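Under the hood, such globally applied, regex-based filters could look something like this sketch (all names are made up, and the ‘except’ semantics – keep the allowed version, drop everything else – are my reading of the fluent example above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of global, regex-based exclusion rules applied to
// every resolved artifact, regardless of which dependency dragged it in.
public class Excludes {
    public record ResolvedArtifact(String group, String artifact, String version) {}

    public static class Exclusion {
        final Pattern group, artifact;
        final Pattern versionExcept; // versions to keep; null means exclude all

        public Exclusion(String group, String artifact, String versionExcept) {
            this.group = Pattern.compile(group);
            this.artifact = Pattern.compile(artifact);
            this.versionExcept = versionExcept == null ? null : Pattern.compile(versionExcept);
        }

        public boolean excludes(ResolvedArtifact a) {
            if (!group.matcher(a.group()).matches()) return false;
            if (!artifact.matcher(a.artifact()).matches()) return false;
            // 'except' semantics: the allowed version survives, all others go.
            return versionExcept == null || !versionExcept.matcher(a.version()).matches();
        }
    }

    public static List<ResolvedArtifact> apply(List<ResolvedArtifact> classpath,
                                               List<Exclusion> rules) {
        List<ResolvedArtifact> kept = new ArrayList<>();
        for (ResolvedArtifact a : classpath)
            if (rules.stream().noneMatch(r -> r.excludes(a))) kept.add(a);
        return kept;
    }

    public static void main(String[] args) {
        List<Exclusion> rules = List.of(
                new Exclusion("com\\.google\\.collections", "google-collections", null),
                new Exclusion("org\\.springframework", "spring.*", "3\\.0\\.3"));
        List<ResolvedArtifact> classpath = List.of(
                new ResolvedArtifact("org.springframework", "spring", "2.5.6"),
                new ResolvedArtifact("org.springframework", "spring-web", "3.0.3"),
                new ResolvedArtifact("com.google.collections", "google-collections", "1.0"));
        for (ResolvedArtifact a : apply(classpath, rules))
            System.out.println(a.artifact()); // prints spring-web
    }
}
```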

Build Java with Java?!

I’ve gone through different phases when thinking about using Java rather than XML to configure builds. At first, I thought “it’s great, because it allows you to more easily use the standard debugger for builds, and thereby resolve Maven’s huge documentation problem”. But then I realised that debugging is enabled by ensuring that the IDE has access to the source code of the build tool and plugins that execute the build – how you configure it is irrelevant. So then I thought that using Java for configuration is pretty good anyway, because developers won’t need to learn a new language (as with Buildr or Raven), and IDE integration is a lot easier: the IDE you use for your regular Java programming doesn’t need to be taught anything to deal with some more Java code. I’ve now come to the conclusion that DSL-style configuration APIs – and, even more, the ability to apply standard engineering principles for sharing and reusing code to build configurations – are another powerful argument for using Java in the build configuration. So I’ve gone from “Java configuration is key”, to “Java configuration is OK, but not important”, to “Java configuration is powerful”.
