Archive for August, 2010

Composition vs Inheritance

One of the questions I often ask when interviewing candidate programmers is “what is the difference between inheritance and composition”. Some people don’t know, and they’re obviously out right then and there. Most people explain that “inheritance represents an is-a relationship and composition a has-a”, which is correct but insufficient. Very few people have a good answer to the question that I’m really asking: what impact does the choice of one or the other have on your software design? The last few years it seems there has been a trend towards feeling that inheritance is over-used and composition to a corresponding degree under-used. I agree, and will try to show some concrete reasons why that is the case. (As a side note, the main trigger for this blog post is that about a year ago, I introduced a class hierarchy as a part of a refactoring story, and it was immediately obvious that it was an awkward solution. However, it (and the other changes) represented a big improvement over what went before, and there wasn’t time to fix it back then. It’s bugged me ever since, and thanks to our gradual refactoring, we’re getting closer to the time when we can put something better in place, so I started thinking about it again.)

One guy I interviewed the other day had a pretty good attempt at describing when one should choose one or the other: it depends on which is closest to the real world situation that you’re trying to model. That has the ring of truth to it, but I’m not sure it is enough. As an example, take the classically hierarchical model of the animal kingdom: Humans and Chimpanzees are both Primates, and, skipping a bunch of steps, they are Animals just like Kangaroos and Tigers. This is clearly a set of is-a relationships, so we could model it like so:


public class Animal {}

public class Primate extends Animal {}

public class Human extends Primate {}

public class Chimpanzee extends Primate {}

public class Kangaroo extends Animal {}

public class Tiger extends Animal {}

Now, let’s add some information about their various body types and how they walk. Both Humans and Chimpanzees have two arms and two legs, and Tigers have four legs. So we could add two legs and two arms at the Primate level, and four legs at the Tiger level. But then it turns out that Kangaroos also have two legs and two arms, so this solution doesn’t work without duplication. Perhaps a more general concept involving limbs is needed at the animal level? There would still be duplication of the ‘legs + arms’ classification in both Kangaroo and Primate, not to mention what would happen if we introduce a RattleSnake into our Eden (or an EarthWorm if you prefer an animal that has never had limbs at any stage of its evolutionary history).

A similar problem is shown by introducing a walk() method. Humans do a bipedal walk, Chimpanzees and Tigers walk on all fours, and Kangaroos hop using their rear feet and tail. Where do we locate the logic for movement? All the animals in our sample hierarchy can walk on all fours (although humans tend to do so only when young or inebriated), so having a quadrupedWalk() method at the Animal level that is called by the walk() methods of Tiger and Chimpanzee makes sense. At least until we add a Chicken or Halibut, because they will now have access to a type of locomotion they shouldn’t.

The root problem here is of course that the model is bad even though it seems like a natural one. If you think about it, body types and locomotion are not defined by the various creatures’ positions in the taxonomy – there are both mammals and fish that swim, and both fish, lizards and snakes can be legless. I think it is natural for us to think in terms of templates and inherited properties from a more general class (Douglas Hofstadter said something very similar, I think in Gödel, Escher, Bach). If that kind of thinking is a human tendency, that could explain why it seems so common to initially think that a hierarchical model is the best one. It is often a mistake to assume that the presence of an is-a relationship in the real world means that inheritance is right for your object model.

But even if, unlike the example above, a hierarchical model  is a suitable one to describe a certain real-world scenario, there are issues. A big one, in my opinion, is that the classes that make up a hierarchical model are very tightly coupled. You cannot instantiate a class without also instantiating all of its superclasses. This makes them harder to test – you can never  ‘mock’ a superclass, so each test of a leaf in the hierarchy requires you to do all the setup for all the classes above. In the example above, you might have to set up the limbs of your RattleSnake and the data needed for the quadrupedWalk() method of the Halibut before you could test them.

With inheritance, the smallest testable unit is larger than it is when composition is used and you increase code duplication in your tests at least. In this way, inheritance is similar to use of static methods – it removes the option to inject different behaviour when testing. The more code you try to reuse from the superclass/es, the more redundant test code you will typically get, and all that redundancy will slow you down when you modify the code in the future. Or, worse, it will make you not test your code at all because it’s such a pain.

There is an alternative model, where each of the beings above have-a BodyType and possibly also have-a Locomotion that knows how to move that body. Probably, each of the animals could be differently configured instances of the same Animal class.


public interface BodyType {}

public TwoArmsTwoLegs implements BodyType {}

public FourLegs implements BodyType {}

public interface Locomotion<B extends BodyType> {
  void walk(B body);
}

public class BipedWalk implements Locomotion<TwoArmsTwoLegs> {
  public void walk(TwoArmsTwoLegs body) {}
}

public class Slither implements Locomotion<NoLimbs> {
  public void walk(NoLimbs body) {}
}

public class Animal {
   BodyType body;
   Locomotion locomotion;
}

Animal human = new Animal(new TwoArmsTwoLegs(), new BipedWalk());

This way, you can very easily test the bodies, the locomotions, and the animals in isolation of each other. You may or may not want to do some automated functional testing of the way that you wired up your Human to ensure that it behaves like you want it to. I think composition is almost universally preferable to inheritance for code reuse – I’ve tried to think of cases where the only natural model is a hierarchical one and you can’t use composition instead, but it seems to be beyond me.

In addition to the arguments here, there’s the well-known fact that inheritance breaks encapsulation by exposing subclasses to implementation details in the superclass. This makes the use of subclassing as a means to integrate with a library a fragile solution, or at least one that limits the future evolution of the library.

So, there it is – that summarises what I think are the differences between composition and inheritance, and why one should tend to prefer composition over inheritance. So, if you’re reading this because you googled me before an interview,  this is one answer you should get right!

Advertisements

, , ,

12 Comments

Finding Duplicate Class Definitions Using Maven

If you have a largish set of internal libraries with a complex dependency graph, chances are you’ll be including different versions of the same class via different paths. The exact version of the class that gets loaded seems to depend on the combination of JVM, class loader and operating system that happens to be used at the time. This can cause builds to fail on some systems but not others and is quite annoying. When this has been happening to me, it’s usually been for one of two reasons:

  1. We’ve been restructuring our internal artifacts, and something was moved from artifact A to B, only the project in question is still on a version of artifact A that is “pre-removal”. This often leads to binary incompatibilities if the class has evolved since being moved to artifact B.
  2. Two artifacts in the dependency graph have dependencies on artifacts that, while actually different as artifacts, contain class files for the same class. This can typically happen with libraries that provide jar distributions that include all dependencies, or where there are distributions that are partial or full.

On a couple of previous occasions, when trying to figure out how duplicate class definitions made it into projects I’ve been working on, I’ve gone through a laborious manual process to list class names defined in jars, and see which ones are repeated in more than one. I thought that a better option might be to see if that functionality could be added into the Maven dependency plugin.

My original idea was to add a new goal, something like ‘dependency:duplicate-classes’, but when looking a little more closely at the source code of the dependency plugin, I found that the dependency:analyze goal had all the information needed to figure out which classes are defined more than once. So I decided to make a version of the maven-dependency-plugin where it is possible to detect duplicate class definitions using ‘mvn dependency:analyze’.

The easiest way to run the updated plugin is like this:

mvn dependency:analyze -DcheckDuplicateClasses

The output if duplicate classes are found is something like:

[WARNING] Duplicate class definitions found:
[WARNING]    com.shopzilla.common.data.ObjectFactory defined in:
[WARNING]       com.shopzilla.site.url.c14n:model:jar:1.4:compile
[WARNING]       com.shopzilla.common.data:data-model-schema:jar:1.23:compile
[WARNING]    com.shopzilla.site.category.CategoryProvider defined in:
[WARNING]       com.shopzilla.site2.sasClient:sas-client-core:jar:5.47:compile
[WARNING]       com.shopzilla.site2.service:common-web:jar:5.50:compile

If you would like to try the updated plugin on your project, here’s how to do it:

  1. Get the forked code for the dependency analyzer goal from http://github.com/pettermahlen/maven-dependency-analyzer-fork and install it in your local Maven repo by running ‘mvn install’. (It appears that for some people, the unit tests fail during this process – I’ve not been able to reproduce this, and it’s not the tests that I wrote, so in this case my recommendation would be to simply use -DskipTests=true to ignore them).
  2. Get the forked code for the dependency plugin from http://github.com/pettermahlen/maven-dependency-plugin-fork and install it in your local Maven repo by running ‘mvn install’.
  3. Update your pom.xml file to use the forked version of the dependency plugin (it’s probably also possible to use the plugin registry, but I’ve not tested that):
<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>2.2.PM-SNAPSHOT</version>
      </plugin>
    </plugins>
  </pluginManagement>
</build>

I’ve filed a JIRA ticket to get this feature included into the dependency plugin – if you think it would be useful, it might be a good idea to vote for it. Also, if you have any feedback about the feature, feel free to comment here!

,

2 Comments