Composition vs Inheritance

One of the questions I often ask when interviewing candidate programmers is “what is the difference between inheritance and composition?” Some people don’t know, and they’re obviously out right then and there. Most people explain that “inheritance represents an is-a relationship and composition a has-a”, which is correct but insufficient. Very few people have a good answer to the question that I’m really asking: what impact does the choice of one or the other have on your software design? Over the last few years, there seems to have been a growing feeling that inheritance is over-used and composition correspondingly under-used. I agree, and will try to show some concrete reasons why that is the case. (As a side note, the main trigger for this blog post is that about a year ago, I introduced a class hierarchy as a part of a refactoring story, and it was immediately obvious that it was an awkward solution. However, it (and the other changes) represented a big improvement over what went before, and there wasn’t time to fix it back then. It’s bugged me ever since, and thanks to our gradual refactoring, we’re getting closer to the time when we can put something better in place, so I started thinking about it again.)

One guy I interviewed the other day had a pretty good attempt at describing when one should choose one or the other: it depends on which is closest to the real world situation that you’re trying to model. That has the ring of truth to it, but I’m not sure it is enough. As an example, take the classically hierarchical model of the animal kingdom: Humans and Chimpanzees are both Primates, and, skipping a bunch of steps, they are Animals just like Kangaroos and Tigers. This is clearly a set of is-a relationships, so we could model it like so:


public class Animal {}

public class Primate extends Animal {}

public class Human extends Primate {}

public class Chimpanzee extends Primate {}

public class Kangaroo extends Animal {}

public class Tiger extends Animal {}

Now, let’s add some information about their various body types and how they walk. Both Humans and Chimpanzees have two arms and two legs, and Tigers have four legs. So we could add two legs and two arms at the Primate level, and four legs at the Tiger level. But then it turns out that Kangaroos also have two legs and two arms, so this solution doesn’t work without duplication. Perhaps a more general concept involving limbs is needed at the animal level? There would still be duplication of the ‘legs + arms’ classification in both Kangaroo and Primate, not to mention what would happen if we introduce a RattleSnake into our Eden (or an EarthWorm if you prefer an animal that has never had limbs at any stage of its evolutionary history).

A similar problem is shown by introducing a walk() method. Humans do a bipedal walk, Chimpanzees and Tigers walk on all fours, and Kangaroos hop using their rear feet and tail. Where do we locate the logic for movement? All the animals in our sample hierarchy can walk on all fours (although humans tend to do so only when young or inebriated), so having a quadrupedWalk() method at the Animal level that is called by the walk() methods of Tiger and Chimpanzee makes sense. At least until we add a Chicken or Halibut, because they will now have access to a type of locomotion they shouldn’t.
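
As a rough sketch of the problem just described (the method bodies and the LeakyHierarchyDemo class are made up for illustration, and the classes are package-private so that the snippet fits in one file), putting quadrupedWalk() on Animal makes it available to every subclass, including ones it makes no sense for:

```java
// Hypothetical sketch: movement logic parked in the base class for reuse.
abstract class Animal {
    // Meant for Tiger and Chimpanzee, but inherited by every subclass.
    String quadrupedWalk() {
        return "walks on all fours";
    }

    abstract String walk();
}

class Tiger extends Animal {
    @Override
    String walk() {
        return quadrupedWalk();
    }
}

class Halibut extends Animal {
    @Override
    String walk() {
        return "swims";
    }
}

class LeakyHierarchyDemo {
    public static void main(String[] args) {
        Animal halibut = new Halibut();
        System.out.println(halibut.walk());          // swims
        // Compiles and runs, although it is meaningless for a fish.
        System.out.println(halibut.quadrupedWalk()); // walks on all fours
    }
}
```

Nothing in the type system separates the animals that should have the helper from those that should not; the hierarchy hands it to all of them.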

The root problem here is of course that the model is bad even though it seems like a natural one. If you think about it, body types and locomotion are not defined by the various creatures’ positions in the taxonomy – there are both mammals and fish that swim, and fish, lizards and snakes can all be legless. I think it is natural for us to think in terms of templates and inherited properties from a more general class (Douglas Hofstadter said something very similar, I think in Gödel, Escher, Bach). If that kind of thinking is a human tendency, that could explain why it seems so common to initially think that a hierarchical model is the best one. It is often a mistake to assume that the presence of an is-a relationship in the real world means that inheritance is right for your object model.

But even if, unlike the example above, a hierarchical model is a suitable one to describe a certain real-world scenario, there are issues. A big one, in my opinion, is that the classes that make up a hierarchical model are very tightly coupled. You cannot instantiate a class without also running the initialisation of all of its superclasses. This makes them harder to test – you can never ‘mock’ a superclass, so each test of a leaf in the hierarchy requires you to do all the setup for all the classes above. In the example above, you might have to set up the limbs of your RattleSnake and the data needed for the quadrupedWalk() method of the Halibut before you could test them.

With inheritance, the smallest testable unit is larger than it is with composition, and you increase code duplication, at least in your tests. In this way, inheritance is similar to the use of static methods – it removes the option to inject different behaviour when testing. The more code you try to reuse from the superclass/es, the more redundant test code you will typically get, and all that redundancy will slow you down when you modify the code in the future. Or, worse, it will make you not test your code at all because it’s such a pain.
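
To make the setup cost concrete, here is a minimal sketch. It assumes a hypothetical Animal constructor that takes a list of limb names, plus invented warn() and quadrupedWalk() bodies; none of these details come from a real code base. Even a test that only cares about the snake’s rattle has to satisfy the superclass first:

```java
import java.util.List;

abstract class Animal {
    private final List<String> limbs;

    // Every subclass constructor has to go through this one.
    protected Animal(List<String> limbs) {
        if (limbs == null) {
            throw new IllegalArgumentException("limbs must be configured");
        }
        this.limbs = limbs;
    }

    String quadrupedWalk() {
        if (limbs.size() < 4) {
            throw new IllegalStateException("needs four limbs");
        }
        return "walks on " + limbs.size() + " limbs";
    }
}

class RattleSnake extends Animal {
    // A snake has no limbs, yet the superclass still demands a limb list.
    RattleSnake(List<String> limbs) {
        super(limbs);
    }

    String warn() {
        return "rattles";
    }
}
```

A test of warn() cannot construct a RattleSnake without first building limb data that is irrelevant to the behaviour under test, and there is no way to swap the Animal part for a test double.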

There is an alternative model, where each of the beings above has-a BodyType and possibly also has-a Locomotion that knows how to move that body. Probably, each of the animals could be differently configured instances of the same Animal class.


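public interface BodyType {}

public class TwoArmsTwoLegs implements BodyType {}

public class FourLegs implements BodyType {}

public class NoLimbs implements BodyType {}

// A Locomotion knows how to move one particular kind of body.
public interface Locomotion<B extends BodyType> {
  void walk(B body);
}

public class BipedWalk implements Locomotion<TwoArmsTwoLegs> {
  public void walk(TwoArmsTwoLegs body) {}
}

public class Slither implements Locomotion<NoLimbs> {
  public void walk(NoLimbs body) {}
}

public class Animal {
  private final BodyType body;
  private final Locomotion<? extends BodyType> locomotion;

  public Animal(BodyType body, Locomotion<? extends BodyType> locomotion) {
    this.body = body;
    this.locomotion = locomotion;
  }
}

Animal human = new Animal(new TwoArmsTwoLegs(), new BipedWalk());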

This way, you can very easily test the bodies, the locomotions, and the animals in isolation from each other. You may or may not want to do some automated functional testing of the way that you wired up your Human to ensure that it behaves like you want it to. I think composition is almost universally preferable to inheritance for code reuse – I’ve tried to think of cases where the only natural model is a hierarchical one and you can’t use composition instead, but it seems to be beyond me.
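
A sketch of what testing in isolation could look like. Several things here are assumptions made for the example and not part of the model above: Animal is made generic in its body type so that the compiler checks the pairing, it gets a hypothetical move() method, and RecordingLocomotion is a hand-written test double.

```java
import java.util.ArrayList;
import java.util.List;

interface BodyType {}

class TwoArmsTwoLegs implements BodyType {}

interface Locomotion<B extends BodyType> {
    void walk(B body);
}

// Generic variant of Animal (an assumption of this sketch): the locomotion
// must match the body type at compile time.
class Animal<B extends BodyType> {
    private final B body;
    private final Locomotion<B> locomotion;

    Animal(B body, Locomotion<B> locomotion) {
        this.body = body;
        this.locomotion = locomotion;
    }

    // Hypothetical method: delegates movement to the injected Locomotion.
    void move() {
        locomotion.walk(body);
    }
}

// Test double that records every body it is asked to move.
class RecordingLocomotion<B extends BodyType> implements Locomotion<B> {
    final List<B> walked = new ArrayList<>();

    @Override
    public void walk(B body) {
        walked.add(body);
    }
}

class AnimalWiringCheck {
    public static void main(String[] args) {
        TwoArmsTwoLegs body = new TwoArmsTwoLegs();
        RecordingLocomotion<TwoArmsTwoLegs> fake = new RecordingLocomotion<>();

        new Animal<>(body, fake).move();

        // The animal should have handed its own body to the locomotion, once.
        System.out.println(fake.walked.size() == 1 && fake.walked.get(0) == body);
    }
}
```

The Animal is exercised without any real walking logic, and a real BipedWalk could be tested just as independently by handing it a body directly.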

In addition to the arguments here, there’s the well-known fact that inheritance breaks encapsulation by exposing subclasses to implementation details in the superclass. This makes the use of subclassing as a means to integrate with a library a fragile solution, or at least one that limits the future evolution of the library.
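
A well-known illustration of this fragility, along the lines of the example in Effective Java, is a set that counts insertions. The class names below are invented for this sketch. The subclass version over-counts because HashSet inherits its addAll() from AbstractCollection, which is documented to call add() once per element; that is an implementation detail the subclass silently depends on. The wrapper version is unaffected.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Inheritance: correctness depends on how the superclass implements addAll().
class CountingHashSet<E> extends HashSet<E> {
    private int addCount = 0;

    @Override
    public boolean add(E e) {
        addCount++;
        return super.add(e);
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        addCount += c.size();
        return super.addAll(c); // calls our overridden add() for each element
    }

    int getAddCount() {
        return addCount;
    }
}

// Composition: the wrapped set's internal calls never reach the wrapper.
class CountingSetWrapper<E> {
    private final Set<E> delegate;
    private int addCount = 0;

    CountingSetWrapper(Set<E> delegate) {
        this.delegate = delegate;
    }

    boolean add(E e) {
        addCount++;
        return delegate.add(e);
    }

    boolean addAll(Collection<? extends E> c) {
        addCount += c.size();
        return delegate.addAll(c);
    }

    int getAddCount() {
        return addCount;
    }
}

class EncapsulationDemo {
    public static void main(String[] args) {
        CountingHashSet<String> inherited = new CountingHashSet<>();
        inherited.addAll(Arrays.asList("a", "b", "c"));
        System.out.println(inherited.getAddCount()); // 6: each element counted twice

        CountingSetWrapper<String> composed = new CountingSetWrapper<>(new HashSet<>());
        composed.addAll(Arrays.asList("a", "b", "c"));
        System.out.println(composed.getAddCount()); // 3
    }
}
```

If a library author later changed how addAll() is implemented, the subclass would start behaving differently without a single line of its own code changing, which is the kind of constraint on library evolution mentioned above.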

So, there it is – that summarises what I think are the differences between composition and inheritance, and why one should tend to prefer composition over inheritance. So, if you’re reading this because you googled me before an interview, this is one answer you should get right!

Code Concepts, Java, Refactoring, Unit Testing

This entry was posted on August 20, 2010, 06:46 and is filed under Java.

#1 by Zbigniew Lukasiak on September 23, 2010 - 14:05

I don’t have an interview with you – just googled for inheritance versus composition – but this is interesting. I’ve heard similar advice for a long time, but programmers still use inheritance very extensively. One explanation for that could be the compatibility you mention with how our brain works, but I think the dynamic process of development is also at play here. It is easier to get something working if you can take something similar and tweak it than when you are limited to building something new from ready-made parts. As Agile argues, this is also an economical way to start: get the code working, even if it is badly structured, and refactor it later.

#2 by Smahlatz on July 27, 2012 - 13:19

hmm – you say “One guy I interviewed the other day had a pretty good attempt at describing when one should choose one or the other: it depends on which is closest to the real world situation that you’re trying to model.”
Then give an example which is not based on a real world situation, which you state yourself – “The root problem here is of course that the model is bad”.

I generally agree with your view re Inheritance V Composition, but I think this particular example is a bad one,

- #3 by Petter Måhlén on July 28, 2012 - 15:49
  
  It looks like maybe a part of your comment didn’t make it – the point of the example was to show how easy it is to incorrectly use a model with inheritance using a concept I think is universally known. If you have some more ideas about how the example could be improved or why it is bad, go ahead and post them!
  
#4 by Manne Fagerlind on January 4, 2013 - 09:23

Funny, I used this very example in a presentation the other day. I think it’s a good way to demonstrate that inheritance is over-used and composition is generally more useful.

I’d say inheritance doesn’t really exist in the animal kingdom; it’s just a theoretical construct. We are what we are because of our constituent parts (molecules, cells, body parts), not because we have been categorized as mammals or primates. The same goes for most other concepts, especially physical objects.

- #5 by Petter Måhlén on January 4, 2013 - 10:45
  
  I’m not sure I’d go so far as to say that inheritance doesn’t exist in the animal kingdom (the ghost of Linné might come to haunt me if I did!). I was trying to make a slight variation of that point: even though inheritance does exist, it’s not necessarily the best perspective to have when you are creating a domain model for some specific problem. Humans and Chimpanzees are both Primates because we share the same ancestors. So there is an is-a relationship. But that relationship doesn’t have to be expressed through classes inheriting from each other, and I think doing so is usually a bad idea. If you do need that relationship in the code, maybe the best choice is a simple tree of objects rather than a hierarchy of classes.
  
  In the animal kingdom, parallel evolution means you cannot say that common traits (like a bipedal walk, or having eyes) are necessarily inherited. They can appear spontaneously on separate branches/nodes of the inheritance tree. The same is often true of other domains – you may want both your button and your text to be clickable, but I think it generally makes the code awkward if you add the ‘clickable’ behaviour into a parent class of both of those (especially if, say, you don’t want read-only text to be clickable, or something). And there is much less strength in the argument that buttons and texts are both ‘ClickableWidgets’ than that humans and chimps are primates. The ‘ClickableWidget’ parent has probably been added because the mistake of using a class hierarchy had already been made.
  
  - #6 by Manne Fagerlind on January 4, 2013 - 11:06
    
    Yes, we have common ancestors, but what defines us as individuals isn’t the history of our species (inheritance) but the properties of our bodies (composition). As you rightly point out, the relationship between biological history and common traits is very shaky. The inheritance concept is based on Aristotelian logic and really too simplistic when applied to most real-world problems. In the area of animals, it is only relevant as a historical concept.
    
    I think we agree about the main thing: favouring composition over inheritance. It will do all developers a world of good.
    
  - #7 by Petter Måhlén on January 4, 2013 - 11:29
    
    Yes, we do agree. :)

#8 by Tom on July 28, 2013 - 22:23

What about the fact that all Mammals have “hair” and, if female, they all have mammary glands. So now, without inheritance, you have to create a “MammalProperties” object and add that to each implementing mammal (Dog, Cat, etc) vs just having Dog, Cat extend Mammal. I get the flexibility in Composition and why in principle you should always favor composition, but I wouldn’t say it always simplifies things. I’d rather call dog.getHairType() vs dog.getMammalProperties().getHairType() and now you’ve just made one more class for users of your API to have to contend with vs seeing the properties directly in your abstract Mammal class.

Even if you didn’t create a Mammal class to extend from, you probably will end up with at least a Mammal interface anyway since you may want your API to enforce adding only Mammals to a “Cage” or something like that. So now you end up with more complexity… you have a Mammal interface but the properties of a mammal are all in some other object.

In your example above, the composition of BodyType and Locomotion is fine on Animal, but I’d see nothing wrong with Dog extends Mammal, and Mammal extends Animal. I don’t want to have to add all the properties of an Animal (Locomotion and BodyType) to EVERY single type of animal implementation object I create. I see nothing wrong with a combination of both inheritance and composition.

- #9 by Petter Måhlén on August 3, 2013 - 10:51
  
  You’re right. Sometimes, inheritance is the best choice. The points I was trying to make are that it is often over-used, and that a lot of the issues with how inheritance reduces flexibility in your application are subtle. Perhaps inheritance is the best choice initially, but what if, say, a reptile would evolve that has hair? Maybe not so likely in our lifetime in the animal kingdom, but businesses do evolve quickly.
  
  For maximum flexibility, I like your comment about interfaces: Cageable, Hairy, etc. Exactly how that interface is implemented could be an implementation detail.
  
#10 by manoj on July 29, 2013 - 06:57

For more clarity on why composition should be favored over inheritance, please read http://efectivejava.blogspot.in/2013/07/item-16-favor-composition-ovr.html

#11 by ed hastings on December 16, 2013 - 18:41

I’ve preferred composition over inheritance since reading the first chapter of the GoF book many years ago, which makes a very strong case for it. However, both are useful tools and neither should be ignored completely in favor of the other. Using the wrong tool for the job is always sub-optimal; too much composition can be just as bad as too much inheritance.

Also, framing things as a “vs.” distracts from the fact that inheritance and composition can be combined for very powerful solutions. Several common design patterns take advantage of this truth, as a for instance.

A couple of points:

1) Inheritance doesn’t break encapsulation intrinsically (at least in languages with scoping); when it does it is either a design flaw (the base class is improperly defined, exposing data that should not be exposed) or an implementation flaw (a subclass exposes something that it should not).

2) I know that it is the classical thinking on the subject, but examples of inheritance that get into modeling real world things as “proof” that inheritance has problems are generally flawed from the get go.

For instance, the classic animal model of inheritance is often trotted out as an example of why inheritance is bad-wrong, but it dodges the fact that modeling with high granularity is going against the grain of modeling to begin with.

Modeling is generally best suited to defining a useful abstraction and / or formal system and then working within it…as soon as details and special cases crop up that break the abstraction you start playing whack-a-mole trying to encompass inconvenient reality within your model, or start poking holes and thus making your abstraction “leaky”, or degenerate / erode the abstraction altogether with special cases / exceptions.

LSP / square-rectangle issues start cropping up, as do chimerical special cases (i.e. platypus? hrm…) that have traits that cross-cut the inheritance hierarchy.

As soon as the usefulness of the original abstraction is invalidated, you have a fundamental failure of the inheritance hierarchy to serve its original purpose; i.e. you’ve got a busted model. You mention GEB (one of my favorite books, as it happens); the examples of made up formal systems echo this to some extent; re: finding an axiomatic failure within your model.

Where I’ve found inheritance to work best in my own work is to not focus on data / traits and to instead focus on behavior and flow / sequence to define base classes that express common behavior, including data / state only as necessary (and preferably abstractly or generically) to support that behavior. If something superficially seems like it would be a subtype in the model but behaves fundamentally different than the baser type, then it is not a valid subtype and trying to force it to be opens the door to unexpected side effects.

The bottom line is, don’t try to _force_ inheritance. An abstraction is either useful or not useful to solve a problem in an elegant / “clean” way. If you find that you have to do odd things to make inheritance “work” in a given situation or deform a concept to kind-of-sort-of awkwardly fit within an abstraction…that particular inheritance model has moved out of the category of “part of the solution” and into the category of “part of the problem”.

All comments above are prefaced with “in my opinion” and closed with “your mileage might vary”.

- #12 by Petter Måhlén on December 16, 2013 - 20:29
  
  Great points, I agree. It’s been a few years since I wrote this post, and I think the most important message in it is that we humans have a tendency to think in terms of hierarchical classes, and that this tendency leads programmers to overuse inheritance in their code. Sometimes that’s a good choice; a lot of the time, it’s not. So I would probably add to your point about not _forcing_ inheritance by suggesting that perhaps we should make some extra effort to _avoid_ inheritance before deciding that it’s the right choice.

Petter Måhlén's Blog