Posts Tagged Simplicity

Qualities of Quality

[This is a slightly modified cross-post of something I wrote for the internal Spotify blog.]

I’m currently on parental leave, which is something that leaves very little time for any concentrated work effort because your first priority is to be on-call to solve the problems of a baby and you get interrupted all the time. But in between interruptions you can reflect on things and sometimes respond to email threads. I’ve been thinking about one topic in particular, namely quality, over the last week or two, and now my son is asleep, so I’m going to try to write up a blog post about it. In this post, I’m making the claim that

Developer-facing quality is a completely different thing from end-user facing quality, and is usually more important.

In one of the email threads I’ve been responding to at work, I said that “near-perfect quality code is … a meta-feature” meaning it affects and improves all other features, and that “it’s really a requirement to achieve sustainable speed”. I now think I can express that more precisely by considering different kinds of quality. The kind of quality I believe most people think about is what the end user experiences: product quality. Bugs, UI inconsistencies, etc. Product quality is a (non-functional) feature like any other, and can rightly be prioritised relative to other product features, such as performance, an improved design, a better recommendation algorithm and so on. The kind of quality that is a meta-feature is the quality a developer experiences, what I would term implementation quality. Things like readability and understandability of the code, ease of re-use, and bug-free-ness. Implementation quality doesn’t affect the end user experience, but it impacts the productivity of the teams working on improving the end user experience. These two kinds of quality overlap but are not the same:

Kinds of Quality

While a product can be compelling even if it has poor product quality and therefore a tradeoff between product quality and other product features is meaningful, it’s much harder to motivate not paying attention to implementation quality. Poor implementation quality kills your ability to add features and evolve your product and therefore makes poor product quality a much more serious problem. Poor implementation quality becomes an obstacle to changing your product, and changing the product quickly is key to getting it right.

A very closely related thought is Martin Fowler’s Design Stamina Hypothesis, which is an article you really should read and understand. So I’m not going to summarise it here, just go ahead and read it. Seriously, read it. Done? OK, now spend 4.43 minutes watching Ward Cunningham explaining (technical) debt. Even if you’ve watched that presentation before, it’ll be ~5 minutes very well spent.

Cunningham says that it can be a good idea to take on debt if you’re saying “I don’t understand this domain well enough to know that I’m building the right features or that I’m using the right abstractions for the features I’m building”. The first kind of debt is what Eric Ries is addressing in the lean startup movement, and what we’re addressing at Spotify with the Think It, Build It, Ship It, Tweak It mantra. The second kind is something you should take on and need to continually pay back as you get the required understanding of your domain. It’s not OK to take on debt by not following good engineering practices and, for instance, not cleaning up your code once you’ve made it work. It should be easy to understand the abstractions you’ve chosen, even if they’re not the right ones. Paying back debt should be mostly about fixing your design as you better understand what it should have been, not about fixing bugs or spaghetti code.

The two most important points Fowler makes, I think, are the narrowness of the time interval during which it is meaningful to trade off design/implementation quality for speed, and that once you’re past the point where the two curves intersect, continuing to disregard implementation quality just slows you down further. You can only profitably trade off implementation quality for speed in very short-lived projects. Probably, if you’re expecting a system to have a lifespan of more than a couple of weeks, it’s a good idea to pay attention to implementation quality right from the start. I think there may be exceptions, when you desperately need to get some feature out to survive. But most places are not struggling to survive from day to day, they’re doing things like figuring out how to build the best music streaming service in history. That’s hard, so we need to make sure that we can adapt our product quickly and easily as we learn how to do it.

In the figure above, I included simplicity as an aspect of both product and implementation quality. This is primarily due to a couple of thoughts: first, an article by Andres Kutt, where especially the section on what he calls functional architecture is relevant. The point he is making is that due to the huge number of features in the Skype web store, it ended up in a state where it was almost impossible to make changes to it. Feature count makes users confused (hence it’s a product quality issue) and it adds code complexity, making the code base less amenable to change. It’s a mistake to think that a feature is free just because you’re not doing active development on it.

A couple of notes on bugs. I include that in both product and implementation quality. The product perspective is pretty obvious – bugs detract from the user experience – but the implementation aspects may be a little less apparent. The first aspect is of course that it’s easier to build something on top of a solid component than a library. So if services and libraries are bug free, it’s easier to make sure that the end user experience is good. But there’s a second aspect as well. Bugs reduce productivity in many ways – for a longer discussion, see this post. The short of it is that unfixed bugs in your code lead to additional meetings, bug management overhead, duplicate reports of the same bug and context switching. So a lot of the time, the best thing you can do for your own productivity is to just fix pretty much everything that’s ever reported.

One common misconception about quality and its impact on delivery speed is that things like pluggability/extensibility/configurability of some technical solution are quality. Those are things often labelled as over-engineering. To me, over-engineering is engineers adding waste by inventing features that aren’t actually really needed. The notorious 2002 Standish Group report on feature use concluded that 64% of product features are rarely or never used. Considering that features interact in ways that make code less malleable, the best thing you can do for your own and your team’s productivity is to question any feature that goes into the product you’re working on. Especially if you came up with it yourself. At Spotify, product owners get their scope creep tendencies kept in check by Think It, etc., but nobody really checks that we engineers limit the scope of the code we write. Over-engineering is not about creating something with too good implementation quality, it’s creating something with too many features. In nearly two decades of doing professional software development, I don’t think I ever felt a team I worked in overspent seriously on implementation quality, but I’ve definitely many times felt that we’ve wasted hugely on some feature or other.

I’ve also got some opinions about the use of TDD to drive implementation quality, but my son is going to wake up any minute, so that will have to remain a topic for a future blog post. :) For now, a summary of this post in two bullet points – to be able to move fast with product development, you need to:

  • Be ruthless about minimising the feature count, and
  • Always pay very close attention to implementation quality – but feel free to trade off product quality if needed.
Advertisements

, ,

3 Comments

Keeping Classes Simple using the SRP

The Single Responsibility Principle – that a class should have one and only one reason to change – is, I think, one of the most important principles to follow in order to make high quality and easily maintainable software. It’s also one of the principles that I see abused the most often (by myself as well as others). This post is cross-post of something I wrote for the internal Spotify blog, and talks about a technique to make it easier to adhere to the SRP and thus create simple classes, in the Rich Hickey sense: ‘easy’ is a subjective term describing the effort it takes some particular person to do something, ‘simple’ is an objective term describing the resulting artefact. Simple software is valuable (and complex software is costly and causes frustration), so it is very useful to have good tools that make it easy to create simple things.

I’ll describe a concrete technique that I think of as composing behaviours; I think it’s close to if not identical to the decorator pattern. I’ll be using some code I recently wrote in a library wrapping DNS lookups at Spotify. I needed to create something that would let us do SRV lookups, and if the results were empty or there was another kind of failure when doing the lookup, it should retain the previous data. This is to reduce the risk of service outages caused only by DNS failures. I also wanted to ensure that we had metrics on the time we spent on DNS lookups, the rate of failures, and how common empty responses were.

The most common reaction when faced with a problem description like that is to create a class, (DnsSrvResolver, let’s say) that implements those features. In a sense, this class does a single thing – DNS SRV lookups that are metered and whose results can be kept in case of problems. But I would argue that that ‘thing’ is too big, and it’s better to create multiple classes doing smaller things. Here’s how.

First, an interface (the code in the post is Java, but of course the technique can be applied to code in any programming language):

public interface DnsSrvResolver {
  List resolve(String fqdn);
}

Then, an implementation class that does the actual lookup of SRV records using a standard DNS library:

class XBillDnsSrvResolver implements DnsSrvResolver {
  private final LookupFactory lookupFactory;

  XBillDnsSrvResolver(LookupFactory lookupFactory) {
    this.lookupFactory = lookupFactory;
  }

  @Override
  public List resolve(final String fqdn) {
    Lookup lookup = lookupFactory.forName(fqdn);
    Record[] queryResult = lookup.run();

    if (lookup.getResult() != Lookup.SUCCESSFUL) {
      throw new DnsException(
        String.format("Lookup of '%s' failed with code: %d - %s ",
           fqdn, lookup.getResult(), lookup.getErrorString()));
    }

    return toHostAndPorts(queryResult);
  }

  private List toHostAndPorts(Record[] queryResult) {
    ImmutableList.Builder builder =
      ImmutableList.builder();

    if (queryResult != null) {
      for (Record record: queryResult) {
        if (record instanceof SRVRecord) {
          SRVRecord srvRecord = (SRVRecord) record;
          builder.add(HostAndPort.fromParts(
                srvRecord.getTarget().toString(),
                srvRecord.getPort())
          );
        }
      }
    }

    return builder.build();
  }
}

So far, nothing out of the ordinary. Here comes the trick: now, add the metrics-tracking and result-storing behaviours in new classes that delegate the actual DNS lookup to some other, unknown class. Metrics can be done like this:

class MeteredDnsSrvResolver implements DnsSrvResolver {
  private final DnsSrvResolver delegate;
  private final Timer timer;
  private final Counter failureCounter;
  private final Counter emptyCounter;

  MeteredDnsSrvResolver(DnsSrvResolver delegate,
                        Timer timer,
                        Counter failureCounter,
                        Counter emptyCounter) {
    this.delegate = delegate;
    this.timer = timer;
    this.failureCounter = failureCounter;
    this.emptyCounter = emptyCounter;
  }

  @Override
  public List<HostAndPort> resolve(String fqdn) {
    final TimerContext context = timer.time();
    boolean success = false;

    try {
      List<HostAndPort> result = delegate.resolve(fqdn);
      if (result.isEmpty()) {
        emptyCounter.inc();
      }
      success = true;
      return result;
    }
    finally {
      context.stop();
      if (!success) {
        failureCounter.inc();
      }
    }
  }
}

And retaining old data like this:

class RetainingDnsSrvResolver implements DnsSrvResolver {
  private final DnsSrvResolver delegate;
  private final Map<String, List<HostAndPort>> cache;

  RetainingDnsSrvResolver(DnsSrvResolver delegate) {
    this.delegate = delegate;
    cache = new ConcurrentHashMap<String, List<HostAndPort>>();
  }

  @Override
  public List<HostAndPort> resolve(final String fqdn) {
    try {
      List<HostAndPort> nodes = delegate.resolve(fqdn);

      if (nodes.isEmpty()) {
        nodes = firstNonNull(cache.get(fqdn), nodes);
      }
      else {
        cache.put(fqdn, nodes);
      }

      return nodes;
    }
    catch (Exception e) {
      if (cache.containsKey(fqdn)) {
        return cache.get(fqdn);
      }

      throw Throwables.propagate(e);
    }
  }
}

Note how small and simple the classes are. That makes them very easy to test – only the top level one requires complex setup, due to the nature of the dnsjava library. The others are easily tested like so:

  @Before
  public void setUp() throws Exception {
    delegate = mock(DnsSrvResolver.class);

    resolver = new RetainingDnsSrvResolver(delegate);

    nodes1 = nodes("noden1", "noden2");
    nodes2 = nodes("noden3", "noden5", "somethingelse");
  }

  @Test
  public void shouldReturnResultsFromDelegate() throws Exception {
    when(delegate.resolve(FQDN)).thenReturn(nodes1);

    assertThat(resolver.resolve(FQDN), equalTo(nodes1));
  }

  @Test
  public void shouldReturnFreshResultsFromDelegate() throws Exception {
    when(delegate.resolve(FQDN))
      .thenReturn(nodes1)
      .thenReturn(nodes2);

    resolver.resolve(FQDN);

    assertThat(resolver.resolve(FQDN), equalTo(nodes2));
  }

  @Test
  public void shouldRetainDataIfNewResultEmpty() throws Exception {
    when(delegate.resolve(FQDN))
      .thenReturn(nodes1)
      .thenReturn(nodes());

    resolver.resolve(FQDN);

    assertThat(resolver.resolve(FQDN), equalTo(nodes1));
  }

  @Test
  public void shouldRetainDataOnFailure() throws Exception {
    when(delegate.resolve(FQDN))
      .thenReturn(nodes1)
      .thenThrow(new DnsException("expected"));

    resolver.resolve(FQDN);

    assertThat(resolver.resolve(FQDN), equalTo(nodes1));
  }
  // etc..

The fact that the functionality is implemented using several different classes is an implementation detail that shouldn’t leak through to users of the library. (You may have noticed that the implementation classes are package-private.) So to make it easy for clients, I like to provide a simple fluent API, something like this perhaps:

    DnsSrvResolver resolver =
        DnsSrvResolvers.newBuilder()
          .cachingLookups(true)
          .metered(true)
          .retainingDataOnFailures(true)
          .build();

In summary, the advantages of splitting up behaviours into different classes are:

  1. Each class becomes really simple, which makes it easy to test them exhaustively. Some of the code above does non-trivial things, but testing it is still easy.
  2. Since each behaviour lives in a single place only, it’s trivial to, for instance, change the way metrics are recorded, or to replace the underlying DNS library without affecting the rest of the chain.
  3. Each class becomes more understandable because it is so small. Since behaviours are loosely coupled, the amount of stuff you need to keep ‘in working memory’ to work with a class is minimised.
  4. Varying behaviour in different situations becomes easier as it becomes a matter of composing different classes rather than having conditional logic in a single class. Basically, only the builder implementation in the API example above needs to worry about the value of ‘metered’. The DnsSrvResolver implementation doesn’t need to do a conditional check on ‘if (metered)’.

Doing this is not completely free, though:

  1. With more classes, you get more code to search through if you want to find out how, for instance, data is retained. It may be hard to understand the more dynamic run-time object graph that is behind the DnsSrvResolver you’re using. If the whole picture is there statically in code, it’s easier to find it.
  2. Instead of having complexity within a class, you have (more) complex interactions between instances of different classes. The complexity doesn’t go away entirely, it is moved.

Things that should get you thinking about whether you should split out behaviours into different classes include having too many (> 3-4 or so) collaborators injected into the constructor, or using the word ‘and’ when describing the ‘single’ responsibility of a class. Describing a class as “doing SRV lookups and retaining data and collecting statistics” is a giveaway that you don’t actually think of it as doing one thing.

In my experience, the benefits of using this technique very often outweigh the costs. It is easier to make stable software from a number of small, simple, composable building blocks than to do it using larger, more complex building blocks. Dealing with complexity in how things are put together is easier than trying to compose complex things. Writing code like this doesn’t take longer than writing it in a single class – the only added difficulty is in learning to see when there might be a need to split things out in order to make them simpler.

,

1 Comment