Archive for March, 2010

Code Sharing: Use Git

(This is item 2 in the code sharing cookbook)

Joel Spolsky used what seems to be his last blog post to talk about Git and Mercurial. I like his description of their main benefit as being that they track changes rather than revisions, and like him, I don’t particularly like the classification of them as distributed version control systems. As I’ve mentioned before, the ‘distributed’ bit isn’t what makes them great. In this post, I’ll try to explain why I think that Git is a great VCS, especially for sharing code between multiple teams – I’ve never used Mercurial, so I can’t offer an opinion on it. I will use SVN as the counter-example of an older version control system, but I think that in most of the places where I mention SVN, it could be replaced by any other ‘centralised’ VCS.

By far the biggest reason to use Git when sharing code is its support for branching and merging. The main issue at work here is the conflict between two needs: each team’s need for complete control of its code and environments in order to develop its features effectively, and the overall need to detect and resolve conflicting changes as quickly as possible. I’ll probably have to explain a little more clearly what I mean by that.

Assume that Team Red and Team Blue are both working on the same shared library. If they push their changes to the exact same central location, they are likely to interfere with each other. Builds will break, bugs will be introduced in parts of the code supposedly not touched, larger changes may be impossible to make and there will be schedule conflicts – what if Team Blue commits a large and broken change the day before Team Red is going to release? So you clearly want to isolate teams from each other.

On the other hand, the longer the two teams’ changes are isolated, the harder it is to find and resolve conflicting changes. Both volume of change and calendar time are important here. If the volume of changes made in isolation is large and the code doesn’t work after a merge, the volume of code to search in order to figure out the problem is large. This of course makes it a lot harder to figure out where the problem is and how to solve it. On top of that, if a large volume of code has been changed since the last merge, the risk that a lot of code has been built on top of a faulty foundation is higher, which means that you may have wasted a lot of effort on something you’ll need to rewrite.

To explain why long calendar-time gaps between merges are a problem, imagine that it takes maybe a couple of months before a conflict between changes is detected. By this time, the people who made the conflicting changes may no longer remember exactly how the features were supposed to work, so resolving the conflicts will be more complicated. In some cases, they may be working in a totally different team or may even have left the company. If the code is complicated, the time when you want to detect and fix the problem is right when you’re in the middle of making the change, not even a week or two afterwards. Branches represent risk and untestable potential errors.

So there is a spectrum between zero isolation and total isolation, and it is clear that the extremes are not where you want to be. That’s very normal and means you have a curve looking something like this:

You have a cost due to team interference that is high with no isolation and is reduced by introducing isolation, and you have a corresponding cost due to the isolation itself that goes up as you isolate teams more. Obviously the exact shape of the curves is different in different situations, but in general you want to be at some point between the extremes, close to the optimum, where teams are isolated enough for comfort, yet merges happen soon enough to not allow the conflict troubles to grow too large.

So how does all that relate to Git? Well, Git enables you to fine-tune your processes on the X axis in this diagram by making merges so cheap that you can do them as often as you like, and through its various features that make it easier to deal with multiple branches (cherry-picking, the ability to identify whether or not a particular commit has gone into a branch, etc.). With SVN, for instance, the costs incurred by frequent merges are prohibitive, partly because making a single merge is harder than with Git, but probably even more because SVN can only tell that there is a difference between two branches, not where the difference comes from. This means that you cannot easily do intermediate merges, where you update a story branch with changes made on the more stable master branch in order to reduce the time and volume of change between merges.

At every SVN merge, you have to go through all the differences between the branches, whereas Git’s commit history for each branch allows you to remember choices you made about certain changes, greatly simplifying each merge. So during the second merge, at commit number 5 in the master branch, you’ll only need to figure out how to deal with (non-conflicting) changes in commits 4 and 5, and during the final merge, you only need to worry about commits 6 and 7. In all, this means that with SVN, you’re forced closer to the ‘total isolation’ extreme than you would probably want to be.

Working with Git has actually totally changed the way I think about branches – I used to say something along the lines of ‘only branch in extreme situations’. Now I think having branches is a good, normal state of being. But the old fears about branching are not entirely invalidated by Git. You still need to be very disciplined about how you use branches, and for me, the main reason is that you want to be able to quickly detect conflicts between them. So I think that branches should be short-lived, and if that isn’t feasible, that relatively frequent intermediate merges should be done. At Shopzilla, we’ve evolved a de facto branching policy over a year of using Git, and it seems to work quite well:

  • Shared code with a low rate of change: a single master branch. Changes to these libraries and services are rare enough that two teams almost never make them at the same time. When they do, the second team that needs to make changes to a new release of the library creates a story branch and the two teams coordinate about how to handle merging and releasing.
  • Shared code with a high rate of change: semi-permanent team-specific branches and one team has the task of coordinating releases. The teams that work on their different stories/features merge their code with the latest ‘master’ version and tell the release team which commits to pick up. The release team does the merge and update of the release branch and both teams do regression QA on the final code before release. This happens every week for our biggest site.
  • Team-specific code: the practice in each team varies, but I believe most teams follow similar processes. In my team, we have two permanent branches that we merge between frequently – release and master – plus more short-lived branches that we create on an ad-hoc basis. We do almost all of our work on the master branch. When we start preparing a release (typically every 2-3 weeks or so), we split off the release branch and do the final work on the stories to be released there. Work on stories that didn’t make it into the release goes onto the master branch as usual. We also often put stories on story-specific branches when we don’t believe they will make it into the next planned release and thus shouldn’t be on master.

The diagram above shows a pretty typical state of branches for our team. Starting from the left, the work has been done on the master branch. We then split off the release branch and finalise a release there. The build that goes live will be the last one before merging release back into master. In the meantime, we have started some new work on the master branch, plus two stories that we know or believe we won’t be able to finish before the next release, so they live in separate branches. For Story A, we wanted to update it with changes made on the release and master branches, so we merged them into Story A shortly before it was finished. At the time the snapshot is taken, we’ve started preparing the next release, and the Story A branch has been deleted as it has been merged back into master and is no longer in use. This means that we only have three branches pointing to commits, as indicated by the bluish markers.

This blog post is now far longer than I had anticipated, so I’m going to have to cut the next two advantages of Git shorter than I had planned. Maybe I’ll get back to them later. For now, suffice it to say that Git allows you to do great magic in order to fix mistakes that you make, and even to extract and combine code from different repositories with full history. I remember watching Linus Torvalds’ Tech Talk about Git, where he said that the performance of Git was such that it led to a quantum change in how he worked. For me, working with Git has also led to a radical shift in how I work and how I look at code management, but it’s not actually the performance that is the main thing; it is the whole conceptual model of tracking commits, which makes branching and merging so easy, that has led to the shift for me. That Git is also a thousand times (that may not be strictly true…) faster than SVN is of course not a bad thing either.

Share Code Selectively

(This is item 1 in the code sharing cookbook)

Since shared code leads to free features, one might think that more sharing is always better. That is actually not true. Sharing code or technology between products has some very obvious benefits and some much less obvious costs. The sneakiness of the costs leads to underestimating them, which in turn can lead to broken attempts at sharing things. In this post, I’ll try to give my picture of what characterises things that are suitable for sharing, and how to think about what not to share. Note that my perspective is based on an organisation whose products are some kind of service (I’ve mostly been developing consumer-oriented web services for the last few years) as opposed to shrink-wrapped products, so probably a lot of what I say isn’t applicable everywhere.

These are some of the main reasons why you want to share code:

  1. You get features for free – this is almost always the original reason why you end up having some shared code between different products. Product A is out there, and somebody realises that there is an opportunity to create product B which has some similarities with A. The fastest and cheapest way to get B out and try it is to build it on A, so let’s do that.
  2. You get bug fixes for free – of course, if product A and B share code and a bug is fixed for product A, when B starts using the fixed version of the shared code, it is fixed for B as well.
  3. Guaranteed consistent behaviour between products in crucial functional areas. This is typically important for backoffice-type functions, where, for instance, you want to make sure that all your products feed data into the data warehouse in a consistent way so the analysts can actually figure out how the products are doing using the same tools.
  4. Using proven solutions and minimising risk. Freshly baked code is more likely to have bugs in it than stuff that has been around for a while.
  5. Similarity of technology can typically reduce operational costs. The same skill sets and tools can be used to run most or all of your products and you can share expensive environments for performance testing, etc. This also has the effect of making it easier for staff to move between products as there is less new stuff to learn in order to get productive with your second product.

All of those reasons are very powerful and typically valid. But they need to be contrasted against some of the costs that you incur from sharing code:

  1. More communication overhead and slower decision making. To change a piece of code, one needs to talk to many people to ensure that it doesn’t break their planned or existing functionality. Decisions about architecture may require days instead of minutes due to the need to coordinate multiple teams.
  2. More complicated code. Code that needs to support a single product with a single way of doing things can be simpler than code that has to support multiple products with slight variations on how they do things. This complexity tends to increase over time. Also, every change has to be made with backwards compatibility in mind, which makes working with the code harder still (see the sketch after this list).
  3. More configuration management overhead. Managing different branches and dependencies between different shared libraries is time consuming, as are merges when they have to happen. Similarly, you need to be good at keeping track of which versions of shared libraries are used for a particular build.
  4. More complex projects, especially when certain pieces of shared technology can only be modified by certain people or teams. If there are two product teams (A and B) and a team that delivers shared functionality (let’s call them ‘core’), the core team’s backlog needs to be prioritised based on the needs of both A and B. Also, both team A and team B are likely to end up blocked waiting for changes to be made by the core team – during times like that, people don’t typically become idle; they just work on things other than what is really important, leading to reduced productivity and a lack of the ‘we can do anything’ energy that characterises a project that runs really well.
  5. More mistakes – all the above things lead to a larger number of mistakes being made, which costs in terms of frustration and time taken to develop new features.

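To make the backwards-compatibility point from item 2 above a bit more concrete, here is a minimal sketch – SharedPriceCalculator and its numbers are invented for illustration, not taken from any real codebase – of the kind of thing that happens to shared code: a second product needs a variation, the original signature can’t be touched without coordinating with the first product’s team, and so the class grows an extra method instead.

  // A made-up illustration of the backwards-compatibility tax on shared code:
  // product B needs pricing with a discount, but product A's call sites must
  // keep compiling, so the old signature lives on next to the new one.
  public class SharedPriceCalculator {

    // Original method, still used by product A; it cannot be removed or changed
    // without coordinating a release with that team.
    public long priceInCents(long basePriceCents) {
      return priceInCents(basePriceCents, 0);
    }

    // Newer variant added for product B. Over time, a shared class accumulates
    // more of these variations than a single-product class ever would.
    public long priceInCents(long basePriceCents, long discountCents) {
      return basePriceCents - discountCents;
    }
  }
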
The problem with the costs is that they are insidious and sneak up on you as you share more and more stuff between more products, whereas the benefits are there from day 1 – especially on day 1, when you release your second product which is ‘almost the same’ as the first one and you want as many free features as you can get.

So sharing code can give you lots of important or even vital benefits, but done wrong, it can also turn your organisation into a slow-moving behemoth stuck in quicksand due to the dependencies it creates between the products and the teams that should develop them. The diagram below shows how, while shared libraries can be building blocks for constructing products, they also create ties between the products. These ties need to be managed to prevent them from binding your arms behind your back.

The way I think about it, products represent things that you make money from, so changing your products should lead to making more money. Shared code makes it possible for you to develop your products at a lower cost or with a lower risk, but reduces the freedom to innovate on the product level. So in the diagram, the blue verticals represent a money-making perspective and the red horizontals a cost-saving perspective. I guess there is something smart to be said about when in a product’s or organisation’s lifecycle one is more important than the other, but I don’t really know what – probably more mature organisations or products need more sharing and less freedom to develop?

Anyway, getting back to the core message of this post: I’m saying that even if code is identical between two products, that doesn’t necessarily mean it should be shared. It may well start out identical, but if it is something that one or both product teams are likely to want to modify in the future to maximise their chances of making money, both products will be slowed down by having to worry about what their changes do to the other team. Good candidates for sharing tend to have:

  1. A low rate of change – so the functionality is mature and not something you need to tweak frequently in order to tune your business or add features to your product.
  2. A tight coupling to other parts of the company’s ecosystem – reporting/invoicing systems, etc. This usually means tight integration into business processes that are hard to change.
  3. A high degree of generality – the extreme examples of such general systems are of course things like java.util.Set or log4j. Within a company, you can often find things that are very generic in the context of the business.

Of course, those three factors are related. I have found that simply looking at the first one by checking the average number of commits over some period of time gives a really good indication. If there are many changes, don’t share the code. If there are few, you might want to share it. I think the reason why it works is partly that rate of change is a very good indicator of generality and partly because if you try to share something that changes a lot, you’ll incur the costs of sharing very frequently.
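
If you want to put a number on the rate of change, a quick way is simply to count commits with git. The sketch below (ChangeRateCheck is just a throwaway name, and the 90-day window and threshold are arbitrary) illustrates the idea by shelling out to git log and counting the lines of output for the repository you point it at.

  import java.io.BufferedReader;
  import java.io.File;
  import java.io.IOException;
  import java.io.InputStreamReader;

  public class ChangeRateCheck {

    // Arbitrary threshold: more commits than this over the period suggests the
    // code is still being tweaked too often to be a comfortable sharing candidate.
    private static final int MAX_COMMITS = 20;

    public static void main(String[] args) throws IOException, InterruptedException {
      // args[0] is the path to the repository to inspect.
      Process git = new ProcessBuilder("git", "log", "--oneline", "--since=90 days ago")
          .directory(new File(args[0]))
          .redirectErrorStream(true)
          .start();

      // git log --oneline prints one line per commit, so counting lines counts commits.
      BufferedReader reader = new BufferedReader(new InputStreamReader(git.getInputStream()));
      int commits = 0;
      while (reader.readLine() != null) {
        commits++;
      }
      git.waitFor();

      System.out.println(commits + " commits in the last 90 days: "
          + (commits <= MAX_COMMITS ? "sharing looks reasonable" : "probably keep it product-specific"));
    }
  }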

Sharing is great, but it’s definitely not something that should be maximised at all costs. Any sharing of functionality between two products creates dependencies between them: feature interaction, which adds to the cost of each feature, and schedule interaction between projects/teams that get blocked by each other due to changes to something that is shared.

It is often useful to think of sharing at different levels: maybe it isn’t a great idea to create a shared library from some code that you believe will be modified frequently by two different projects. As an alternative, you can still gain free features by copying and pasting that code from one product to the other, and then letting the two versions lead their own lives. So share code, but be smart about it and don’t share everything!

Cookbook for Code Sharing

If you’re in an organisation that grows or whose business is changing, you’ll soon want to add another product to the one or ones you’ve already got. Frequently, the new product idea has a lot of similarity to existing ones (because you tend to come up with ideas in the space where you work, and because you’ll tend to want to play to your existing strengths), so there is a strong desire to reuse technology: first to get the new product out, at least as a prototype, and later, assuming it is successful, to keep sharing code in order not to have to reinvent the wheel.

Code sharing makes a lot of sense, but it is in fact a lot harder than it seems on the face of it. In what is quite possibly a more ambitious project than I will have the tenacity to complete, I’m going to try to set out some ideas on how to do code sharing in the kind of organisation that I have recent experience of: around 30 developers working on around 5 different products. The first one will be a bit theoretical, but the rest should be quite concrete with hands-on tips about how to do things.

Here’s the list of topics I’ve got planned:

  1. Share Code Selectively.
  2. Use Git.
  3. Use Scrum.
  4. Use Maven.
  5. Use JUnit.
  6. Use Hudson.
  7. Divide and Conquer.
  8. Manage Dependencies.
  9. Communicate.

Over the next few weeks or months, I’ll try to write something more detailed about each of them. I would be surprised if I don’t end up coming back to update this post, since my thinking around this will probably change as I write the other posts.

if (readable) good();

Last summer, I read the excellent book Clean Code by Robert Martin and was so impressed that I made it mandatory reading for the coders in my team. One of the techniques that he describes is that rather than writing code like:

  if (a != null && a.getLimit() > current) {
    // ...
  }

you should replace the boolean conditions with a method call and make sure the method name is descriptive of what you think should be true to execute the body of the if statement:

  if (withinLimit(a, current)) {
    // ...
  }

  private boolean withinLimit(A a, int current) {
    return a != null && a.getLimit() > current;
  }

I think that is great advice, because I used to always say that unclear logical conditions are one of the things that really require code comments to make them understandable. The scenario where you have an if-statement with 3-4 boolean expressions AND-ed and OR-ed together is

  1. not uncommon, and
  2. frequently a source of bugs, or something that needs changing, and
  3. something that stops you dead in your tracks when trying to puzzle out what the code in front of you does.

So it is important to make your conditions easily understandable. Rather than documenting what you want the condition to mean, you simply create a tiny method whose name says exactly that, which makes the flow of the code so much smoother. I also think (but I’m not sure) that it is easier to keep method names in sync with code updates than it is to keep comments up to date.

So that’s advice I have taken onboard when I write code. Yesterday, I found myself writing this:

  public void doFilter(ServletRequest servletRequest, ServletResponse response, FilterChain chain) {

    // ...

    if (shouldPerformTracking()) {
      executor.submit(new TrackingCallable(servletRequest, trackingService, ctrContext));
    }

    // ...
  }

  private boolean shouldPerformTracking() {
    // only do tracking if the controller has added a page node to the context
    return trackingDeterminator.performTracking() && hasPageNode(ctrContext);
  }

  private boolean hasPageNode(CtrContext ctrContext) {
    return !ctrContext.getEventNode().getChildren().isEmpty();
  }

As I was about to check that in, it suddenly struck me that even though the method name for the second logical condition was accurate (hasPageNode()), I had still just written a comment explaining what I actually meant by the condition: that tracking should happen if tracking was enabled and the controller had added tracking information to the context. I figured out what the problem was and changed it to this:

  private boolean shouldPerformTracking() {
    return trackingDeterminator.performTracking() && controllerAddedPageInfo(ctrContext);
  }

  private boolean controllerAddedPageInfo(CtrContext ctrContext) {
    return !ctrContext.getEventNode().getChildren().isEmpty();
  }

That was a bit of an eye-opener somehow. The name of the condition-method should describe the intent, not the mechanics of the condition. So the first take on the method name (hasPageNode()) described what the condition checked in a more readable way, but it didn’t describe why that was a useful thing to check. When method names like this explain the why of the condition, I strongly believe they increase code readability and therefore represent a great technique.

As always, naming is hard and worth spending a little extra effort on if you want your code to be readable.

Unit Tests Are Good For You

Some 5-6 years ago, I became a mother, biologically implausible as it seems. At the time, I was CTO at Jadestone and that’s when I started to think that programmers writing their own unit tests is a paradigm shift of the same magnitude as object-orientation was in the 90s. So I started trying to convert the developers (and others) to the unit testing mentality, with varying success – some became as convinced as myself, others were very reluctant to write any tests at all, and most seemed to end up in some in-between state: “I guess it’s a good thing to do, and I will if there’s time, but…” It seems like a lot of developers end up somewhere in that middle state, with writing unit tests being something a bit analogous to cleaning your room every week. You know it’s a good thing to do, and you do it because Mum (yep, that’s me) is telling you you have to, but you don’t really want to.

Even at Shopzilla, where the builds fail unless a certain percentage of the code is covered by unit tests (this is a great practice despite the shortcomings of code coverage as a measurement of test quality), there are issues with people’s motivation to write unit tests. Far too much of our code is being ‘exercised’ by the unit tests with no checking of expected results, and/or without sufficient branch coverage.

I remember creating slides at Jadestone with what I still think are really good reasons to write lots of unit tests: it improves the code quality by helping you find bugs, making your code testable forces you to design it in a modular way, having great regression tests gives you the courage to do the aggressive refactoring that is required for long-term productivity, and so on. But after a few years of working with it, I’ve come to the conclusion that the main reason really is – hold your ears, I can feel years of nagging coming out all at once:

IT’S MAKING *YOU* MORE PRODUCTIVE. DAMMIT!

People thinking “I will write unit tests if I have the time” or “when I’m done with the code” are just plain wrong. True, just writing the code needed for a feature or bug fix takes less time than also writing unit tests. But checking that the fix/feature works if it is a part of a slightly complex system? Starting or restarting one of the Shopzilla websites takes something like 3-7 minutes. Once you’re up, you can typically test almost anything by navigating to a single URL, but if you didn’t get the fix right, you have to re-fix, re-build and re-start. If the fix is in a shared library, you need to re-build and re-install the library, then re-build and re-start the site. This takes ages. Usually a lot more than the time it takes to verify the fix with a unit test.

And with multiplayer games, for instance, the situation is even harder. To verify a Shopzilla-style website bug, all you need is normally a single URL to reproduce the problem. With more complex clients, you may have to bring a couple of clients and a server into a given error state in order to reproduce/verify a bug. This can be a nightmare. Even with unit tests, you’ll obviously have to do this, but if you do TDD, you’ll have written a test that reproduces the observed error before trying to fix it. Seeing that the unit test used to be broken but isn’t any longer is great for your confidence and usually means you only need to do the work to reproduce the bug “in reality” once. I’m not sure it is necessary to write tests before you write all your code, but I think it is a good idea to try to do so. And you definitely don’t want to write all your unit tests after you’ve written all the feature code.
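
To make that concrete, here is a minimal sketch of the kind of test I mean – PriceFormatter and its bug are invented for the example, not taken from any real codebase. The point is simply that the test is written so that it fails against the broken code and passes once the fix is in, and re-running it costs seconds rather than a rebuild-and-restart cycle.

  import static org.junit.Assert.assertEquals;

  import org.junit.Test;

  public class PriceFormatterTest {

    // Hypothetical class under test, included inline so the sketch is self-contained.
    static class PriceFormatter {
      String format(int cents) {
        return String.format("$%d.%02d", cents / 100, cents % 100);
      }
    }

    // Imagine the bug report was 'zero prices are rendered as blank': this test is
    // written first, fails against the broken implementation, and passes after the fix.
    @Test
    public void shouldFormatZeroAsDollarsAndCents() {
      assertEquals("$0.00", new PriceFormatter().format(0));
    }

    @Test
    public void shouldFormatWholeDollarAmounts() {
      assertEquals("$25.00", new PriceFormatter().format(2500));
    }
  }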

I don’t try to convert people to unit testing as much as I used to, but if you do use unit tests properly, not only will you write code that is of higher quality, more modular and easier to refactor to support future features and changes – you’ll also be adding working features more quickly. Really, kids, you should clean your rooms every week.

Have I learned something about API Design?

Some years ago, somebody pointed me to Joshua Bloch’s excellent presentation about how to design APIs. If you haven’t watched it, go ahead and do that before reading on. To me, that presentation felt and still feels like it was full of those things where you can easily understand the theory but where the practice eludes you unless you – well, practice, I guess. Advice like “When in doubt, leave it out”, or “Keep APIs free of implementation details” is easy to understand as sentences, but harder to really understand in the sense of knowing how to apply it to a specific situation.

I’ve just been working on some code changes that turned out to be quite awkward, and awkward code is always an indication of a need to tweak the design a little. As a part of the awkwardness-triggered tweaking, I came up with an updated API that I was quite pleased with, and that made me think that maybe I’m starting to understand some of the points that were made in that presentation. I thought it would be interesting to revisit the presentation and see if I had assimilated some of his advice into actual work practice, so here’s my self-assessment of whether Måhlén understands Bloch.

First, the problem: the API I modified is used to generate the path part of URLs pointing to a specific category of pages on our web sites – Scorching, which means that a search for something to buy has been specific enough to land on a single product (not just any digital camera, but a Sony DSLR-A230L). The original API looks like this:

public interface ScorchingUrlHelper {
  @Deprecated
  String urlForSearchWithSort(Long categoryId, Product product, Integer sort);

  @Deprecated
  String urlForProductCompare(Long categoryId, Product product);

  @Deprecated
  String urlForProductCompare(String categoryAlias, Product product);

  @Deprecated
  String urlForProductCompare(Category category, Product product);

  @Deprecated
  String urlForProductCompare(Long categoryId, String title, Long id, String keyword);

  String urlForProductDetail(Long categoryId, Product product);
  String urlForProductDetail(String categoryAlias, Product product);
  String urlForProductDetail(Category category, Product product);
  String urlForProductDetail(Long categoryId, String title, Long id, String keyword);

  String urlForProductReview(Long categoryId, Product product);
  String urlForProductReview(String categoryAlias, Product product);
  String urlForProductReview(Category category, Product product);
  String urlForProductReview(Long categoryId, String title, Long id);

  String urlForProductReviewWrite(Long categoryId, String title, Long id);

  String urlForWINSReview(String categoryAlias, Product product);
  String urlForWINSReview(Category category, Product product);
  String urlForWINSReview(Long categoryId, String title, Long id, String keyword);
}

Just from looking at the code, it’s clear that the API has evolved over time and diverged a bit from its original use. There are some deprecated methods, and there are many ways of doing more or less the same things. For instance, you can get a URL part for a combination of product and category based on different kinds of information about the product or category.

The updated API – which feels really obvious in hindsight but actually took a fair amount of work – that I came up with is:

public interface ScorchingTargetHelper {
  Target targetForProductDetail(Product product, CtrPosition position);

  Target targetForProductReview(Product product, CtrPosition position);

  Target targetForProductReviewWrite(Product product, CtrPosition position);

  Target targetForWINSReview(Product product, CtrPosition position);
}

The reason I needed to change the code at all was to add information necessary for tracking click-through rates (CTR) to the generated URLs, so that’s why there is a new parameter in each method. Other than that, I’ve mostly removed things, which was precisely what I was hoping for.
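
As a rough sketch of what a call site against the new interface might look like – ScorchingLinkRenderer is invented for illustration, and how the Product and CtrPosition instances get created is assumed rather than shown – the important property is that the click-tracking position has to be supplied on every call, so it cannot simply be forgotten:

// Hypothetical client of the new API; ScorchingTargetHelper, Target, Product
// and CtrPosition are the types discussed in this post.
public class ScorchingLinkRenderer {

  private final ScorchingTargetHelper targetHelper;

  public ScorchingLinkRenderer(ScorchingTargetHelper targetHelper) {
    this.targetHelper = targetHelper;
  }

  public Target detailLinkFor(Product product, CtrPosition position) {
    // The CTR position is a required parameter, not an optional setter,
    // so every caller is forced to think about click tracking.
    return targetHelper.targetForProductDetail(product, position);
  }
}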

Here are some of the rules from Josh’s presentation that I think I applied:

APIs should be easy to use and hard to misuse.

  • Since click tracking, once introduced, will be a core concern, it shouldn’t be possible to forget about adding it. Hence it was made a parameter in every method. People can probably still pass in garbage there, but at least everyone who wants to generate a scorching URL will have to take CTR into consideration.
  • Rather than @Deprecate-ing obsolete methods, the new API is backwards-incompatible, making it far harder to misuse. @Deprecated doesn’t really mean anything, in my opinion. I like breaking things properly when I can; otherwise there’s a risk you don’t notice that something is broken.
  • All the information needed to generate one of these URLs is available if you have a properly initialised Product instance – it knows which Category it belongs to, and it knows about its title and id. So I removed the other parameters to make the API more concise and easier to use.

Names matter

  • The old class was called ScorchingUrlHelper, and each of its methods was called urlForSomething(). But it didn’t create URLs; it created the path parts of URLs (no protocol, host or port parts – query parts are not needed/used for these URLs). The new version returns an existing internal representation of something for which a URL can be created, called a Target, and the names of the class and methods reflect that.

When in doubt, leave it out

  • I removed a lot of ways to create Targets based on subsets of the data about products and categories. I’m not sure that strictly means I followed this piece of advice; it’s probably more a question of reducing code duplication and increasing API terseness. So instead of making the API implementation create Product instances based on three different kinds of inputs, that logic was extracted into a ProductBuilder, and something similar was done for Category objects. And I made use of the fact that a Product should already know which Category it belongs to.

Keep APIs free of implementation details

  • Another piece of advice that I don’t think I followed very successfully, but it wasn’t too bad. An intermediate version of the API took a CtrNode instead of a CtrPosition – the position in this case is a place in a tree hierarchy indicating where on the page the link appears, and a CtrNode object represents a node in that tree. But that is an implementation detail that shouldn’t really be exposed.

Provide programmatic access to all data available in string form

  • Rather than using a String as the return object, the Target was more suitable. This is really a fairly large part of the original awkwardness that triggered the refactoring.

Use the right data type for the job

  • I think I get points for this in two ways: using the Product object instead of various combinations of Longs and Strings, and for using the Target as a return type.

There’s a lot of Josh’s advice that I didn’t follow. Much of it relates to his recommendations on how to go about coming up with a good, solid version of the API. I was far less structured in how I went about it. A piece of advice that I arguably went against is:

Don’t make the client do anything the library could do

  • I made the client responsible for instantiating the Product rather than allowing multiple similar calls taking different parameters. In this case, I think that avoiding duplication was the more important thing, and that maybe the library couldn’t/shouldn’t do that for the client. Or perhaps, “the library” shouldn’t be interpreted as just “the class in question”. I did add a new class to the library that helps clients instantiate their Products based on the same information as before.
  • I made the client go from Target to String to actually get at the path part of the URL. This was more of a special case – the old style ScorchingUrlHelper classes were actually instantiated per request, while I wanted something that could be potentially Singleton scoped. This meant that either I had to add request-specific information as a method parameter in the API (the current subdomain), or change to a Target as the return type and do the rest of the URL generation outside. It felt cleaner to leave that outside, leaving the ScorchingTargetHelper a more focused class with fewer responsibilities and collaborators.

So, in conclusion: do I think that I have actually understood the presentation on a deeper level than just the sentences? Well, when I went through the presentation to write this post, I was actually pleasantly surprised at the number of bullet points that I think I followed. I’m still not convinced, though. I think I still have a lot of things to learn, especially in terms of coming up with concise and flexible APIs that are right for a particular purpose. But at least, I think the new API is an improvement over the old one. And what’s more, writing this post by going back to that presentation and thinking about it in direct relationship to something I just did was a useful exercise. Maybe I learned a little more about API design by doing that!
