Archive for December, 2010

Testing and Code Correctness

Lately, I’ve found myself disagreeing with such giants as Trygve Reenskaug and Tony Hoare, and thinking that I have understood something about software that maybe they have not. A nobody disagreeing with two famous professors! Can I have a point? Well, read on and make up your own mind about it.

Since discovering about Trygve’s brainchild DCI, I’ve been following the discussions on the Object Composition discussion group. I’m not participating actively in the discussions; the main reason for me following the group is that Trygve is active there, and he has a lot of profound insights into software development which makes reading his posts a joy. But I’ve found myself disagreeing with him on one or possibly two closely related points. In at least one of his talks, he makes the point that testing does not help you get quality into a product. All tests do is prove that a particular execution path with particular parameter values works, they say nothing about what will happen if some parameter values change, and no matter how much you test some code, you can’t say that it is bug-free. Instead, the way to get quality into code is through readability. He often quotes Tony Hoare, who said:

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other is to make it so complicated that there are no obvious deficiencies.”

This is a lovely and memorable quote, but as all soundbites, it is a simplification. In fact, I think it is simplified to the point of not being meaningful, and while I think that Trygve’s point about testing is correct, I also think it is not very important and a slightly dangerous point to make as it might discourage you from testing. Tests, especially automated ones, are critical when building quality software, even though they don’t put quality into the product. Let me start on why I don’t think Tony Hoare’s quote is great with two code examples.

Some code that works

public class Multiplier {
   private static final double POINT_25 = 0.25;

   public double multiplyByPoint25(double amount) {
      return amount * POINT_25;

This code is simple, to the point where it obviously contains no deficiencies. It’s not particularly interesting, but a glance shows that it does what you’d expect it to. One point to Tony!

Some code that is broken

public class VatCalculator {
   private static final double VAT_RATE = 0.25;

   public double calculateVat(double amount) {
      return amount * VAT_RATE;

This code is broken in at least two ways:

  1. It does monetary calculations using floating-point arithmetic. This means that calculations aren’t exact, and exactness is always a requirement when dealing with money (see for instance Effective Java, Item 48 for more on this).
  2. It is probably broken in more subtle ways as well. In Sweden, the default VAT rate is indeed 25% on top of the price before VAT. But most of the time, people in Sweden think about this backwards – the only price they see is the one with VAT, so some people prefer to multiply the total price by .2 to get the part of the total price that is VAT. We don’t know if the amount is the amount before VAT or the total amount including VAT. Also, the VAT rate depends on the item type, so if this method were called using the price of a pencil, it would probably give the right answer, but it wouldn’t for a book. And if the amount is the total of an order for a pencil and a book, it’s wrong in yet another way. The list of things that could be broken in terms of how to do VAT calculations goes on.

And the funny thing is, from a machine-instructions perspective, the two classes are of course identical! Why is the first one right, and the second one wrong, just because we changed some names? When the names changed, our perception of the programmer’s intent changed with them. Suddenly, the identical machine instructions have a more specific purpose, and we see more of the actual or probable business rules that should be applied. This makes the second version obviously broken because it uses floating-point arithmetic for monetary calculations. We understand enough about the programmer’s intent to find a mistake. But while the floating-point error is obvious, there is nothing in the code that gives us clear evidence as to whether the VAT calculation is correct in terms of the rate that is being applied or not.

Clearly, readability is not sufficient to guarantee correctness. Correct code meets its business requirements – the intent of the person who uses it for something. Readability can at best tell us if it meets the intent of the programmer that wrote it. This doesn’t mean readability is unimportant; on the contrary, readable code is a holy grail to be strived for at almost all cost. But it does mean that Tony Hoare’s goal is unachievable. The code by itself cannot “obviously have no deficiencies”, because the code only tells us what it does, not what it should do.

Testing to the Rescue

So how can we ensure that our code is correct? Well, we need well-defined requirements and the ability to match those requirements with what the code does. There are people whose full-time job is to define requirements; all they do is to formulate business users’ descriptions of what they do in terms that should be unambiguous, free of duplication, conflicts, and so on, so that programmers and testers can get a clear picture of what the code should actually be doing. There are whole toolsets for requirements analysis and management that help these people produce consistent and unambiguous definitions of requirements. Once that’s done the problem is of course that it’s really hard to know if your code actually meet all the requirements that are defined. No problem, there’s further tools and processes that support mapping the requirements to test cases, executions of these test cases and the defects found and fixed.

Like many others (this is at the heart of agile), I think all that is largely a waste of time. I’m not saying it doesn’t help, it’s just very inefficient. It’s practically impossible to formulate anything using natural language in a way that is unambiguous, consistent and understandable. And then mapping requirements defined using natural language to tests and test executions is again hard to the point of being impossible. The only languages we have that allow us to formulate statements unambiguously and with precision are formal systems such as programming languages. But wait, did I just say we can write something that has no ambiguity in code? Could we formulate our requirements using code? Yes, of course. Done right, that’s exactly what automated tests are.

If our automated tests reflect the intent of the users of the code, we can get an extremely detailed and precise specification of the requirements that has practically zero cost of verification. Since the specification is written in a programming language, it is written using one of the best tools we know how to design in terms of optimal unambiguity and readability. As we all know, programming languages have shortcomings there, but they are way superior to natural language.

The argument that business users won’t be able to understand tests/requirements written in code and that they therefore must be in natural language is easy to refute on two grounds: first, business users don’t understand a Word document or database with hundreds or thousands of requirements either (and neither do programmers), and second, let the code speak for itself. Business users can and do understand what your product does, and if it does the wrong thing, they can tell you. Fix the code (tests first), and ship a new version. Again, a core principle in agile.

I often come across a sentiment among developers that tests are restrictive (“I’ve fixed the code, why do I have to fix the test too”), or that you “haven’t got the time to write tests”. That’s a misunderstanding – done right, tests are liberating and save time. Liberating, because the fact that your tests specify how the code is intended to behave means that you don’t need to worry as much about breaking existing functionality. That enables you to have multiple teams co-owning the same code, which improves your agility. Maybe even more importantly, having a robust harness of tests around some code enables you to refactor it and thereby keep the code readable even in the face of evolution of the feature set it must support.

So that should hopefully explain why I think the point that Trygve made about tests not being able to get quality into a product is valid but not really important. The quality comes from the clarity of the system architecture, system design and code implementation, not from the tests. But automated tests enable you to keep quality in a product as it evolves by preventing regression errors and making it possible for you to keep the all-important clarity and readability up to date even as the code evolves.



Leave a comment

The Power of Standardisation

A couple of recent events made me see the value of company-internal standardisation in a way that I hadn’t before. Obviously, reusing standardised solutions for software development is a good thing, as it is easier to understand them. But I’ve always rated continuous evolution of your technologies and choosing the best tool for the problem at hand higher. I’m beginning to think that was wrong, and that company-internal standards are or can be a very important consideration when choosing solutions.

Let’s get to the examples. First, we’ve had a set of automated functional regression tests that we can run against our sites for quite some time. But since the guy who developed those tests left, it has turned out that we can’t run them any more. The reason is that they were developed using Perl, a programming language that most of the team members are not very comfortable with, and that the way you would run them involved making some manual modifications to certain configuration files, then selecting the right ‘main’ file to execute. We’ve recently started replacing those tests with new ones written in Java and controlled by TestNG. This means it was trivial for us to run the tests through Maven. Some slight cleverness (quite possibly a topic for another blog post) allows us to run the tests for different combinations of sites (our team runs 8 different sites with similar but not identical functionality) and browsers using commands like this:

mvn test -Pall-sites -Pfirefox -Pchrome -Pstage

This meant it was trivial to get Hudson to run these tests for us against the staging environment and that both developers and QA can run the tests at any time.

The second example is also related to automated testing – we’ve started creating a new framework for managing our performance tests. We’ve come to the conclusion that our team in the EU has different needs to the team in the US that maintains our current framework, and in the interest of perfect agility, we should be able to improve our productivity by owning the tools we work with. We just deployed the first couple of versions of that tool, and our QA Lead immediately told me that he felt that even though the tool is still far inferior to the one it will eventually replace, he was really happy to have a plain Java service to deploy as opposed to the current Perl/CGI-based framework. Since Java services are our bread and butter at Shopzilla (I’ve lost count, but our systems probably include about 30-50 services, most of which are written in Java and use almost RESTful HTTP+XML APIs), we have great tools that support automated deployment and monitoring of these services.

The final example was a program for batch processing of files that we need for a new feature. Our initial solution was a plain Java executable that monitored a set of directories for files to process. However, it quickly became obvious that we didn’t know how to deal with that from a configuration management/system operations perspective. So even though the development and QA was done, and the program worked, we decided to refit it as one of our standard Java services, with a feature to upload files via POST requests instead of monitoring directories.

In all of these cases, there are a few things that come to mind:

  • We’ve invested a lot in creating tools and processes around managing Java services that are deployed as WARs into Tomcat. Doing that is dead easy for us.
  • We get a lot of common features for free when reusing this deployment form: standard solutions for setting server-specific configuration parameters, logging, load balancing, debugging, build numbers, etc.
  • Every single person working here is familiar with our standard Java/Tomcat services. Developers know where to find the source code and where the entry points are. QA knows where to find log files, how to deploy builds for testing and how to check the current configuration. CM knows how to configure environments and how to set up monitoring tools, and so on.

I think there is a tendency among developers – certainly with me – to think only about their own part when choosing the tools and technologies for developing something new. So if I would be an expert in, say, Ruby on Rails, it would probably be very easy for me to create some kind of database admin tool using that. But everybody else would struggle with figuring out how to deal with it – where can I find the logs, how do I build it, how is it deployed and how do I set up monitoring?

There is definitely a tradeoff to be made between being productive today with existing tools and technologies and being productive tomorrow through migrating to newer and better ones. I think I’ve not had enough understanding of how much smoother the path can be if you stay with the standard solutions compared to introducing new technologies. My preference has always been to do gradual, almost continuous migration to newer tools and technologies, to the point of having that as an explicit strategy at Jadestone a few years ago. I am now beginning to think it’s quite possible that it is better to do technology migrations as larger, more discrete events where a whole ecosystem is changed or created at the same time. In the three cases above, we’re staying on old, familiar territory. That’s the path of least resistance and most bang for the buck.

, ,

Leave a comment