Testing and Code Correctness

Lately, I’ve found myself disagreeing with such giants as Trygve Reenskaug and Tony Hoare, and thinking that I have understood something about software that maybe they have not. A nobody disagreeing with two famous professors! Can I have a point? Well, read on and make up your own mind about it.

Since discovering Trygve’s brainchild DCI, I’ve been following the discussions on the Object Composition discussion group. I’m not participating actively; the main reason I follow the group is that Trygve is active there, and he has a lot of profound insights into software development, which makes reading his posts a joy. But I’ve found myself disagreeing with him on one or possibly two closely related points. In at least one of his talks, he makes the point that testing does not help you get quality into a product. All tests do is prove that a particular execution path with particular parameter values works; they say nothing about what will happen if some parameter values change, and no matter how much you test some code, you can’t say that it is bug-free. Instead, the way to get quality into code is through readability. He often quotes Tony Hoare, who said:

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other is to make it so complicated that there are no obvious deficiencies.”

This is a lovely and memorable quote, but like all soundbites, it is a simplification. In fact, I think it is simplified to the point of not being meaningful, and while I think that Trygve’s point about testing is correct, I also think it is not very important and a slightly dangerous point to make, as it might discourage you from testing. Tests, especially automated ones, are critical when building quality software, even though they don’t put quality into the product. Let me start with two code examples that show why I don’t think Tony Hoare’s quote is great.

Some code that works

public class Multiplier {
   private static final double POINT_25 = 0.25;

   public double multiplyByPoint25(double amount) {
      return amount * POINT_25;
   }
}

This code is simple, to the point where it obviously contains no deficiencies. It’s not particularly interesting, but a glance shows that it does what you’d expect it to. One point to Tony!

Some code that is broken

public class VatCalculator {
   private static final double VAT_RATE = 0.25;

   public double calculateVat(double amount) {
      return amount * VAT_RATE;
   }
}

This code is broken in at least two ways:

It does monetary calculations using floating-point arithmetic. This means that calculations aren’t exact, and exactness is always a requirement when dealing with money (see for instance Effective Java, Item 48 for more on this).
It is probably broken in more subtle ways as well. In Sweden, the default VAT rate is indeed 25% on top of the price before VAT. But most of the time, people in Sweden think about this backwards – the only price they see is the one with VAT, so some people prefer to multiply the total price by 0.2 (since 25/125 = 0.2) to get the part of the total price that is VAT. We don’t know if the amount is the amount before VAT or the total amount including VAT. Also, the VAT rate depends on the item type, so if this method were called using the price of a pencil, it would probably give the right answer, but it wouldn’t for a book. And if the amount is the total of an order for a pencil and a book, it’s wrong in yet another way. The list of things that could be broken in terms of how to do VAT calculations goes on.
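To make the floating-point problem concrete, here is a minimal sketch. The class and method names are hypothetical and not part of the classes above. It sums ten payments of 0.10, once using double and once using BigDecimal constructed from a decimal string.

```java
import java.math.BigDecimal;

class MoneyPrecisionDemo {
    // Sums ten payments of 0.10 using binary floating-point arithmetic.
    static double sumWithDouble() {
        double total = 0.0;
        for (int i = 0; i < 10; i++) {
            total += 0.10;
        }
        return total;
    }

    // The same sum using BigDecimal built from a decimal string, which is exact.
    static BigDecimal sumWithBigDecimal() {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal payment = new BigDecimal("0.10");
        for (int i = 0; i < 10; i++) {
            total = total.add(payment);
        }
        return total;
    }

    public static void main(String[] args) {
        // 0.10 has no exact binary representation, so the errors accumulate.
        System.out.println(sumWithDouble() == 1.0); // prints false
        // compareTo ignores scale, so 1.00 compares equal to 1.
        System.out.println(sumWithBigDecimal().compareTo(BigDecimal.ONE) == 0); // prints true
    }
}
```

The double total ends up not quite equal to 1.0, which is exactly the kind of discrepancy that is unacceptable for money, while the BigDecimal total is exact.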

And the funny thing is, from a machine-instructions perspective, the two classes are of course identical! Why is the first one right, and the second one wrong, just because we changed some names? When the names changed, our perception of the programmer’s intent changed with them. Suddenly, the identical machine instructions have a more specific purpose, and we see more of the actual or probable business rules that should be applied. This makes the second version obviously broken because it uses floating-point arithmetic for monetary calculations. We understand enough about the programmer’s intent to find a mistake. But while the floating-point error is obvious, there is nothing in the code that gives us clear evidence as to whether the VAT calculation is correct in terms of the rate that is being applied or not.

Clearly, readability is not sufficient to guarantee correctness. Correct code meets its business requirements – the intent of the person who uses it for something. Readability can at best tell us if it meets the intent of the programmer who wrote it. This doesn’t mean readability is unimportant; on the contrary, readable code is a holy grail to be striven for at almost any cost. But it does mean that Tony Hoare’s goal is unachievable. The code by itself cannot “obviously have no deficiencies”, because the code only tells us what it does, not what it should do.

Testing to the Rescue

So how can we ensure that our code is correct? Well, we need well-defined requirements and the ability to match those requirements with what the code does. There are people whose full-time job is to define requirements; all they do is formulate business users’ descriptions of what they do in terms that should be unambiguous, free of duplication, conflicts, and so on, so that programmers and testers can get a clear picture of what the code should actually be doing. There are whole toolsets for requirements analysis and management that help these people produce consistent and unambiguous definitions of requirements. Once that’s done, the problem is of course that it’s really hard to know if your code actually meets all the requirements that are defined. No problem, there are further tools and processes that support mapping the requirements to test cases, executions of these test cases and the defects found and fixed.

Like many others (this is at the heart of agile), I think all that is largely a waste of time. I’m not saying it doesn’t help, it’s just very inefficient. It’s practically impossible to formulate anything using natural language in a way that is unambiguous, consistent and understandable. And then mapping requirements defined using natural language to tests and test executions is again hard to the point of being impossible. The only languages we have that allow us to formulate statements unambiguously and with precision are formal systems such as programming languages. But wait, did I just say we can write something that has no ambiguity in code? Could we formulate our requirements using code? Yes, of course. Done right, that’s exactly what automated tests are.

If our automated tests reflect the intent of the users of the code, we can get an extremely detailed and precise specification of the requirements that has practically zero cost of verification. Since the specification is written in a programming language, it is written using one of the best tools we know how to design for combining unambiguity with readability. As we all know, programming languages have shortcomings there, but they are way superior to natural language.
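As a rough sketch of what requirements expressed as tests could look like for the VAT example, consider the following. Everything in it is an assumption made for illustration: the class, enum and method names are hypothetical, the 6% book rate and the rounding to two decimals with HALF_UP stand in for whatever the real business rules are, and plain checks in a main method stand in for a test framework.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class VatSpecification {
    // Hypothetical item types and rates; the real rules would come from the business.
    enum ItemType {
        PENCIL(new BigDecimal("0.25")),
        BOOK(new BigDecimal("0.06"));

        final BigDecimal rate;

        ItemType(BigDecimal rate) {
            this.rate = rate;
        }
    }

    // VAT to add on top of a price that excludes VAT.
    static BigDecimal vatOnNetPrice(BigDecimal netPrice, ItemType type) {
        return netPrice.multiply(type.rate).setScale(2, RoundingMode.HALF_UP);
    }

    // VAT already contained in a price that includes VAT: gross * rate / (1 + rate).
    static BigDecimal vatIncludedInGrossPrice(BigDecimal grossPrice, ItemType type) {
        BigDecimal divisor = BigDecimal.ONE.add(type.rate);
        return grossPrice.multiply(type.rate).divide(divisor, 2, RoundingMode.HALF_UP);
    }

    // Fails loudly, naming the requirement that was not met.
    static void check(String requirement, String expected, BigDecimal actual) {
        if (new BigDecimal(expected).compareTo(actual) != 0) {
            throw new AssertionError(requirement + ": expected " + expected + " but was " + actual);
        }
    }

    public static void main(String[] args) {
        check("a pencil costing 100.00 before VAT carries 25.00 VAT",
                "25.00", vatOnNetPrice(new BigDecimal("100.00"), ItemType.PENCIL));
        check("a pencil costing 125.00 including VAT contains 25.00 VAT",
                "25.00", vatIncludedInGrossPrice(new BigDecimal("125.00"), ItemType.PENCIL));
        check("a book costing 100.00 before VAT carries 6.00 VAT",
                "6.00", vatOnNetPrice(new BigDecimal("100.00"), ItemType.BOOK));
        System.out.println("All requirements met");
    }
}
```

Each check states one requirement in a form that can be verified on every build. The method names also force a decision about whether an amount is before or including VAT, which is exactly the question the VatCalculator class left open.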

The argument that business users won’t be able to understand tests/requirements written in code and that they therefore must be in natural language is easy to refute on two grounds: first, business users don’t understand a Word document or database with hundreds or thousands of requirements either (and neither do programmers), and second, let the code speak for itself. Business users can and do understand what your product does, and if it does the wrong thing, they can tell you. Fix the code (tests first), and ship a new version. Again, a core principle in agile.

I often come across a sentiment among developers that tests are restrictive (“I’ve fixed the code, why do I have to fix the test too”), or that you “haven’t got the time to write tests”. That’s a misunderstanding – done right, tests are liberating and save time. Liberating, because the fact that your tests specify how the code is intended to behave means that you don’t need to worry as much about breaking existing functionality. That enables you to have multiple teams co-owning the same code, which improves your agility. Maybe even more importantly, having a robust harness of tests around some code enables you to refactor it and thereby keep the code readable even in the face of evolution of the feature set it must support.

So that should hopefully explain why I think the point that Trygve made about tests not being able to get quality into a product is valid but not really important. The quality comes from the clarity of the system architecture, system design and code implementation, not from the tests. But automated tests enable you to keep quality in a product as it evolves by preventing regression errors and making it possible for you to keep the all-important clarity and readability up to date even as the code evolves.


This entry was posted on December 10, 2010, 08:33 and is filed under Software Development.

Petter Måhlén's Blog