Saturday, March 22, 2008

Synthesis and test confidence

George and I spoke to a bunch of friends about Synthesis a few weeks ago at the inaugural Reading Geek Night. Around six of us met in a pub in Reading and got really confused looks from the punters around us as we waxed enthusiastically about TDD concepts and how we thought they could be improved. I found it very interesting to get feedback from a group of really smart non-ThoughtWorkers. They don't all practice “full on agile”, but they certainly do understand what it means to build and ship working code, on time, under tight business pressures. It has taken a couple of weeks to digest the feedback, hence the delay in posting about the event.

So, here is another attempt to explain why we may need Synthesis on our projects, in the light of Reading Geek Night #1 and earlier discussions with the guys at ThoughtWorks UK....

If you practice TDD, then the chances are that you already have a large number of unit tests. You may have a bunch of other automated tests of different types as well (functional, integration, performance...), and if you do, that's great. Keep doing that.

At the other end of the spectrum, you will also have some form of acceptance testing structure in place. The format of this varies from project to project. It could be a set of manual scripts which your Testing/QA team/Nominated Guy run, or it could be a bunch of system tests using FIT or one of the BDD frameworks. Maybe you paid for a commercial product and have something like a bunch of Rational Robot scripts.

We currently have a large disconnect between our high level tests and our low level unit tests, especially when you consider how confident running one or other of the different types of test would make you that the system works as desired.

At the bottom of the scale there is a system which has no tests. I would not be at all confident that this system worked. At the top end of the scale, I have a system which is running live in the real production environment and is being used by the actual user base. I am very confident that this system is working. At some point in the past, most businesses came to the conclusion that IT cannot blindly put systems live to discover if they work, hence all the interest in testing techniques.

High level system tests give us much more confidence that a system may work in production than unit tests because of the simple fact that they are exercising the production code in a more realistic manner; the data is more real, and all the interactions between the components are 'real', or as close to real as we can get in a test environment.

Unit tests provide a much weaker level of confidence when you want to use them to prove that a system works, due to a number of issues.

Unit tests are incomplete
We cannot unit test all of our code. Good design practices can massively reduce the amount of 'untestable' code in a system, and by practicing TDD and running code coverage tools it is simple to get a view of how well we are doing. However, there are always parts of the system which you either cannot unit test or neglect to unit test for some reason.

Unit tests are disconnected from other unit tests
Unit tests test a small amount of code, by definition, and often in isolation. If you are testing interactions between components, these are simulated using mocks. How can I be confident that I got my mock interactions correct? And how can I be confident that I have tested, or even finished coding, the real version of whatever it is I am mocking? Again, we can mitigate this by reducing the number of interaction-based unit tests and writing state-based unit tests when we can. However, sometimes it is much more natural to follow the interaction test approach, and this incurs a risk that we are not simulating our interactions accurately.
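To make that risk concrete, here is a minimal sketch using JUnit 4 and Mockito, with hypothetical OrderProcessor, Order and PriceCalculator classes standing in for any pair of collaborating components. The first test only ever talks to a mock, so nothing forces the simulated interaction to match what the real PriceCalculator actually does; the second is state-based and has no simulated collaborators to get out of sync.

    import static org.junit.Assert.assertEquals;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;
    import static org.mockito.Mockito.when;

    import org.junit.Test;

    public class OrderProcessorTest {

        // Interaction-based: OrderProcessor is exercised against a mock of its
        // collaborator. This passes even if the real PriceCalculator never
        // behaves this way, or the same call is never tested against it.
        @Test
        public void totalsAnOrderUsingThePriceCalculator() {
            PriceCalculator calculator = mock(PriceCalculator.class);
            when(calculator.priceFor("book")).thenReturn(10.0);

            double total = new OrderProcessor(calculator).total(new Order("book", 2));

            assertEquals(20.0, total, 0.001);
            verify(calculator).priceFor("book");
        }

        // State-based: the real object and real data, with no simulated
        // interactions to get wrong.
        @Test
        public void theRealPriceCalculatorPricesABook() {
            assertEquals(10.0, new PriceCalculator().priceFor("book"), 0.001);
        }
    }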

Please don't take away from this that unit tests are bad. Unit tests are the lifeblood of a healthy project and are great for proving that the individual cogs are properly machined. What they do not tell you is whether you have the right number of cogs, or whether they all fit together properly. If I want to know that components A, B, and C really play well together, then I need to write another, aggregate test which puts A, B, and C together and validates, in a more realistic environment, that the unit tests got the interactions correct. George and I would call these functional tests, but you may use another term on your project. You may not have these tests on your project at all, and that could be more worrying still: it means there could be gaps in your coverage at the level that developers work at.
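Sticking with the same hypothetical classes, a functional test in this sense is nothing more exotic than the real objects wired together. If the interaction simulated in the unit test has drifted away from what the real PriceCalculator does, this is the level at which it shows up.

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class OrderProcessingFunctionalTest {

        // The real OrderProcessor talking to the real PriceCalculator: if the
        // mocked interaction in the unit test no longer matches reality, this
        // is the test that notices.
        @Test
        public void pricesAnOrderEndToEnd() {
            OrderProcessor processor = new OrderProcessor(new PriceCalculator());
            assertEquals(20.0, processor.total(new Order("book", 2)), 0.001);
        }
    }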

So, if I have functional tests plus unit tests, I am more confident that my system is working before I commit to running the slow system tests. By running functional tests I have confidence that I have wired up all my well behaved units of code in a meaningful manner and that they are playing well with each other as I predicted. The chances are that the wider system is going to work. I am still not as confident that the system works as I will be once my acceptance test suite has been run by [insert machine/human here], but I can probably sleep well enough.

Now, some functional tests are inescapable and give a view into an aspect of the system that a unit test simply cannot provide. A great example, if you are using Spring, is proving that you can get your components out of the container and that they are wired up correctly. However, if a functional test exists purely to prove that the interactions you simulated in your unit tests are valid, then repetition is creeping into the process, which is wasteful.
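As a rough sketch of that Spring example (the context file name and bean name below are assumptions for illustration, not from any particular project):

    import static org.junit.Assert.assertNotNull;

    import org.junit.Test;
    import org.springframework.context.ApplicationContext;
    import org.springframework.context.support.ClassPathXmlApplicationContext;

    public class ContainerWiringTest {

        // A unit test never touches the container, so only a test like this
        // can prove the bean definitions parse and the dependencies resolve.
        @Test
        public void orderProcessorCanBePulledOutOfTheContainer() {
            ApplicationContext context =
                    new ClassPathXmlApplicationContext("applicationContext.xml");
            assertNotNull(context.getBean("orderProcessor"));
        }
    }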

This is where Synthesis comes in. Synthesis monitors the simulated interactions which you create in your unit tests and verifies that there is a corresponding unit test that exercises and validates the real object in your system. If everything joins up, then your build passes. If there are disconnects, then the build fails. So, if I have unit tests for A, B, and C, then it is fine for me to simulate A's interactions with B and C using mocks, providing Synthesis can match all my expectations with real tests which make the very same calls on the real B and C. The result is a synthetic bigger test, where A, B, and C are virtually linked by their expectations.
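Synthesis's own API isn't shown here; the sketch below (same hypothetical classes, plus an assumed discountFor method) just illustrates the kind of disconnect it is meant to flag: an interaction that is only ever simulated, with no test anywhere making the same call on the real object.

    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;
    import static org.mockito.Mockito.when;

    import org.junit.Test;

    public class DisconnectedExpectationExample {

        // This test passes on its own, but the discountFor("book") interaction
        // is only ever simulated: no test calls discountFor on the real
        // PriceCalculator, so nothing proves the simulation is realistic.
        // A Synthesis-style check would fail the build on exactly this gap.
        @Test
        public void orderProcessorAppliesADiscount() {
            PriceCalculator calculator = mock(PriceCalculator.class);
            when(calculator.priceFor("book")).thenReturn(10.0);
            when(calculator.discountFor("book")).thenReturn(0.1);

            new OrderProcessor(calculator).total(new Order("book", 1));

            verify(calculator).discountFor("book");
        }
    }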

Obviously, there is a sliding scale of 'matching'. At one end you have basic method name matching, and at the other end of the scale there is full data matching for every call. The closer you get to true data matching, the more confidence is gained, but the more restrictions are placed on developers. We are not sure how far we need to go in order for a team to get an optimal balance between speed and safety – but I'd be very interested to get opinions on this matter :-)
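To make the ends of that scale concrete (again using the hypothetical PriceCalculator, with figures invented purely for illustration): method-name matching would accept the pair of tests below, because both exercise priceFor; full data matching would reject it, because the argument and return value never line up.

    import static org.junit.Assert.assertEquals;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import org.junit.Test;

    public class MatchingLevelsExample {

        // The simulated side: priceFor("book") stubbed to return 10.0.
        @Test
        public void simulatedSide() {
            PriceCalculator calculator = mock(PriceCalculator.class);
            when(calculator.priceFor("book")).thenReturn(10.0);

            assertEquals(10.0,
                    new OrderProcessor(calculator).total(new Order("book", 1)), 0.001);
        }

        // The real side: priceFor is exercised, but with different data.
        // Name matching: pass. Data matching: fail.
        @Test
        public void realSide() {
            assertEquals(15.0, new PriceCalculator().priceFor("dvd"), 0.001);
        }
    }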

Currently, Synthesis validates the call signature completely, but does not check that the data passed to the mocks matches the data used when testing the real object. We plan to add this soon. However, there is a benefit to be had simply by ensuring that all your interactions join up: this will catch method mismatches, untested dynamic calls such as Active Record queries, and gaps in your test coverage which have simply been missed by those fallible human developers!