Sunday, July 19, 2009

What's in a name

It is really important to call out the names of the big box patterns in a system and the responsibilities of each box. Even if it is immediately obvious to you how your application should hang together, it may not be obvious to all the members of your team. As soon as there is uncertainty, people will make guesses and a whole heap of code lands in the wrong place. That is refactoring you didn't need to incur.

Once you have identified the aspects of the code that you have names for, get a picture on the wall so you can point or stare at it when you are having one of those "where should we put xyz?" conversations with your team.

Also be vigilant for code escaping its box - and name the anti-patterns related to each nefarious deed. My favourite is the Fat Controller anti-pattern - where view/service/domain logic wheedles its way into the controller, leading to a spate of Thomas the Tank Engine related jokes.
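To make the Fat Controller concrete, here is a minimal sketch with invented types, assuming an ASP.NET MVC style controller - the point is where the pricing rule lives, not the framework:

using System.Web.Mvc;

// Hypothetical domain types, just enough to illustrate the smell.
public class Customer { public bool IsGoldMember { get; set; } }

public class Order
{
    public Customer Customer { get; set; }
    public decimal Total { get; set; }

    // The discount rule lives in the domain, where it belongs.
    public void ApplyDiscounts()
    {
        if (Customer.IsGoldMember && Total > 100m) Total *= 0.9m;
    }
}

public class FatOrderController : Controller
{
    public ActionResult Checkout(Order order)
    {
        // Smell: a pricing rule has wheedled its way into the controller.
        if (order.Customer.IsGoldMember && order.Total > 100m)
        {
            order.Total *= 0.9m;
        }
        return View(order);
    }
}

public class ThinOrderController : Controller
{
    public ActionResult Checkout(Order order)
    {
        order.ApplyDiscounts(); // the controller merely coordinates
        return View(order);
    }
}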

Also try to be super consistent with the naming conventions used for your core application structure - it makes a huge difference.

Thursday, June 18, 2009

NSynthesis 0.1.1 has escaped!


I'm delighted to announce that NSynthesis 0.1.1 has finally escaped from my laptop where it has languished for many months. It can be downloaded from here.
NSynthesis is a .NET implementation of the Ruby project, Synthesis. The idea is to connect tests which set mock expectations to a corresponding test of a real implementation of the mocked call. Why? To give you more confidence that all those itty bitty unit tests add up to a comprehensive test suite.
Using PostSharp to hook into the gap between your test assembly and your production code, NSynthesis monitors your unit tests as they run. For each expectation set in your tests, NSynthesis demands that you have executed every possible concrete implementation during the same test run. NSynthesis uses PostSharp to modify the test code only, so there are no changes required to your production code base.
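To give a flavour of what this means in practice, here is a hypothetical pair of NUnit tests using Rhino Mocks - all the names are invented for illustration:

using NUnit.Framework;
using Rhino.Mocks;

// Hypothetical production types, just enough to show the idea.
public class Book { }
public interface IBookRepository { Book FindByIsbn(string isbn); }
public class BookRepository : IBookRepository
{
    public Book FindByIsbn(string isbn) { return new Book(); }
}
public class BookLendingService
{
    private readonly IBookRepository repository;
    public BookLendingService(IBookRepository repository) { this.repository = repository; }
    public Book Lend(string isbn) { return repository.FindByIsbn(isbn); }
}

[TestFixture]
public class BookLendingServiceTests
{
    [Test]
    public void LendingABookAsksTheRepositoryForIt()
    {
        // This test mocks IBookRepository.FindByIsbn, so NSynthesis records
        // an expectation against that member...
        var repository = MockRepository.GenerateMock<IBookRepository>();
        repository.Stub(r => r.FindByIsbn("xxx-yyy")).Return(new Book());

        new BookLendingService(repository).Lend("xxx-yyy");

        repository.AssertWasCalled(r => r.FindByIsbn("xxx-yyy"));
    }
}

[TestFixture]
public class BookRepositoryTests
{
    [Test]
    public void FindsABookByIsbn()
    {
        // ...and demands that a concrete FindByIsbn is also exercised for
        // real in the same run. Delete this test and the build fails.
        Assert.That(new BookRepository().FindByIsbn("xxx-yyy"), Is.Not.Null);
    }
}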
Currently, NSynthesis requires that you are using the following tools:
  • NUnit 2.5
  • Rhino Mocks
  • NAnt
The project is definitely at the curiosity stage of development. It needs to be run in on a proper project - so if you fancy giving it a go let me know how you get on with it.
The main issue which needs to be fixed next is handling multiple test runs in a build. Currently, there is no memory of the various unit test stages which you may have in a more advanced build (e.g. standalone tests, wired tests, functional tests). This is an issue if the mocked calls and the real code calls are not contained in the same test run.
Many thanks to Alex Scordellis, especially for his help with the evil .NET reflection API.

Thursday, November 13, 2008

Bowling Cards

Recently, my team decided we had to tackle a number of problems with our aging application code base which were causing us pain. There were a number of technology and design problems which needed sorting out. The challenge for us was to continue to deliver new features while taking measures not only to stop the rot, but to actively improve the code base. If we could improve the code where we were most active, we would be able to deliver features more rapidly.

The first stage was a technical retrospective, facilitated by Liv, our project manager. The whole team got together and we brainstormed the problems with the code base which were slowing down active development and causing us pain. With everything that was hurting us out in the open, the team voted for the issues they thought were causing the most pain. The focus was delivery: we weren't looking to make the code more beautiful for aesthetic reasons; we were trying to reduce friction so we could deliver more code more rapidly. With the key problems agreed, we moved on to discuss how we could fix them. So far so good.

Some of our problems could be fixed the traditional agile way: write a technical card, estimate the cost and agree who would do the work and when.

Others were ongoing "try not to repeat this anti pattern" or "remember to do X" type solutions, so there was no action as such, other than to remember to not do "X" or to do "Y".

We also had a third class of problem: long-running repair or rework tasks. For example, the team has switched mocking libraries from NMock to Rhino Mocks. While we find we are more productive when we mock using Rhino Mocks, we still have a legacy of old unit tests which periodically bite us when we need to refactor. The idea of scheduling a task to rewrite all our unit tests using Rhino Mocks is just plain wrong for a number of reasons. On humanitarian grounds, I couldn't bring myself to ask a member of my team to spend a week rewriting all our tests - we'd need to put her on suicide watch. Also, from a business value perspective, we only need to rewrite those tests which cover the active areas of the code. The application has been around for a while, and we are only adding features in some areas of the code. If we are not going to be slowed down by NMock-riddled tests in a remote area of the code, why waste client money fixing them? Enter the bowling card.

[image: a bowling card in progress]

The anatomy of a bowling card is very simple. The title should be a simple statement of intent.  Here, we seek annihilation of all NMock calls in our unit tests. 

In the middle is a grid, with a box for each step we need to take to win. In this case, we have a box for each test class which includes the NMock library.

Anyone gets to work on the bowling card, whenever the time is right. As a team you can decide when this should be. For this card the rules were that if you find yourself maintaining or extending an NMock-based test class, you take the time to fix it by rewriting the tests to use Rhino Mocks, or no mock at all.
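A typical strike on this card looked roughly like the following sketch (hypothetical types; the exact NMock flavour on your project may differ). The string-based expectations are exactly why the old tests bit us during refactoring:

using NMock;
using NUnit.Framework;
using Rhino.Mocks;

// Assumed for illustration:
//   interface IBookRepository { Book FindByIsbn(string isbn); }
//   class BookLendingService { /* calls repository.FindByIsbn(...) */ }

[Test]
public void LendsBook_OldNMockStyle()
{
    // NMock identifies members by name, so a rename refactoring compiles
    // happily and then breaks the test at run time.
    var mock = new DynamicMock(typeof(IBookRepository));
    mock.ExpectAndReturn("FindByIsbn", new Book(), "xxx-yyy");

    new BookLendingService((IBookRepository) mock.MockInstance).Lend("xxx-yyy");

    mock.Verify();
}

[Test]
public void LendsBook_RhinoMocksStyle()
{
    // Rhino Mocks expectations are strongly typed, so the compiler and
    // the refactoring tools keep them honest.
    var repository = MockRepository.GenerateMock<IBookRepository>();
    repository.Stub(r => r.FindByIsbn("xxx-yyy")).Return(new Book());

    new BookLendingService(repository).Lend("xxx-yyy");

    repository.AssertWasCalled(r => r.FindByIsbn("xxx-yyy"));
}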

There may or may not be a deadline.  For this card, we don't have a deadline.  There is no pressing need to replace all the NMock calls, only those in code hotspots.

You may or may not schedule the card to be played. For this card we didn't schedule work - there is no real business justification to do that. For another technical card, which tracked removing inappropriate methods from a very large legacy repository, we were so slowed down by the problem that we did 'play' the card whenever we had capacity during an iteration. In this case there was clear business value: completing the card would not only speed us up, but would allow bigger architectural refactorings to take place later on in the project.

The boxes are also very important.  Each box is a step towards the goal.  We take the steps in two stages.

Spares and Strikes

When a developer or a pair has fixed one of the steps locally, they get to draw a single line over the box - a 'spare'. It is not converted to a 'strike' until the code change is checked in, built and tested as appropriate. We have found this to be really useful. The spare is a reminder to check in! With a lot of these tasks, we found it was easy to get carried away and bite off too much. Invariably this led to trouble, broken code and despair, ending in reverted code and feelings of hopelessness. Once a pair has more than a couple of spares on the board, they start to feel the pressure to 'bank' and tend to check in more frequently.

We really like our bowling cards; they are great for reminding us to make the world better, and for celebrating progress, no matter how small it feels at times. In fact Bernardo and Sinan started another card today to track legacy classes with inadequate unit test coverage - and there are a lot of boxes on that sucker!

[image: our first completed bowling card, sporting a party hat]

Completing a bowling card is a celebration event. Navya was so happy, she made our first one a party hat! Huzzah!

Bowling cards have helped us improve our code. They remind us that we chose to fix something, rather than allow our code to tip. It may take a long time, but we can all help by getting strikes when we have the chance. We also have a measure of progress that we can track, and proof that we are making things better one step at a time.

Saturday, November 08, 2008

ActiveRecord lessons learnt: #2 Use lazy loading and batching to improve performance

By default Castle ActiveRecord eagerly loads your object data. For a small set of connected objects, this default behaviour is fine. The problems start when it is possible to navigate from the object that you have loaded to other, potentially large, object graphs. As an example, let's imagine that we are writing some code to track books in our house. Say I want to track which books I have by author, along with the bookshelf that I'm keeping them on. I'd also like to track chapters of interest and which editions of each book I have. I also want to tag my bookshelves with keywords describing the types of books on the shelf.

[image: class diagram of the Bookshelf, Book and Author model]

The Bookshelf and the Author classes both contain collections of Books. Nominally, the Author owns the books and the shelf just holds references to them. I've also decided that I need to navigate bi-directionally between the Book and the Author classes. Here's a basic implementation using ActiveRecord.

using System.Collections.Generic;
using Castle.ActiveRecord;

namespace BooksAndStuff.Models
{
    [ActiveRecord("Bookshelf")]
    public class Bookshelf
    {
        [PrimaryKey(PrimaryKeyType.HiLo)]
        public int Id { get; set; }

        [HasMany(typeof(string), Table = "Tags", ColumnKey = "BookShelfId",
            Cascade = ManyRelationCascadeEnum.AllDeleteOrphan, Element = "TagName",
            Lazy = Switches.LazyCollections, BatchSize = Switches.BatchSize)]
        public IList<string> Tags { get; set; }

        [HasMany(typeof(Book), Table = "Books", ColumnKey = "BookShelfId",
            Cascade = ManyRelationCascadeEnum.None,
            Lazy = Switches.LazyCollections, BatchSize = Switches.BatchSize)]
        public IList<Book> Books { get; set; }
    }

    [ActiveRecord("Books")]
    public class Book
    {
        public Book() {}

        public Book(string title, Author author, string isbn,
                    IList<string> chapters, IList<string> editions)
        {
            Title = title;
            Author = author;
            Chapters = chapters;
            ISBN = isbn;
            Editions = editions;
        }

        [PrimaryKey(PrimaryKeyType.HiLo)]
        public int Id { get; set; }

        [Property(NotNull = true)]
        public string Title { get; set; }

        [HasMany(typeof(string), Table = "Editions", ColumnKey = "BookId",
            Access = PropertyAccess.Property, Element = "Edition",
            Lazy = Switches.LazyCollections, BatchSize = Switches.BatchSize)]
        public IList<string> Editions { get; set; }

        [HasMany(typeof(string), Table = "Chapters", ColumnKey = "BookId",
            Access = PropertyAccess.Property, Element = "Chapter",
            Lazy = Switches.LazyCollections, BatchSize = Switches.BatchSize)]
        public IList<string> Chapters { get; set; }

        [Property(NotNull = true)]
        public string ISBN { get; set; }

        [BelongsTo(NotNull = true)]
        public Author Author { get; set; }
    }

    [ActiveRecord("Authors", Lazy = Switches.LazyAuthor, BatchSize = Switches.AuthorBatchSize)]
    public class Author
    {
        [PrimaryKey(PrimaryKeyType.HiLo)]
        public virtual int Id { get; set; }

        [Property(NotNull = true)]
        public virtual string Name { get; set; }

        [HasMany(Cascade = ManyRelationCascadeEnum.AllDeleteOrphan,
            Lazy = Switches.LazyCollections, BatchSize = Switches.BatchSize)]
        public virtual IList<Book> Books { get; set; }
    }

    public static class Switches
    {
        public const bool LazyCollections = false;
        public const bool LazyAuthor = false;
        public const int BatchSize = 1;
        public const int AuthorBatchSize = 1;
    }
}

I wouldn't normally have a Switches class, but it is very useful for the next part. We are going to change the lazy loading and batch size properties of the ActiveRecord attributes to understand how they influence the way NHibernate fetches the data from the database. To get a feel for the behaviour of NHibernate, we will need some test data. Say that we have a single shelf with 3 books on it, each with a different author. We have another 2 books by one of our authors, but they are not on a shelf (I lent them to Steve).


To test this we need to start up active record:

[TestFixtureSetUp]
public void FixtureSetUp()
{
    var connectionString = @"Data Source=localhost;initial catalog=BooksAndStuff;user=BooksAndStuff_user;password=Password01";
    var properties = new Dictionary<string, string>
    {
        {"show_sql", "true"},
        {"connection.driver_class", "NHibernate.Driver.SqlClientDriver"},
        {"dialect", "NHibernate.Dialect.MsSql2005Dialect"},
        {"connection.provider", "NHibernate.Connection.DriverConnectionProvider"},
        {"connection.connection_string", connectionString}
    };

    var source = new InPlaceConfigurationSource();
    source.Add(typeof(ActiveRecordBase), properties);
    ActiveRecordStarter.Initialize(source, typeof(Bookshelf).Module.GetTypes());
    ActiveRecordStarter.CreateSchema();
}

[SetUp]
public void SetUp()
{
    ActiveRecordStarter.CreateSchema();
}

This method creates the test data in the database:

private void GenerateTestData()
{
    var lewisCarol = new Author {Name = "Lewis Carol"};
    var xmasCarol = new Author {Name = "Xmas Carol"};
    var luisLuisCarol = new Author {Name = "Luis Luis Carol"};

    var chapters = new List<string> {"Chapter 1", "Chapter 2"};
    var editions = new List<string> {"hard back", "spanish"};

    var book1 = new Book("Alice in Wonderland", lewisCarol, "xxx-yyy", chapters, editions);
    var book2 = new Book("Alice in Wonderland II - Revenge of the Cheshire Cat", xmasCarol, "xxx-yyy", chapters, editions);
    var book3 = new Book("Not well known", lewisCarol, "xxx-yyy", chapters, editions);
    var book4 = new Book("Not well known either", lewisCarol, "xxx-yyy", chapters, editions);
    var book5 = new Book("Some book", luisLuisCarol, "xxx-yyy", chapters, editions);

    lewisCarol.Books = new List<Book> {book1, book3, book4};
    xmasCarol.Books = new List<Book> {book2};
    luisLuisCarol.Books = new List<Book> {book5};

    var bookshelf = new Bookshelf
    {
        Tags = new List<string> {"Fantasy", "Old"},
        Books = new List<Book> {book1, book2, book5}
    };

    using (new SessionScope())
    {
        ActiveRecordMediator<Author>.Save(lewisCarol);
        ActiveRecordMediator<Author>.Save(xmasCarol);
        ActiveRecordMediator<Author>.Save(luisLuisCarol);
        ActiveRecordMediator<Bookshelf>.Save(bookshelf);
    }
}

Now we have everything in place, we can write a test to load in the data:

[Test]
public void PlayingWithBatchFetchingAndLazyLoading()
{
    GenerateTestData();

    using (new SessionScope())
    {
        Console.WriteLine("*******************");
        Console.WriteLine("Batch Size:{0} LazyCollections:{1} LazyAuthor:{2}",
            Switches.BatchSize, Switches.LazyCollections, Switches.LazyAuthor);

        Console.WriteLine("***\nLoading my book shelves");
        var shelves = ActiveRecordMediator<Bookshelf>.FindAll();

        Console.WriteLine("***\nCounting books on shelf: ");
        Console.WriteLine("shelves[0].Books.Count =" + shelves[0].Books.Count);

        Console.WriteLine("***\nChecking I have the right book");
        Console.WriteLine("\tshelves[0].Books[0].Id:" + shelves[0].Books[0].Id);

        Console.WriteLine("***\nCounting the book's chapters");
        Assert.That(shelves[0].Books[0].Chapters.Count, Is.EqualTo(2));

        Console.WriteLine("***\nHow many books does the first book's author have to his name?");
        Assert.That(shelves[0].Books[0].Author.Books.Count, Is.EqualTo(3));

        Console.WriteLine("***\nHow many books does the third book's author have to his name?");
        Assert.That(shelves[0].Books[2].Author.Books.Count, Is.EqualTo(1));

        Console.WriteLine("***\nHow many books does the second book's author have to his name?");
        Assert.That(shelves[0].Books[1].Author.Books.Count, Is.EqualTo(1));

        Console.WriteLine("***\nWhat is the count of the chapters in a given book?");
        Assert.That(shelves[0].Books[0].Author.Books[1].Chapters.Count, Is.EqualTo(2));

        Console.WriteLine("***\nHow many chapters does the third book have?");
        Assert.That(shelves[0].Books[2].Chapters.Count, Is.EqualTo(2));
    }
}

When you run this, you should see the following output from NHibernate:

 *******************
Batch Size:1 LazyCollections:False LazyAuthor:False
***
Loading my book shelves
NHibernate: SELECT this_.Id as Id0_0_ FROM BookShelf this_
NHibernate: SELECT books0_.BookShelfId as BookShel5___2_, books0_.Id as Id2_, books0_.Id as Id2_1_, books0_.Title as Title2_1_, books0_.ISBN as ISBN2_1_, books0_.Author as Author2_1_, author1_.Id as Id5_0_, author1_.Name as Name5_0_ FROM Books books0_ inner join Authors author1_ on books0_.Author=author1_.Id WHERE books0_.BookShelfId=@p0; @p0 = '98305'
NHibernate: SELECT chapters0_.BookId as BookId__0_, chapters0_.Chapter as Chapter0_ FROM Chapters chapters0_ WHERE chapters0_.BookId=@p0; @p0 = '65541'
NHibernate: SELECT editions0_.BookId as BookId__0_, editions0_.Edition as Edition0_ FROM Editions editions0_ WHERE editions0_.BookId=@p0; @p0 = '65541'
NHibernate: SELECT books0_.Author as Author__1_, books0_.Id as Id1_, books0_.Id as Id2_0_, books0_.Title as Title2_0_, books0_.ISBN as ISBN2_0_, books0_.Author as Author2_0_ FROM Books books0_ WHERE books0_.Author=@p0; @p0 = '32771'
NHibernate: SELECT chapters0_.BookId as BookId__0_, chapters0_.Chapter as Chapter0_ FROM Chapters chapters0_ WHERE chapters0_.BookId=@p0; @p0 = '65540'
NHibernate: SELECT editions0_.BookId as BookId__0_, editions0_.Edition as Edition0_ FROM Editions editions0_ WHERE editions0_.BookId=@p0; @p0 = '65540'
NHibernate: SELECT books0_.Author as Author__1_, books0_.Id as Id1_, books0_.Id as Id2_0_, books0_.Title as Title2_0_, books0_.ISBN as ISBN2_0_, books0_.Author as Author2_0_ FROM Books books0_ WHERE books0_.Author=@p0; @p0 = '32770'
NHibernate: SELECT chapters0_.BookId as BookId__0_, chapters0_.Chapter as Chapter0_ FROM Chapters chapters0_ WHERE chapters0_.BookId=@p0; @p0 = '65537'
NHibernate: SELECT editions0_.BookId as BookId__0_, editions0_.Edition as Edition0_ FROM Editions editions0_ WHERE editions0_.BookId=@p0; @p0 = '65537'
NHibernate: SELECT books0_.Author as Author__1_, books0_.Id as Id1_, books0_.Id as Id2_0_, books0_.Title as Title2_0_, books0_.ISBN as ISBN2_0_, books0_.Author as Author2_0_ FROM Books books0_ WHERE books0_.Author=@p0; @p0 = '32769'
NHibernate: SELECT chapters0_.BookId as BookId__0_, chapters0_.Chapter as Chapter0_ FROM Chapters chapters0_ WHERE chapters0_.BookId=@p0; @p0 = '65539'
NHibernate: SELECT editions0_.BookId as BookId__0_, editions0_.Edition as Edition0_ FROM Editions editions0_ WHERE editions0_.BookId=@p0; @p0 = '65539'
NHibernate: SELECT chapters0_.BookId as BookId__0_, chapters0_.Chapter as Chapter0_ FROM Chapters chapters0_ WHERE chapters0_.BookId=@p0; @p0 = '65538'
NHibernate: SELECT editions0_.BookId as BookId__0_, editions0_.Edition as Edition0_ FROM Editions editions0_ WHERE editions0_.BookId=@p0; @p0 = '65538'
NHibernate: SELECT tags0_.BookShelfId as BookShel1___0_, tags0_.TagName as TagName0_ FROM Tags tags0_ WHERE tags0_.BookShelfId=@p0; @p0 = '98305'
***
Counting books on shelf: 
shelves[0].Books.Count =3
***
Checking I have the right book
    shelves[0].Books[0].Id:65537
***
Counting the book's chapters
***
How many books does the first book's author have to his name?
***
How many books does the third book's author have to his name?
***
How many books does the second book's author have to his name?
***
What is the count of the chapters in a given book?
***
How many chapters does the third book have?

Blimey - that was a lot of SQL!  16 Select statements.  Also, notice how it was all executed on the initial load of the Bookshelf.  The logic for loading ran something like this:

  1. Load all the data from the Bookshelf table for all the bookshelves
  2. For each bookshelf 
    1. Load in all the data from the books and authors table for each book on the shelf and its corresponding author
    2. For each book on the shelf 
      1. Load all the chapters for the book
      2. Load all the editions for the book
    3. For each author 
      1. Load all the data from the books table for the author
      2. Repeat the book loading logic for every book belonging to the author which was not on the shelf
    4. Load in all the tags for the shelf

All that so that we can ask for the first shelf. We haven't even begun to touch the object yet. Why did NHibernate do this? All of these objects can theoretically be reached from the bookshelf. Because we may want to touch any of them after we have retrieved the Bookshelf from the database, NHibernate decided it had better grab the lot so that we could get to them if we had to. This is eager loading, and it is the default behaviour of ActiveRecord unless you explicitly instruct it otherwise. Notice also that we have an N+1 selects problem: a separate select statement is being issued for every collection of every object. This is a very expensive way to fetch data from the database.

The key to optimising the fetch behaviour is how you use the Lazy and BatchSize properties of the various ActiveRecord attributes.

Removing N+1 Select with BatchSize

The BatchSize property can be used on collections or on ActiveRecord classes directly. When you use the setting on a collection, NHibernate will attempt to populate that collection for up to BatchSize of the objects which it is aware of and has not yet populated.

We'll start by adding batching:

public static class Switches
{
    public const bool LazyCollections = false;
    public const bool LazyAuthor = false;
    public const int BatchSize = 2;
    public const int AuthorBatchSize = 10;
}

Batch Size:2 LazyCollections:False LazyAuthor:False
***
Loading my book shelves 
SELECT this_.Id as Id3_0_ FROM BookShelf this_
SELECT books0_.BookShelfId as BookShel5___2_, books0_.Id as ...
SELECT chapters0_.BookId...WHERE chapters0_.BookId in (@p0, @p1); @p0 = '65541', @p1 = '65540'
SELECT editions0_.BookId...WHERE editions0_.BookId in (@p0, @p1); @p0 = '65541', @p1 = '65540'
SELECT books0_.Author...WHERE books0_.Author in (@p0, @p1); @p0 = '32771', @p1 = '32770'
SELECT chapters0_.BookId...WHERE chapters0_.BookId=@p0; @p0 = '65537'
SELECT editions0_.BookId...WHERE editions0_.BookId=@p0; @p0 = '65537'
SELECT books0_.Author...WHERE books0_.Author=@p0; @p0 = '32769'
SELECT chapters0_.BookId...WHERE chapters0_.BookId in (@p0, @p1); @p0 = '65539', @p1 = '65538'
SELECT editions0_.BookId...WHERE editions0_.BookId in (@p0, @p1); @p0 = '65539', @p1 = '65538'
SELECT tags0_.BookShelfId...WHERE tags0_.BookShelfId=@p0; @p0 = '98305'
***

That improved the behaviour somewhat. We're down to 11 statements. NHibernate is now resolving the objects in the collections 2 at a time. This is not very useful, as we typically have more than 2 books on a shelf, so we make 2 trips round the loop to load in all the data for our shelf. If we up the BatchSize to 5 we can get everything in 8 statements:

Batch Size:5 LazyCollections:False LazyAuthor:False
***
Loading my book shelves
SELECT this_.Id as Id3_0_ FROM BookShelf this_
SELECT books0_.BookShelfId...WHERE books0_.BookShelfId=@p0; @p0 = '98305'
SELECT chapters0_.BookId..WHERE chapters0_.BookId in (@p0, @p1, @p2); @p0 = '65541', @p1 = '65537', @p2 = '65540'
SELECT editions0_.BookId...WHERE editions0_.BookId in (@p0, @p1, @p2); @p0 = '65541', @p1 = '65537', @p2 = '65540'
SELECT books0_.Author...WHERE books0_.Author in (@p0, @p1, @p2); @p0 = '32771', @p1 = '32769', @p2 = '32770'
SELECT chapters0_.BookId...WHERE chapters0_.BookId in (@p0, @p1); @p0 = '65539', @p1 = '65538'
SELECT editions0_.BookId...WHERE editions0_.BookId in (@p0, @p1); @p0 = '65539', @p1 = '65538'
SELECT tags0_.BookShelfId...WHERE tags0_.BookShelfId=@p0; @p0 = '98305'
***

Bear in mind that BatchSize only influences how many of the currently known, unpopulated objects get fetched in one go. In the example above, even though we only have 5 books we still have to make two trips to the database to load them all in. The first time NHibernate decides it needs to load a Book is when it loads the shelf. At that point there are only 3 known books, so those are fetched. Later on, while loading in the authors, the 2 remaining books are identified and fetched in a second batch.

Using Lazy Loading

The biggest problem with the fetch patterns above is the large amount of data being retrieved which may never be required. This is potentially very wasteful. Lazy loading allows you to control this. With our current object model, it probably doesn't make sense to always pull in the author (and the author's related books) every time we load a bookshelf. How about we look at the results of lazy loading the author?

public static class Switches
{
    public const bool LazyCollections = false;
    public const bool LazyAuthor = true;
    public const int BatchSize = 5;
    public const int AuthorBatchSize = 10;
}

*******************
Batch Size:5 LazyCollections:False LazyAuthor:True
***
Loading my book shelves
SELECT this_.Id as Id3_0_ FROM BookShelf this_
SELECT books0_.BookShelfId... WHERE books0_.BookShelfId=@p0; @p0 = '98305'
SELECT chapters0_.BookId ... WHERE chapters0_.BookId in (@p0, @p1, @p2); @p0 = '65541', @p1 = '65537', @p2 = '65540'
SELECT editions0_.BookId ... WHERE editions0_.BookId in (@p0, @p1, @p2); @p0 = '65541', @p1 = '65537', @p2 = '65540'
SELECT tags0_.BookShelfId ... WHERE tags0_.BookShelfId=@p0; @p0 = '98305'
***
Counting books on shelf: 
shelves[0].Books.Count =3
***
Checking I have the right book
    shelves[0].Books[0].Id:65537
***
Counting the book's chapters
***
How many books does the first book's author have to his name?
SELECT author0_.Id ... WHERE author0_.Id in (@p0, @p1, @p2); @p0 = '32769', @p1 = '32770', @p2 = '32771'
SELECT books0_.Author ... WHERE books0_.Author in (@p0, @p1, @p2); @p0 = '32771', @p1 = '32769', @p2 = '32770'
SELECT chapters0_.BookId ... WHERE chapters0_.BookId in (@p0, @p1); @p0 = '65539', @p1 = '65538'
SELECT editions0_.BookId ... WHERE editions0_.BookId in (@p0, @p1); @p0 = '65539', @p1 = '65538'
***
How many books does the third book's author have to his name?
***
How many books does the second book's author have to his name?
***
What is the count of the chapters in a given book?
***
How many chapters does the third book have?

This has dramatically changed the way we retrieve data from the database.  Now we do not load any of the books which are not on the shelf until we touch the author property of the book.  The initial load is much cheaper, although we still pull in a load of data which is not required.  How about we lazy load the collections as well?

*******************
Batch Size:5 LazyCollections:True LazyAuthor:True
***
Loading my book shelves
NHibernate: SELECT this_.Id as Id3_0_ FROM BookShelf this_
***
Counting books on shelf:
SELECT books0_.BookShelfId... WHERE books0_.BookShelfId=@p0; @p0 = '98305'
shelves[0].Books.Count =3
***
Checking I have the right book
shelves[0].Books[0].Id:65537
***
Counting the book's chapters
SELECT chapters0_.BookId... WHERE chapters0_.BookId in (@p0, @p1, @p2); @p0 = '65537', @p1 = '65540', @p2 = '65541'
***
How many books does the first book's author have to his name?
SELECT ... WHERE author0_.Id in (@p0, @p1, @p2); @p0 = '32769', @p1 = '32770', @p2 = '32771'
SELECT books0_.Author... WHERE books0_.Author in (@p0, @p1, @p2); @p0 = '32769', @p1 = '32770', @p2 = '32771'
***
How many books does the third book's author have to his name?
***
How many books does the second book's author have to his name?
***
What is the count of the chapters in a given book?
NHibernate: SELECT chapters0_.BookId... WHERE chapters0_.BookId in (@p0, @p1); @p0 = '65538', @p1 = '65539'
***
How many chapters does the third book have?

As you can see, the up front cost of loading the bookshelves is now very cheap; we progressively load in data as we touch more and more of the object graph. I would suggest that this is generally the best behaviour. In this case we never retrieve the tags or editions from the database because we don't touch them. If another method wants to load a bookshelf to edit its tags, we won't incur the cost of loading all the books on the shelf (and there could be hundreds of those).

In summary, I'd recommend that you default to lazy loading and batch fetching for all collections, and consider lazy loading large aggregate classes - such as the Author in this example - so that you don't inadvertently pull out huge chunks of the database. Also, keep watching your SQL output by setting show_sql to true now and again. Quite innocent seeming changes to your domain model can quickly magnify the number of objects that can be navigated to from another domain object. If you don't have a Lazy constraint between your objects to act as an NHibernate firebreak, the amount of SQL issued can increase dramatically.
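In attribute terms, the recommended defaults look something like this fragment (the values are illustrative; remember that a lazy class must expose virtual members so NHibernate can proxy it):

// On every collection: load lazily, and in batches once touched.
[HasMany(typeof(Book), Table = "Books", ColumnKey = "BookShelfId",
    Lazy = true, BatchSize = 5)]
public IList<Book> Books { get; set; }

// On large aggregates: consider lazy loading the whole class.
[ActiveRecord("Authors", Lazy = true, BatchSize = 10)]
public class Author
{
    [PrimaryKey(PrimaryKeyType.HiLo)]
    public virtual int Id { get; set; }
    // ... all members virtual ...
}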

Don't forget that lazy loading only works while you remain inside the NHibernate session. If you detach your object from the session, you will get a LazyInitializationException thrown when you attempt to access a collection or object that has not been initialised.
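Here is a sketch of that failure mode, using the bookshelf model from above:

Bookshelf shelf;
using (new SessionScope())
{
    shelf = ActiveRecordMediator<Bookshelf>.FindAll()[0];
}
// The session has gone; lazy collections are now uninitialised proxies.
var boom = shelf.Books.Count;   // throws LazyInitializationException

// Safe: touch what you need while the SessionScope is still open, or keep
// a scope open for the whole unit of work (e.g. one per web request).
using (new SessionScope())
{
    shelf = ActiveRecordMediator<Bookshelf>.FindAll()[0];
    var fine = shelf.Books.Count;
}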

Wednesday, November 05, 2008

ActiveRecord lessons learnt: #1 Never forget there's a database

We have been using Castle's ActiveRecord very heavily on my current project, and we have recently spent a lot of time tuning the performance of the application and its use of ActiveRecord. We learnt a lot over the last few weeks, and in the next few posts I'll try to capture some of those lessons.

ActiveRecord is a great abstraction of the database for a .NET application. NHibernate and Hibernate are OK, but leave you with a legacy of XML mapping files in your application, which is no fun at all. The appeal of ActiveRecord over vanilla NHibernate was the chance to shed all that XML cruft. Now, with a few simple attributes and a simple repository class I can transform my plain old C# object into a persistent domain object with ease. It is almost as if the database was not there at all...

[ActiveRecord]
public class MyAggregate
{
    private int id;
    private readonly IList<SomeObject> items;

    [PrimaryKey(Access = PropertyAccess.FieldCamelcase, UnsavedValue = Entity.IdUnassignedString)]
    public int Id { get { return id; } }

    [HasMany(typeof(SomeObject), Access = PropertyAccess.FieldCamelcase)]
    public IList<SomeObject> SomeItems { get { return items; } }
}

Look at that!  Just three attributes and my object is ready for the database.  Marvelous!

This has been the source of many problems for us. It was just too easy to work with the database. In the green field days of the project, you could almost forget that the database existed at all. We had ActiveRecord auto-generate the database, and we wrote test fixtures which could create test data in memory or in the database. This was superb, as we could write standalone unit tests to drive out the behaviour of the domain and application components; soon we had hundreds of fast, database-free unit tests. We could also write wired tests for our repositories to prove that we could persist domain objects to SQL Server and read them back. The same test fixtures could be reused in the automated acceptance tests to modify the application data as required.

Fantastic! So what went wrong? Well, we built a slow app - at least, it started out slow when it hit UAT (it's much, much faster now). We fell victim to Joel's Law of Leaky Abstractions, which states that "All non-trivial abstractions, to some degree, are leaky". In this case, we have ActiveRecord abstracting NHibernate, which in turn abstracts the database access. The minute you forget this, you will get bitten. The Hibernate team certainly acknowledge this. Just three pages into the first chapter of Java Persistence with Hibernate, Christian Bauer and Gavin King remind you that:

"To use Hibernate effectively, a solid understanding of the relational model and SQL is a prerequisite.  You need to understand the relational model and topics such as normalization to guarantee the the integrity of your data, and you'll need to use your knowledge of SQL to tune the performance of your Hibernate application" 

In other words, try to remember that you are still talking to a relational database; we're just making it easier for you to do so from your OO world.

I really like ActiveRecord and I would recommend using it on .NET projects where there is a need for an OR mapping tool. The fault lay with us, the developers. We didn't pay enough attention to the fact that there was a database at the end of the calls that the repository was making. We didn't spot that some of the calls which our test code was making were potentially very expensive in real usage scenarios. And so we got bitten by the law of leaky abstractions, which manifested itself through a number of bad usage patterns that caused immediate performance problems.

Avoidable Mistake - Absence of good quality test data

We took too long to recognise we had performance problems because we didn't work hard enough to obtain truly representative test data for our app. If we had made more of an effort to use 'real' data sooner, we would have spotted much more, much earlier in our project's lifecycle. There is no substitute for realistic data. Get or generate some, and plan to integrate it into your application build process. Make it easy for developers to use realistic data. We now have an NAnt target which restores a UAT or production database onto our developer machines and upgrades it to the latest version using dbdeploy.NET. This is great, as it means we can also automate testing our data and schema migration scripts with a production-like database.

Avoidable Mistake - Adoption of the "select all and filter with a bit of LINQ" anti pattern

This is a real killer.  The code snippet below illustrates the problem:

public Book FindBookWithTitle(string title)
{
    return repository.AllBooks().FirstOrDefault(book => book.Title == title);
}

This is fine from a C# perspective, and is nice, clean, expressive code.  The problem is that we had to pull all the books out of the database to run the query.  Fine with a small test database, but very bad with the Amazon book catalogue.

Provide a richer API through the repository instead and push the filtering to the database:

public class BookRepository : IBookRepository
{
    //Snip

    public Book GetFirstBookWithTitle(string title)
    {
        DetachedCriteria crits = DetachedCriteria.For(typeof(Book));
        crits.Add(Expression.Eq("Title", title));
        return ActiveRecordMediator<Book>.FindFirst(crits);
    }

    //Snip
}

This is a specific example of a general problem - making the application do work it doesn't need to do.

I'd generally be very suspicious of repositories which return unfiltered lists of any big aggregates.  They can be very useful for test purposes, but can really kill you if they get into the production code.  Keep an eye out for filtering being performed in C# code which would be simpler and more efficient as a relational database query.  These tasks should be pushed down into your repository classes and rewritten as HQL or Criteria queries.
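If you prefer HQL to Criteria, the same filter can be pushed to the database with ActiveRecord's SimpleQuery - a sketch, so check the exact signatures against your version of Castle:

public Book FindFirstBookWithTitle(string title)
{
    // The where clause runs in SQL, not in C#.
    var query = new SimpleQuery<Book>("from Book b where b.Title = ?", title);
    Book[] matches = query.Execute();
    return matches.Length > 0 ? matches[0] : null;
}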

Avoidable Mistake - Loading in data that you don't need

Bear in mind that calls to the database are expensive, especially if said database is on a remote machine.  Every time you go to the database for something that you don't need, you are going to slow down your app and needlessly waste resources.  Here are some examples of mistakes we made:

  • Loading objects from the database to perform unnecessary validation.
  • Eagerly loading objects on page load "just in case" the app might use them.
  • Repeated loads of objects from the database due to page lifecycle events.
  • Loading aggregates when a report query or a summary object would be better (see the sketch below).
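On that last point: to display, say, the number of books an author has, there is no need to hydrate a single Book object. A scalar HQL query does the job - a sketch using the bookshelf example from the previous post (again, check the query classes against your Castle version):

// Counts the books without materialising any Book instances.
var query = new ScalarQuery(typeof(Book),
    "select count(*) from Book b where b.Author = ?", author);
var numberOfBooks = (long) ActiveRecordMediator.ExecuteQuery(query);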

Avoidable Mistake - Forgetting to check the SQL being generated

Make sure you check the SQL statements being used by NHibernate to satisfy your requests.  Sometimes you will get quite a surprise.  To enable logging of the SQL statements to the console when running your tests, add 

<add key="show_sql" value="true" />

to the activerecord section in your app.config file. If you want to log application output, Ken Egozi has a post showing you how to do it here.

Friday, September 05, 2008

Dogfooding NSynthesis

Earlier this year George and I came up with the idea of trying to join up the mock interactions in our unit tests, to improve our confidence that our tests were coherent and that all our mocked interactions hung together. The result of this was Synthesis.

The only problem was that I'm working on .NET projects at the moment. Enter NSynthesis.

The aim of NSynthesis is to provide 'code coverage for mocked interactions for .NET'. It uses PostSharp to detect mock expectations being set by your test code and verifies that there is a test of the real production implementation in the same test run. If this call is missing, we fail the build. Alex and Bernardo and I are busy working on getting NSynthesis to the point where we can have a 0.0.1 release and it is looking quite promising now. Out of curiosity, I thought I'd grab the trunk build and try to run NSynthesis on a subset of my current project.

I disabled the generic support, as we are still working on this and it is not really stable yet. I also filtered the classes being analysed to include only our services namespace. Services should be easy to unit test, as there is always a repository or another service between them and the real world. We also find this is where we have the most mocked interactions.

The results: 54 unit tests analysed, 12 unique mock calls made, 6 unique untested mock calls!
That's 50% of the mock interactions not being unit tested! Blimey!

So. There is definitely value in using this tool! It is not that this code is untested - there are lots of wired tests and acceptance tests. However, what it highlighted was that we could have tested a lot more code at a lower level, and possibly would have sped up our test suite as a result. It was a surprise for the team too - we all thought we were already unit testing everywhere we could.

When I extended NSynthesis to run against the whole code base, I ran into another problem. What broke NSynthesis was the situation where a class 'acquires' an implementation of an interface member through inheritance.

[image: class diagram - Repository implements IRepository but inherits Exists from BaseRepository]

Here, my repository does not actually implement Exists itself; rather, it inherits an existing implementation. In our code, this was our own BaseRepository, which in turn inherited from Castle's ARRepository<T>. Now, when I mock the repository, I mock the interface. When I test my repository, I test the Repository class itself. NSynthesis then completely failed to realise that I have a tested implementation for IRepository, because it is looking for a concrete implementation declared by a class which implements the interface IRepository - and BaseRepository does not.
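In code, the shape that confused NSynthesis looks something like this (names simplified from our code base):

public interface IRepository
{
    bool Exists(int id);
}

// The concrete Exists lives here, but this class does not implement IRepository.
public abstract class BaseRepository // : ARRepository<T> in our real code
{
    public bool Exists(int id)
    {
        // ...query the database...
        return true;
    }
}

// Implements IRepository, but only via the Exists inherited from its base.
public class Repository : BaseRepository, IRepository
{
}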

I think getting this working could be fun :-)

Tuesday, April 01, 2008

Synthesis comes to Manchester

Following on from our presentation to EuRuKo 2008, George and I will be speaking at the next ThoughtWorks Manchester Geek Night on the 8th April.

I'm an alumnus of UMIST - so I'm really looking forward to seeing the old place again; it has been a long time since I was last in Manchester!

Saturday, March 22, 2008

Synthesis and test confidence

George and I spoke to a bunch of friends about Synthesis a few weeks ago at the inaugural Reading Geek Night. Around six of us met in a pub in Reading and got really confused looks from the punters looking on at us as we waxed enthusiastically about TDD concepts and how we thought they could be improved. I found it very interesting to get feedback from a group of really smart non-ThoughtWorkers. They don't all practice "full on agile", but they certainly do understand what it means to build and ship working code, on time, with tight business pressures. It has taken a couple of weeks to digest the feedback, hence the slow time to post about the event.

So, here is another attempt to explain why we may need Synthesis on our projects, in the light of Reading Geek Night #1 and earlier discussions with the guys at ThoughtWorks UK....

If you practice TDD, then the chances are that you already have a large number of unit tests. You may have a bunch of other automated tests of different types as well (functional, integration, performance...), and if you do, that's great. Keep doing that.

At the other end of the spectrum, you will also have some form of acceptance testing structure in place. The format of this varies from project to project. It could be a set of manual scripts which your Testing/QA team/Nominated Guy run, or it could be a bunch of system tests using FIT or one of the BDD frameworks. Maybe you paid for a commercial product and have something like a bunch of Rational Robot scripts.

We currently have a large disconnect between our high level tests and our low level unit tests, especially when you consider how confident you would be that the system works as desired after running one or more of the different types of test in the system.

At the bottom of the scale there is a system which has no tests. I would not be at all confident that this system worked. At the top end of the scale, I have a system which is running live in the real production environment and is being used by the actual user base. I am very confident that this system is working. At some point in the past, most businesses came to the conclusion that IT cannot blindly put systems live to discover if they work, hence all the interest in testing techniques.

High level system tests give us much more confidence that a system may work in production than unit tests because of the simple fact that they are exercising the production code in a more realistic manner; the data is more real, and all the interactions between the components are 'real', or as close to real as we can get in a test environment.

Unit tests provide a much weaker level of confidence, due to a number of issues which arise when you want to use them to prove that a system works.

Unit tests are incomplete
We cannot unit test all of our code. Note that good design practices can massively reduce the amount of 'untestable' code in the system and by practicing TDD and running code coverage tools it is simple to provide a view of how well we are doing. However, there are always parts of the system which you either cannot test or neglect to unit test for some reason.

Unit tests are disconnected from other unit tests
Unit tests test a small amount of code, by definition, and often in isolation. If you are testing interactions between components then these are simulated using mocks. How can I be confident that I got my mock interactions correct? Also, how can I be confident that I have tested or even completed coding up the real version of whatever it is I am mocking? Again, we can mitigate this by reducing the number of interaction based unit tests and by writing state based unit tests when we can. However sometimes it is much more natural to follow the interaction test approach, and this incurs a risk that we are not simulating our interactions accurately.

Please don't take away from this that unit tests are bad. Unit tests are the lifeblood of a healthy project and are great for proving that the individual cogs are properly machined. What they do not tell you is whether you have the right number of cogs, or if they all fit together properly. If I want to know that components A,B, and C really play well together, then I need to write another aggregate test which puts ABC together and validates the unit tests got the interactions correct in a more realistic environment. George and I would call these functional tests, but you may use another term on your project. You may not have these tests on your project either – and that could be more worrying still – this means there could be gaps in your coverage at the levels that developers work at.

So, if I have functional tests plus unit tests, I am more confident that my system is working before I commit to running the slow system tests. By running functional tests I have confidence that I have wired up all my well behaved units of code in a meaningful manner and that they are playing well with each other as I predicted. The chances are that the wider system is going to work. I am still not as confident that the system works as I will be when my acceptance test suite is run by [insert machine/human here], but I can probably sleep well enough.

Now, some functional tests are inescapable and add a view into an aspect of the system that a unit test just cannot provide. A great example, if you are using Spring, would be proving that you can get your components from the container and that they are wired up correctly. However, if you are purely proving that your simulated interactions are valid in the unit tests, then repetition is creeping into the process - which is wasteful.

This is where Synthesis comes in. Synthesis monitors the simulated interactions which you create in your unit tests and verifies that there is a corresponding unit test that exercises and validates the real object in your system. If everything joins up, then your build passes. If there are disconnects, then the build fails. So, if I have unit tests for A, B, and C, then it is fine for me to simulate A's interactions with B and C using mocks, providing Synthesis can match all my expectations with real tests which make the very same calls to the real B and C. The result is a synthetic bigger test, where A, B, and C are virtually linked by their expectations.
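To make the matching concrete, here is an illustrative sketch in C# with invented names (Synthesis itself targets Ruby tests, but the idea is the same):

// Assumed for illustration:
//   interface IB { string Process(int input); }
//   class B : IB { ... }
//   class A { public A(IB b) {...} public string DoWork() { return b.Process(42); } }

[Test]
public void UnitTestOfA_SimulatingB()
{
    var b = MockRepository.GenerateMock<IB>();
    b.Stub(x => x.Process(42)).Return("ok");   // expectation on IB.Process

    Assert.That(new A(b).DoWork(), Is.EqualTo("ok"));
}

[Test]
public void UnitTestOfTheRealB()
{
    // The very same call, made against the real B. Synthesis pairs this
    // with the expectation above, forming a synthetic test of A and B
    // collaborating. If this test were missing, the build would fail.
    Assert.That(new B().Process(42), Is.EqualTo("ok"));
}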

Obviously, there is a sliding scale of 'matching'. At one end you have basic method name matching, and at the other end of the scale there is full data matching for every call. The closer you get to true data matching, the more confidence is gained, but the more restrictions are placed on developers. We are not sure how far we need to go in order for a team to get an optimal balance between speed and safety – but I'd be very interested to get opinions on this matter :-)

Currently, Synthesis validates the call signature completely, but does not check that the data passing between the mocks matches the data used to test an object for real. We plan to add this soon. However, there is a benefit to be had simply by ensuring that all your interactions join up. This will catch method mismatches, dynamic calls which are not tested (such as ActiveRecord queries), and gaps in your test coverage which have simply been missed by those fallible human developers!

Saturday, January 05, 2008

Dear Microsoft

I thought it was very sporting of you to somehow break your MSDN subscriber downloads page for your flagship IE7 web browser:

[screenshot: the MSDN subscriber downloads page, broken in IE7]

But not for Firefox:

[screenshot: the same page rendering fine in Firefox]

Well done! I didn't even realize that the site had any problems at all until I switched over to my XP virtual machine from Ubuntu in order to download the ISO I needed via your custom file transfer application. Why do you insist that I use this at all? It seems really odd.

Anyway, perhaps I should just try to get your File Transfer Manager to run in Wine and I could have a seamless experience browsing your web site from a proper OS....

Love and hugs,

Stu

Sunday, February 11, 2007

Damn Skippy

From a conversation whilst pairing with a client developer as we attempted to test-infect some legacy code last week.

Me: Phew! Well, that little bit is under test now. We can start to fix it up now.

Client Dev: That seemed too hard....you know, if the guys who wrote that had had to put in tests as they went, the API would NEVER have looked like this. It would just have been too painful to work with.


It really is nice to witness the penny drop. True job satisfaction.