Sunday, 13 October 2013

Refactoring legacy code to add tests

Imagine you have some legacy code in your product. Code that's been added to over the years by several different developers, none of whom were certain they understood the code. There is no separation of concerns; the code has business logic and database access baked in. There's no useful documentation, and the code comments are no more than sarcastic outbursts of cynicism.

A bug report has come in suggesting that some of the SQL executed by one of this code's methods is invalid. So, what's the plan? Hack the SQL construction around until it seems to do what you want, hoping you haven't broken anything else in the class? Or take a more structured approach?

To illustrate the problem, I have written some legacy-style code so I can show how to refactor it to make it testable. The techniques here are a pragmatic way of getting legacy code under test while making as few untested changes as possible. As you will already know, a class can only be refactored safely once it's under test.

The code for the class is available on GitHub (orangutanboy/LegacyCodeUnderTest) and looks like this:
using System.Collections;

namespace CombinationOfConcerns
{
    public class Data
    {
        private readonly string _type;

        public Data(string type)
        {
            _type = type;
        }

        public ArrayList GetData(string code, string gdr, string filterType, bool getAge)
        {
            ArrayList results = null;

            var sql = "select id from people where type = " + _type + " and gender ='" + gdr + "'";
            string values = DatabaseWrapper.GetSingleResult(sql);
            for (int i = 0; i < values.Split(',').Length; i++)
            {
                switch (code)
                {
                    case "128855":
                        sql = "select name";
                        if (getAge)
                            sql += ",age";                    
                        sql += ",gender from people where type in(" + values + ")";
                        if (filterType == "45")
                        {
                            sql += " and gender='" + gdr + "'";
                        }
                        if (filterType == "87")
                        {
                            sql += " and shoesize = 43";
                        }
                        results = DatabaseWrapper.GetManyResults(sql);
                        break;
                    case "1493":
                        sql = "select dogs.name, dogbreeds.breed from";
                        sql += " dogs inner join dogbreeds on";
                        sql += " dogs.breed = breed.breed where dogs.ownerId in(" + values + ")";
                        if (filterType == "12")
                        {
                            sql += " and coat='curly'";
                        }
                        results = DatabaseWrapper.GetManyResults(sql);
                        break;
                    default:
                        sql = "select name, population from countries,people on where people.countryname = countries.name and people.id in(" + values + ")";
                        if (filterType == "f")
                        {
                            sql += " and countries.continent='South America'";
                        }
                        if (filterType == "642")
                        {
                            sql += " and countries.continent='Europe'";
                        }
                        results = DatabaseWrapper.GetManyResults(sql);
                        break;
                }
            }
            return results;
        }
    }
}

Run the code

Some great advice from Michael Feathers goes something like this: "if you need to see what some code does, run it." In practice, this means writing a test that exercises the code and seeing how far it gets without failing. If it doesn't fail, that route through the code has no troublesome dependencies and will be easier to write tests for. If it does fail, the failure will highlight the dependencies that need to be abstracted.

To run some code, the first job is to find a suitable entry point to the method. If the method to test is public or internal, that's the entry point. If not, is the method called from a public or internal method? If not, the method should be made internal to allow it to be called directly.
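Tests normally live in a separate assembly. Making a method internal therefore only lets the tests call it once the production assembly grants access with the InternalsVisibleTo attribute from System.Runtime.CompilerServices. The sketch below is minimal and assumes two hypothetical names: a test assembly called CombinationOfConcerns.Tests, and a LegacyReport class whose private BuildSql helper has been promoted to internal:

```csharp
using System.Runtime.CompilerServices;

// Assembly-level attribute: lets the (hypothetical) test assembly
// see this assembly's internal members.
[assembly: InternalsVisibleTo("CombinationOfConcerns.Tests")]

namespace CombinationOfConcerns
{
    // Hypothetical legacy class used only for illustration.
    public class LegacyReport
    {
        // Public entry point that callers already use.
        public string Describe(string type)
        {
            return BuildSql(type);
        }

        // Previously private; now internal so a test can call it directly
        // instead of going through Describe.
        internal string BuildSql(string type)
        {
            return "select id from people where type = " + type;
        }
    }
}
```

The attribute string must match the test project's assembly name exactly. If the assemblies are strong-named, it also needs the test assembly's public key.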

The method on the Data class that needs testing is the public GetData method. Here's the first test:
        [TestFixture]
        public class GetData
        {
            [Test]
            public void WithEmptyArgs()
            {
                var target = new Data("");
                target.GetData("", "", "", false);
            }
        }
When this runs, the GetData method fails on a call to the DatabaseWrapper class, showing that there's a dependency on that class that needs to be faked for testing.
        public ArrayList GetData(string code, string gdr, string filterType, bool getAge)
        {
            ArrayList results = null;

            var sql = "select id from people where type = " + _type + " and gender ='" + gdr + "'";
            string values = DatabaseWrapper.GetSingleResult(sql);
            //Code fails in call to GetSingleResult

Deal with dependencies

If DatabaseWrapper were an instance class, the best way to deal with it would be to extract an interface from it and use that interface throughout the code, allowing a stub to be substituted from a test. However, in this example DatabaseWrapper is a static class, so it needs to be dealt with differently.
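Had DatabaseWrapper been an instance class, the extract-interface route might look something like the sketch below. IDatabaseWrapper, StubDatabaseWrapper, LastSql and GetIds are hypothetical names. Data is cut down to a single query so that the constructor injection is easy to see. The method signatures are inferred from how GetData calls DatabaseWrapper.

```csharp
using System.Collections;

namespace CombinationOfConcerns
{
    // Hypothetical interface extracted from an instance DatabaseWrapper.
    public interface IDatabaseWrapper
    {
        string GetSingleResult(string sql);
        ArrayList GetManyResults(string sql);
    }

    // Cut-down Data: the database dependency is passed in rather than
    // reached for statically, so a test can supply a stub.
    public class Data
    {
        private readonly string _type;
        private readonly IDatabaseWrapper _db;

        public Data(string type, IDatabaseWrapper db)
        {
            _type = type;
            _db = db;
        }

        // Hypothetical method reproducing the first query in GetData.
        public string GetIds(string gdr)
        {
            var sql = "select id from people where type = " + _type + " and gender ='" + gdr + "'";
            return _db.GetSingleResult(sql);
        }
    }

    // Test-side stub: never touches a database and records the SQL it was given.
    public class StubDatabaseWrapper : IDatabaseWrapper
    {
        public string LastSql { get; private set; }

        public string GetSingleResult(string sql)
        {
            LastSql = sql;
            return "";
        }

        public ArrayList GetManyResults(string sql)
        {
            LastSql = sql;
            return new ArrayList();
        }
    }
}
```

In production the real wrapper would be passed to the constructor. In a test the stub would be passed instead, and its LastSql property can be asserted on without a database.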

One way would be to add an instance wrapper class around the static class (DatabaseWrapperWrapper..?). Its instance methods would simply call the static methods, and because the wrapper can implement an interface, the GetData method could depend on that interface instead.
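Here is a minimal sketch of that wrapper. It assumes the same GetSingleResult/GetManyResults signatures that GetData relies on. The static DatabaseWrapper shown is only a stand-in that throws rather than querying a database, and IDatabaseWrapper is a hypothetical interface name:

```csharp
using System;
using System.Collections;

namespace CombinationOfConcerns
{
    // Stand-in for the real static class; it throws to mimic
    // "no database available" in a test environment.
    public static class DatabaseWrapper
    {
        public static string GetSingleResult(string sql)
        {
            throw new InvalidOperationException("No database in this sketch");
        }

        public static ArrayList GetManyResults(string sql)
        {
            throw new InvalidOperationException("No database in this sketch");
        }
    }

    // Hypothetical interface that calling code would depend on.
    public interface IDatabaseWrapper
    {
        string GetSingleResult(string sql);
        ArrayList GetManyResults(string sql);
    }

    // Instance wrapper: each method forwards to the static class,
    // so production behaviour is unchanged.
    public class DatabaseWrapperWrapper : IDatabaseWrapper
    {
        public string GetSingleResult(string sql)
        {
            return DatabaseWrapper.GetSingleResult(sql);
        }

        public ArrayList GetManyResults(string sql)
        {
            return DatabaseWrapper.GetManyResults(sql);
        }
    }
}
```

Code that used to call DatabaseWrapper directly would receive an IDatabaseWrapper instead. A test can then pass in a stub while production code passes a DatabaseWrapperWrapper. The trade-off is an extra class plus a constructor change in Data, which touches more of the class than the virtual-method approach does.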

Another way is to add a protected virtual method to the Data class that wraps the static method; the test code can then use a subclass of Data that overrides the virtual method. That's the approach I'll demonstrate. Here's the new virtual method and its usage in the GetData method:
        protected virtual string GetSingleResult(string sql)
        {
            return DatabaseWrapper.GetSingleResult(sql);
        }

        public ArrayList GetData(string code, string gdr, string filterType, bool getAge)
        {
            ArrayList results = null;

            var sql = "select id from people where type = " + _type + " and gender ='" + gdr + "'";

            //Now calls virtual method
            string values = GetSingleResult(sql);

and here's the usage in the test project. There's a new class, FakeData, that extends Data. This has an override of the virtual GetSingleResult method, returning an empty string for now. Note that the FakeData class is the new target for the test.
        [TestFixture]
        public class GetData
        {
            [Test]
            public void WithEmptyArgs()
            {
                var target = new FakeData("");
                target.GetData("", "", "", false);
            }

            private class FakeData : Data
            {
                public FakeData(string type)
                    : base(type)
                { }

                protected override string GetSingleResult(string sql)
                {
                    return "";
                }
            }
        }
The code under test now runs up to the call to DatabaseWrapper.GetManyResults, so GetManyResults needs a virtual wrapper, just as GetSingleResult did:
        protected virtual ArrayList GetManyResults(string sql)
        {
            return DatabaseWrapper.GetManyResults(sql);
        }
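and each call to DatabaseWrapper.GetManyResults needs to be changed to call the local GetManyResults instead (only the default case is shown here; the other two cases change in the same way):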
...
    default:
        sql = "select name, population from countries,people on where people.countryname = countries.name and people.id in(" + values + ")";
        if (filterType == "f")
        {
            sql += " and countries.continent='South America'";
        }
        if (filterType == "642")
        {
            sql += " and countries.continent='Europe'";
        }

        // Now calls the virtual method
        results = GetManyResults(sql);
        break;
...
and the FakeData class in the tests needs to override the method:
            private class FakeData : Data
            {
                public FakeData(string type)
                    : base(type)
                { }

                protected override string GetSingleResult(string sql)
                {
                    return "";
                }

                protected override ArrayList GetManyResults(string sql)
                {
                    return new ArrayList();
                }
            }
Now there's a test that fakes the calls to the database and runs through the method from start to finish. But because the code has so many responsibilities, the thing that needs to be asserted (the SQL being executed on the database) is not exposed outside the GetData method.

Sensing variables

In order to see the value of the SQL being executed, the test class will use a "sensing variable". This is a variable that stores a runtime value so that it is available for a test to perform an assertion on. In this example, the sensing variable will be on the FakeData class, and will store the value of the SQL string passed to the GetManyResults method.
        [TestFixture]
        public class GetData
        {
            [Test]
            public void WithEmptyArgs()
            {
                var target = new FakeData("");
                target.GetData("", "", "", false);
                // Use the sensing variable in the assertion
                Assert.That(target.ExecutedSql, Is.EqualTo(""));
            }

            private class FakeData : Data
            {
                // The sensing variable
                public string ExecutedSql { get; private set; }

                public FakeData(string type)
                    : base(type)
                { }

                protected override string GetSingleResult(string sql)
                {
                    return "";
                }

                protected override ArrayList GetManyResults(string sql)
                {
                    //Store the value in the sensing variable
                    ExecutedSql = sql;
                    return new ArrayList();
                }
            }
        }
A well-placed breakpoint allows the sensing variable's value to be harvested and used as the expected result for that test.

            [Test]
            public void WithEmptyArgs()
            {
                var target = new FakeData("");
                target.GetData("", "", "", false);
                
                const string expected = "select name, population "
                    + "from countries,people on where people.countryname = "
                    + "countries.name and people.id in()";
                
                Assert.That(target.ExecutedSql, Is.EqualTo(expected));
            }
That gives the first green test!

Although this single test is not particularly useful, the journey to get to it has shown some of the techniques available when tackling legacy code. In future posts I'll write more tests and get to a point where it's safe to refactor the code. Then I'll refactor the hell out of it.

In Summary

  1. Run the code
    • Identify the entry point
    • Make method visible (public/internal)
    • Write a test to exercise the code
  2. Deal with dependencies
    • Introduce interface for instance class
    • Wrap a static class in an instance class
    • Create virtual method to hide static method calls
  3. Sensing variables
    • Use to surface runtime values


If there are other techniques that you employ to refactor legacy code, I'd like to know what they are - they are all useful.

The before and after code used in this example is on GitHub at orangutanboy/LegacyCodeUnderTest