Using Approval Tests to Bring Legacy Code Under Test

Ricky Munz
Dec 23, 2024


Recently I’ve been practicing some refactoring katas to keep my skills sharp. The latest one I’ve looked at is the Gilded Rose. The crux of the problem is that while you start out with a working implementation, the logic is one giant conditional that’s difficult to read. Unless you’re careful, it’s easy to introduce breaking changes. To make things even trickier, there are no existing tests. One option is to take the requirements and write unit tests, checking that they match the existing implementation. But for this problem I wanted something akin to characterization tests before making any changes.

The purpose of characterization testing is to document your system’s actual behavior, not check for the behavior you wish your system had.
— Michael Feathers

A characterization test merely documents the current behavior of code. This technique is particularly handy when working with legacy code that you’re not familiar with and that does not already have tests. To prevent making any breaking changes while tidying the code, you can create tests that describe the existing behavior. The fastest way to add characterization tests that I know of is by using Approval Tests.
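To make the idea concrete, here is a minimal sketch of a hand-written characterization test. The function legacyShippingCost is a made-up toy, not part of the kata; the point is only the workflow: run the code, observe what it returns, and pin exactly that.

```swift
// Hypothetical legacy function whose rules nobody has documented.
func legacyShippingCost(weightInKg: Int) -> Int {
    if weightInKg > 10 {
        return weightInKg * 3 - 5
    }
    return weightInKg * 2 + 1
}

// Step 1: run the code on a few inputs and look at what comes back.
let observed = [0, 10, 11].map { legacyShippingCost(weightInKg: $0) }
print(observed)

// Step 2: pin the observed values, whether or not they look "right".
assert(observed == [1, 21, 28])
```

Writing these by hand gets tedious quickly once the number of inputs grows.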

Approval tests allow you to take a snapshot of your code’s output so that you can catch any changes to the output that you did not expect. What’s especially convenient is their ability to compare complex objects or large sets of output data, as we’ll see in this example.
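The snapshot idea can be sketched in a few lines. This is a hand-rolled illustration, not the ApprovalTests.Swift implementation: checkSnapshot and SnapshotResult are hypothetical names, and the .approved.txt and .received.txt file names are an assumption about the usual convention.

```swift
import Foundation

// Hypothetical helper for illustration; not the ApprovalTests.Swift API.
enum SnapshotResult: Equatable {
    case matches
    case needsApproval
}

func checkSnapshot(_ received: String, named name: String, in directory: URL) throws -> SnapshotResult {
    let approvedURL = directory.appendingPathComponent("\(name).approved.txt")
    let receivedURL = directory.appendingPathComponent("\(name).received.txt")

    // A missing approved file reads as nil, so the first run always needs approval.
    let approved = try? String(contentsOf: approvedURL, encoding: .utf8)
    if approved == received {
        return .matches
    }

    // Save the new output so a person can diff it against the approved file.
    try received.write(to: receivedURL, atomically: true, encoding: .utf8)
    return .needsApproval
}
```

Approving a result then amounts to replacing the approved file with the received one, after which the same output passes silently.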

This Gilded Rose kata is an excellent example to see this in action. Here’s a look at the code present at the start of the kata:

public class GildedRose {
    var items: [Item]

    public init(items: [Item]) {
        self.items = items
    }

    public func updateQuality() {
        for i in 0 ..< items.count {
            if items[i].name != "Aged Brie" && items[i].name != "Backstage passes to a TAFKAL80ETC concert" {
                if items[i].quality > 0 {
                    if items[i].name != "Sulfuras, Hand of Ragnaros" {
                        items[i].quality = items[i].quality - 1
                    }
                }
            } else {
                if items[i].quality < 50 {
                    items[i].quality = items[i].quality + 1

                    if items[i].name == "Backstage passes to a TAFKAL80ETC concert" {
                        if items[i].sellIn < 11 {
                            if items[i].quality < 50 {
                                items[i].quality = items[i].quality + 1
                            }
                        }

                        if items[i].sellIn < 6 {
                            if items[i].quality < 50 {
                                items[i].quality = items[i].quality + 1
                            }
                        }
                    }
                }
            }

            if items[i].name != "Sulfuras, Hand of Ragnaros" {
                items[i].sellIn = items[i].sellIn - 1
            }

            if items[i].sellIn < 0 {
                if items[i].name != "Aged Brie" {
                    if items[i].name != "Backstage passes to a TAFKAL80ETC concert" {
                        if items[i].quality > 0 {
                            if items[i].name != "Sulfuras, Hand of Ragnaros" {
                                items[i].quality = items[i].quality - 1
                            }
                        }
                    } else {
                        items[i].quality = items[i].quality - items[i].quality
                    }
                } else {
                    if items[i].quality < 50 {
                        items[i].quality = items[i].quality + 1
                    }
                }
            }
        }
    }
}

Oh, the humanity! That’s a heck of a lot of branches and indents.

From scanning this file I gleaned the following:

  • The GildedRose class starts with an array of Items.
  • The updateQuality() method loops through the Items and updates the quality property (and in some cases the sellIn property) of each Item.
  • The algorithm for updating this quality depends on the Item’s name, sellIn (like days until expiration), and current quality.
  • There are several magic strings for specific item names (e.g. “Aged Brie”).
  • There are threshold checks on sellIn (less than 11, 6, and 0) and on quality (greater than 0 and less than 50).
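The Item type itself is not shown in this post. A minimal sketch consistent with how the code above uses it might look like the following. The three properties and the initializer labels come from the listings here; making it a class and giving it this particular description format are assumptions, not taken from the kata source.

```swift
// Sketch of the Item type, inferred from how GildedRose uses it.
// Assumption: a class with mutable properties; a struct would also
// satisfy the listings in this post.
public class Item {
    public var name: String
    public var sellIn: Int
    public var quality: Int

    public init(name: String, sellIn: Int, quality: Int) {
        self.name = name
        self.sellIn = sellIn
        self.quality = quality
    }
}

// Assumed text format; a readable description helps when items are
// written out as snapshot lines.
extension Item: CustomStringConvertible {
    public var description: String {
        "\(name), \(sellIn), \(quality)"
    }
}
```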

To fully characterize this behavior, we need to visit every branch of this code. With normal unit tests, this could be quite time-consuming, since you’d have to write out each individual case by hand. But with approval tests, it’s as easy as this:

@testable import GildedRose
import ApprovalTests_Swift
import Testing

@Test func characterization() async throws {
    let names = [
        "Other",
        "Aged Brie",
        "Backstage passes to a TAFKAL80ETC concert",
        "Sulfuras, Hand of Ragnaros"
    ]

    let sellInRange = -1...51
    let qualityRange = -1...51

    var items = [Item]()

    for name in names {
        for sellIn in sellInRange {
            for quality in qualityRange {
                items.append(Item(name: name, sellIn: sellIn, quality: quality))
            }
        }
    }

    let app = GildedRose(items: items)
    app.updateQuality()

    try Approvals.verifyAll(app.items)
}

First we collect the magic strings for the names and pertinent ranges for sellIn and quality (I used bounds of -1 and 51 to guard against off-by-one errors). Next we create a giant list of Items (I count 11,236 items: 4 names × 53 sellIn values × 53 quality values) to account for every combination, inject it into GildedRose, and call updateQuality(). Finally we call Approvals.verifyAll(app.items) to verify the output. This is what we end up with:
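The size of that list can be double-checked with a few lines of arithmetic. This standalone sketch reuses the names and ranges from the test; total is just an illustrative variable. Each closed range -1...51 holds 53 values, so the product is 4 × 53 × 53.

```swift
let names = [
    "Other",
    "Aged Brie",
    "Backstage passes to a TAFKAL80ETC concert",
    "Sulfuras, Hand of Ragnaros"
]
let sellInRange = -1...51
let qualityRange = -1...51

// One Item is created per (name, sellIn, quality) combination.
let total = names.count * sellInRange.count * qualityRange.count
print(total) // 11236
```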

The result of running an approval test shows up in a diff tool (I’m actually using VSCode here). Normally this is the point where you manually verify that the output is indeed the expected result of the code. In our case, there’s no real verification to do on our part since we have inherited this code and don’t yet know the exact expected behavior. What we have here is a brutally honest record of the current behavior for every possible conditional branch of the code (100% code coverage of the GildedRose class). Also, you’ll be glad to note that despite the number of loop iterations, the test is quite quick:
✔ Test characterization() passed after 0.039 seconds.

After accepting the changes, we have the following:

Now that we have the approval test and result in place, let’s show how they can catch breaking changes by making one on purpose.

public func updateQuality() {
    for i in 0 ..< items.count {
        // replacing != with == for "Aged Brie"
        if items[i].name == "Aged Brie" && items[i].name != "Backstage passes to a TAFKAL80ETC concert" {
            // …
        }

After running the test again, here is the result:

With approval tests, if there’s no change to the result, then you know you haven’t broken anything. But when you see a prompt to approve the result like that shown above (a change from the previous result) and you weren’t expecting anything to change, then you know that you’ve introduced a bug. However, if there is some new behavior you add later that you do expect, you can accept the changes and they’ll be added to your approved result to verify the new behavior going forward.
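What the diff tool surfaces can be approximated with a naive line-by-line comparison. This is only a sketch: changedLines and LineChange are hypothetical names, real diff tools are far smarter, and the "name, sellIn, quality" line format is an assumption. The sample values were traced by hand through the listing for items starting at sellIn 5 and quality 10, before and after the deliberate change.

```swift
import Foundation

struct LineChange: Equatable {
    let lineNumber: Int
    let approved: String
    let received: String
}

// Naive comparison: reports every line position whose text differs.
func changedLines(approved: String, received: String) -> [LineChange] {
    let old = approved.components(separatedBy: "\n")
    let new = received.components(separatedBy: "\n")
    var changes: [LineChange] = []
    for index in 0..<max(old.count, new.count) {
        // A line missing on one side is treated as empty.
        let before = index < old.count ? old[index] : ""
        let after = index < new.count ? new[index] : ""
        if before != after {
            changes.append(LineChange(lineNumber: index + 1, approved: before, received: after))
        }
    }
    return changes
}

let approved = "Backstage passes to a TAFKAL80ETC concert, 4, 13\nAged Brie, 4, 11"
let received = "Backstage passes to a TAFKAL80ETC concert, 4, 13\nAged Brie, 4, 9"
let changes = changedLines(approved: approved, received: received)
print(changes.map { "line \($0.lineNumber): \($0.approved) -> \($0.received)" })
```

An empty result means the behavior is unchanged; anything else is either a bug or a change you intended and can approve.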

What started off as the frustrating problem of getting an obtuse bit of legacy code under test becomes trivial with approval tests. Keep in mind that exhaustive approval tests were used here just to get the legacy code under test quickly. Now we can begin refactoring without fear of breaking existing behavior. Once this code is refactored, we can replace the approval tests here with regular unit tests. Having standard unit tests will aid in documenting the expected functionality. Alternatively, the approval tests could be kept as an extra layer of defense, but only run in certain situations like validating a PR or prior to a merge. Also, the number of redundant cases could be reduced considerably.
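One way to trim the redundant cases, sketched here as an assumption rather than something done in this post, is to keep only the values on either side of each threshold that updateQuality() checks. The specific values below are my reading of the conditionals, and whether the smaller set still reaches every branch should be confirmed with a coverage report before relying on it.

```swift
let names = [
    "Other",
    "Aged Brie",
    "Backstage passes to a TAFKAL80ETC concert",
    "Sulfuras, Hand of Ragnaros"
]

// sellIn is compared against 11 and 6 before the decrement,
// and against 0 after it.
let sellInValues = [-1, 0, 1, 5, 6, 10, 11]

// quality is compared against 0 and 50; 47 through 49 cover
// backstage passes, which can gain up to 3 in one update.
let qualityValues = [-1, 0, 1, 47, 48, 49, 50, 51]

var cases: [(name: String, sellIn: Int, quality: Int)] = []
for name in names {
    for sellIn in sellInValues {
        for quality in qualityValues {
            cases.append((name: name, sellIn: sellIn, quality: quality))
        }
    }
}

print(cases.count) // 224
```

That is 224 inputs instead of 11,236, which also keeps the approved file small enough to read.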

Thanks to Emily Bache for hosting the Gilded Rose kata. Thanks to Llewellyn Falco for creating the Approval Tests project. Finally, thanks to Jon Reid for contributing to the Swift version for both the kata and the Approval Tests project.
