Have you ever needed to change code that already works, but has no tests and is difficult to fully understand? How did you feel deploying those changes? Were you nervous?
I recently had to upgrade a 7 year old API endpoint with hundreds of lines of code. This was part of a new feature release, and in order to make the changes safely, while also making sure the final code was cleaner and well tested, I made use of some techniques that I'd love to share using a short but interesting example.
For our example, let's say that we mask sensitive data in some places in our product. Our product manager asks us if we can change the way emails are masked so that they look like this:
*****@example.com
Instead of this:
***************.com
We then open our text editor and find the method where this is happening:
def mask(str)
if str.length <= 4
return str
end
# Email
if str =~ URI::MailTo::EMAIL_REGEXP
limit = str.length - 4
return "#{'*' * limit}#{str[limit..-1]}"
end
# Phone Number or ID
if str.is_a?(Numeric)
limit = str.length - 4
return "#{'*' * limit}#{str[limit..-1]}"
end
# String, like a name
limit = str.length - 4
return "#{'*' * limit}#{str[limit..-1]}"
end
The bad news is that we don't have any tests in place, which means we won't know if we're breaking something. How should we move forward?
How to work with legacy code
The first goal is to add tests without changing anything in the code if possible. This kind of test will be useful for two reasons:
We'll better understand the code as it exists before the changes we need to make.
We'll have some tests that will tell us if we break something when we make changes.
Let's create a simple test first.
First test
We can use any testing library we want, but in this case we already have rspec set up in the project so we'll use that.
describe 'mask' do
it "masks regular text" do
expect(mask('simple text')).to eq('*******text')
end
end
We're guessing by looking at the code in the last case where it says "string, like a name" that the whole string except for the last 4 characters will be asterisks. And the test passes!
$ rspec
.
Finished in 0.0026 seconds (files took 0.12025 seconds to load)
1 example, 0 failures
At this point, we might be tempted to do the same thing with other lines of the method by trying to guess beforehand what the result should be.
But instead, we'll use a different technique.
Characterization tests
A characterization test is a technique coined by Michael Feathers where we write tests for code where we don't have them. It also means that, to some extent, we shouldn't be guessing what the code should do and instead we should be in "exploratory" mode.
We should create tests without really knowing what's going to happen and then document those cases. With this approach, our first test could look like this:
it "x" do
expect(mask('simple text')).to eq(nil)
end
And we get this output:
F
Failures:
1) mask x
Failure/Error: expect(mask('simple text')).to eq(nil)
expected: nil
got: "*******text"
(compared using ==)
# ./spec/mask_spec.rb:5:in `block (2 levels) in <top (required)>'
Finished in 0.01882 seconds (files took 0.12916 seconds to load)
1 example, 1 failure
Failed examples:
rspec ./spec/mask_spec.rb:4 # mask x
So now we can "document" this behavior in the test, like we did before but we will use the test output to guide us:
it "masks regular text" do
expect(mask('simple text')).to eq('*******text')
end
Let's add another test.
Email test
Like before, let's use an "X" test:
it "x" do
expect(mask('example@example.com')).to eq(nil)
end
$ rspec
.F
Failures:
1) mask x
Failure/Error: expect(mask('example@example.com')).to eq(nil)
expected: nil
got: "***************.com"
(compared using ==)
# ./spec/mask_spec.rb:9:in `block (2 levels) in <top (required)>'
Finished in 0.03376 seconds (files took 0.34069 seconds to load)
2 examples, 1 failure
Failed examples:
rspec ./spec/mask_spec.rb:8 # mask x
So it seems the behavior is the same. We get asterisks and four characters. Let's change the test:
it "masks an email address" do
expect(mask('example@example.com')).to eq('***************.com')
end
Numeric test
We see in the middle of the method that if we pass a string with a numeric form (str.is_a?(Numeric)
) there's a different "if" condition:
if str.is_a?(Numeric)
limit = str.length - 4
return "#{'*' * limit}#{str[limit..-1]}"
end
Let's see what happens if we pass a number as a string:
it "x" do
expect(mask('635914526')).to eq(nil)
end
$ rspec
..F
Failures:
1) mask x
Failure/Error: expect(mask('635914526')).to eq(nil)
expected: nil
got: "*****4526"
(compared using ==)
# ./spec/mask_spec.rb:13:in `block (2 levels) in <top (required)>'
Finished in 0.02038 seconds (files took 0.17098 seconds to load)
3 examples, 1 failure
Failed examples:
rspec ./spec/mask_spec.rb:12 # mask x
So, again, we can document this behavior:
it "masks numbers as strings" do
expect(mask('635914526')).to eq('*****4526')
end
Adding more characterization tests
So far, we've only covered some very simple cases, so we should include more tests with the same approach:
- Write a failing test "X" to see the result
- Execute it and copy the output
- Update that test to better document the behavior and include the right output
- Now the test passes and we can go back to step 1 for another case.
I'd probably repeat these steps until I'm relatively comfortable with all the cases covered. In this example, I'd probably include cases like:
- Short string with 4 characters
- Short string with 3 characters
- Empty string
- Other formats of strings with symbols like dashes or dots.
After doing this we end up with the following spec file:
require_relative './../mask'
describe 'mask' do
it "masks regular text" do
expect(mask('simple text')).to eq('*******text')
end
it "masks an email address" do
expect(mask('example@example.com')).to eq('***************.com')
end
it "masks numbers as strings" do
expect(mask('635914526')).to eq('*****4526')
end
it "does not mask a string with 4 characters" do
expect(mask('asdf')).to eq('asdf')
end
it "does not mask a string with 3 characters" do
expect(mask('asd')).to eq('asd')
end
it "does not do anything with empty strings" do
expect(mask('')).to eq('')
end
it "masks symbols like regular characters" do
expect(mask('text .-@$ with symbols-')).to eq('*******************ols-')
end
end
Next steps, refactoring.
After documenting the current behavior, it's important to see if we can simplify the code before introducing a change.
Refactoring is now safer since we have tests in place in case we break something.
Refactoring step by step would make this article too long, so I'm going to skip that process and show a potential solution after some steps:
- Extracting the number of visible characters to a constant.
- Simplifying repeated case for numeric strings and regular strings.
- Extract condition to check if a string is an email.
- Move the whole method to a StringUtils class as a static one.
After these refactorings the tests still pass and the code looks like this:
class StringUtils
N_VISIBLE_CHARACTERS = 4
def self.mask(str)
return str unless should_mask(str)
if is_email(str)
mask_email(str)
else
mask_string(str)
end
end
def self.should_mask(str)
str.length > N_VISIBLE_CHARACTERS
end
def self.mask_email(str)
limit = str.length - N_VISIBLE_CHARACTERS
"#{'*' * limit}#{str[limit..-1]}"
end
def self.is_email(str)
str =~ URI::MailTo::EMAIL_REGEXP
end
def self.mask_string(str)
limit = str.length - N_VISIBLE_CHARACTERS
"#{'*' * limit}#{str[limit..-1]}"
end
end
Note that I've left mask_email as a different method because we need to change it now, so it's not going to be the same code we have for the other case.
I would probably clean up some tests, since now some cases are repeated such as numeric strings and regular strings, but it's not necessary for the purpose of this article.
Introducing our change with TDD
Finally, after we have a cleaner method with tests, we can make the requested change. Emails should be masked like:
*****@example.com
Instead of the current behavior:
***************.com
To implement this change, we can use TDD (Test Driven Development) where we start with a failing test before the final code is written.
Since we already have a test for this case, let's change it to represent this new behavior:
it "masks an email address" do
expect(StringUtils::mask('example@example.com')).to eq('*******@example.com')
end
And, as expected, this test fails:
$ rspec
.F......
Failures:
1) StringUtils mask masks an email address
Failure/Error: expect(StringUtils::mask('example@example.com')).to eq('*******@example.com')
expected: "*******@example.com"
got: "***************.com"
(compared using ==)
# ./spec/string_utils_spec.rb:10:in `block (3 levels) in <top (required)>'
Finished in 0.08843 seconds (files took 0.2876 seconds to load)
8 examples, 1 failure
Failed examples:
rspec ./spec/string_utils_spec.rb:9 # StringUtils mask masks an email address
A valid solution could be changing the mask_email method to:
def self.mask_email(str)
email_sections = str.split("@")
email_username = email_sections.first
email_domain = email_sections.last
"#{'*' * email_username.length}@#{email_domain}"
end
And now our test passes:
$ rspec
........
Finished in 0.00385 seconds (files took 0.14405 seconds to load)
8 examples, 0 failures
Review of the process
We've gone through several different phases, let's review them together:
- We had a piece of code that was difficult to read. We knew it was working since it was being used in production, so we couldn't change its current behavior.
- We documented its behavior with characterization tests.
- After having tests to check that we aren't breaking anything, we refactored the code to make it cleaner.
- We modified one of the tests to represent the requested change so it failed.
- Finally, we modified the code with the requested change and the tests passed again.
This process helps solve the common challenge of needing to modify code that has no tests, and yet we know the code is working relatively well. I hope that with characterization tests you now have a new technique in your toolbox to make changes more safely in the future.