Today I wanted to talk about one of my favorite techniques for improving performance. It's a source of easy little performance wins that eventually add up, and only occasionally reduce your application to a heap of smoldering rubble. Only very occasionally.
This technique is called "memoization." Despite the $10 computer-science word, it simply means that, instead of doing the same work every time you call a method, you save the return value to a variable and use that instead.
Here's what it looks like in pseudocode:
def my_method
@memo = <work> if @memo is undefined
return @memo
end
And here's how you do it in Ruby. This is the most robust approach, but it's quite verbose. There are other, more concise approaches that we'll discuss later.
class MyClass
def my_method
unless defined?(@my_method)
@my_method = begin
# Do your calculation, database query
# or other long-running thing here.
end
end
@my_method
end
end
The code above does three things:
- Checks to see if there is an instance variable named
@my_method
. - If there is, it does some work and saves the result in
@my_method
. - It returns
@my_method
Don't be confused by the fact that we have both a method and an instance variable named my_method
. I could have named my variable anything, but it's convention to name it after the method that is being memoized.
A shorthand version
One problem with the code above is that it's a bit cumbersome. Because of this, you're much more likely to see a shorthand version that does almost the same thing:
class MyClass
def my_method1
@my_method1 ||= some_long_calculation
end
def my_method2
@my_method2 ||= begin
# The begin-end block lets you easily
# use multiple lines of code here.
end
end
end
Both of these use Ruby's a ||= b
operator, which is shorthand for a || (a = b)
, which is itself more-or-less shorthand for:
# You wouldn't use return like this in real life.
# I'm just using it to express to beginners the idea
# that the conditional evaluates to whatever winds up in `a`.
if a
return a
else
a = b
return a
end
A source of bugs
If you're paying very close attention you may have noticed that the shorthand versions evaluate the "truthiness" of the memo variable instead of checking for the existence of it. This is the source of one of the major limitations of the shorthand version: it doesn't ever memoize nil
or false
.
There are lots of use cases where this isn't important. But it's one of those annoying facts that you have to keep in the back of your mind whenever you're memoizing.
Memoizing methods with arguments
So far we've only dealt with memoizing single values. But not many functions return the same result all the time. Let's take a look at an old favorite of technical interviews: the Fibonacci sequence.
You can calculate the Fibonacci sequence recursively in Ruby like so:
class Fibonacci
def self.calculate(n)
return n if n == 0 || n == 1
calculate(n - 1) + calculate(n - 2)
end
end
Fibonacci.calculate(10) # => 55
The problem with this implementation is that it's inefficient. To prove this, let's add a print
statement to view the value of n
.
class Fibonacci
def self.calculate(n)
print "#{ n } "
return n if n == 0 || n == 1
calculate(n - 1) + calculate(n - 2)
end
end
Fibonacci.calculate(4)
# Outputs: 4 3 2 1 0 1 2 1 0
As you can see, calculate
is being called repeatedly with many of the same values of n
. In fact the number of calls to calculate
is going to grow exponentially with n
.
One way around this is to memoize the results of calculate
. Doing so isn't very different from the other memoization examples we've covered.
class Fibonacci
def self.calculate(n)
@calculate ||= {}
@calculate[n] ||= begin
print "#{ n } "
if n == 0 || n == 1
n
else
calculate(n - 1) + calculate(n - 2)
end
end
end
end
Fibonacci.calculate(4)
# Outputs: 4 3 2 1 0
Now that we've memoized calculate
, the number of calls no longer increases exponentially with n
.
Fibonacci.calculate(20)
# Outputs: 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Invalidation
Memoization is a lot like caching, except that memoized results don't automatically expire, and there's usually not an easy way to clear them once set.
For use-cases like the Fibonacci sequence generator this hardly matters. Fibonacci.calculate(10)
will always return the same result. But in other use cases it does matter.
For example, you might see code like this:
# Not the best idea
class User
def full_name
@full_name ||= [first_name, last_name].join(" ")
end
end
Personally, I wouldn't use memoization here because if the first or last name is changed, the full name might not be updated.
The one place where you can be a little more lax is inside of Rails controllers. It's very common to see code like this:
class ApplicationController
def current_user
@current_user ||= User.find(...)
end
end
This is ok, because the controller instance is destroyed after each web request. It's unlikely that the currently logged-in user would change during any normal request.
When dealing with streaming connections like ActionCable you may need to be more careful. I don't know. I've never used it.
Overuse
Finally I feel like I should point out that like anything, it's possible to take memoization too far. It's a technique that really should only be applied to expensive operations that will never change throughout the lifetime of the memo variable.