The Rubyist's Guide to Memoization

Today I wanted to talk about one of my favorite techniques for improving performance. It's a source of easy little performance wins that eventually add up, and only occasionally reduce your application to a heap of smoldering rubble. Only very occasionally.

This technique is called "memoization." Despite the $10 computer-science word, it simply means that, instead of doing the same work every time you call a method, you save the return value to a variable and use that instead.

Here's what it looks like in pseudocode:

def my_method
  @memo = <work> if @memo is undefined
  return @memo
end

And here's how you do it in Ruby. This is the most robust approach, but it's quite verbose. There are other, more concise approaches that we'll discuss later.

class MyClass
  def my_method
    unless defined?(@my_method)
      @my_method = begin 
         # Do your calculation, database query
         # or other long-running thing here. 
      end
    end
    @my_method
  end
end

The code above does three things:

Checks to see if there is an instance variable named @my_method.
If there is, it does some work and saves the result in @my_method.
It returns @my_method

Don't be confused by the fact that we have both a method and an instance variable named my_method. I could have named my variable anything, but it's convention to name it after the method that is being memoized.

A shorthand version

One problem with the code above is that it's a bit cumbersome. Because of this, you're much more likely to see a shorthand version that does almost the same thing:

class MyClass
  def my_method1
    @my_method1 ||= some_long_calculation
  end

  def my_method2
    @my_method2 ||= begin
      # The begin-end block lets you easily 
      # use multiple lines of code here. 
    end
  end
end

Both of these use Ruby's a ||= b operator, which is shorthand for a || (a = b), which is itself more-or-less shorthand for:

# You wouldn't use return like this in real life. 
# I'm just using it to express to beginners the idea
# that the conditional evaluates to whatever winds up in `a`.
if a
  return a
else
  a = b
  return a
end

A source of bugs

If you're paying very close attention you may have noticed that the shorthand versions evaluate the "truthiness" of the memo variable instead of checking for the existence of it. This is the source of one of the major limitations of the shorthand version: it doesn't ever memoize nil or false.

There are lots of use cases where this isn't important. But it's one of those annoying facts that you have to keep in the back of your mind whenever you're memoizing.

Memoizing methods with arguments

So far we've only dealt with memoizing single values. But not many functions return the same result all the time. Let's take a look at an old favorite of technical interviews: the Fibonacci sequence.

You can calculate the Fibonacci sequence recursively in Ruby like so:

class Fibonacci
  def self.calculate(n)
    return n if n == 0 || n == 1
    calculate(n - 1) + calculate(n - 2)
  end
end

Fibonacci.calculate(10) # => 55

The problem with this implementation is that it's inefficient. To prove this, let's add a print statement to view the value of n.

class Fibonacci
  def self.calculate(n)
    print "#{ n } "
    return n if n == 0 || n == 1
    calculate(n - 1) + calculate(n - 2)
  end
end

Fibonacci.calculate(4)

# Outputs: 4 3 2 1 0 1 2 1 0

As you can see, calculate is being called repeatedly with many of the same values of n. In fact the number of calls to calculate is going to grow exponentially with n.

One way around this is to memoize the results of calculate. Doing so isn't very different from the other memoization examples we've covered.

class Fibonacci
  def self.calculate(n)
    @calculate ||= {}
    @calculate[n] ||= begin
      print "#{ n } "
      if n == 0 || n == 1
        n
      else
        calculate(n - 1) + calculate(n - 2)
      end
    end
  end
end

Fibonacci.calculate(4)

# Outputs: 4 3 2 1 0

Now that we've memoized calculate, the number of calls no longer increases exponentially with n.

Fibonacci.calculate(20)

# Outputs: 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Invalidation

Memoization is a lot like caching, except that memoized results don't automatically expire, and there's usually not an easy way to clear them once set.

For use-cases like the Fibonacci sequence generator this hardly matters. Fibonacci.calculate(10) will always return the same result. But in other use cases it does matter.

For example, you might see code like this:

# Not the best idea
class User
  def full_name
    @full_name ||= [first_name, last_name].join(" ")
  end
end

Personally, I wouldn't use memoization here because if the first or last name is changed, the full name might not be updated.

The one place where you can be a little more lax is inside of Rails controllers. It's very common to see code like this:

class ApplicationController
  def current_user
    @current_user ||= User.find(...)
  end
end

This is ok, because the controller instance is destroyed after each web request. It's unlikely that the currently logged-in user would change during any normal request.

When dealing with streaming connections like ActionCable you may need to be more careful. I don't know. I've never used it.

Overuse

Finally I feel like I should point out that like anything, it's possible to take memoization too far. It's a technique that really should only be applied to expensive operations that will never change throughout the lifetime of the memo variable.