Caching is a general term which means storing the result of some code so that we can quickly retrieve it later. This allows us to, for example, perform some heavy number-crunching once and then just re-use the value without having to recalculate it again. Although the general concept is the same for all types of caching, there are various mechanisms we can use depending on what we are trying to cache.
For Rails developers the most common forms of caching are things like memoization (covered in a previous part of this caching series), view caching (stay tuned for the next article), and low-level caching, which we will cover here.
What is Low-Level Caching
What Rails calls low-level caching is really just reading and writing data to a key-value store. Out of the box, Rails supports an in-memory store, files on the filesystem, and external stores like Redis or memcached. It is called "low-level" caching because you are dealing with the Rails.cache object directly, telling it what value to store and what key to use. This is in contrast to view caching, where Rails has built-in helper methods to handle these nitty-gritty details for you.
The most common use-cases I encounter for low-level caching are read-only external API requests and heavy ActiveRecord computations. In the ActiveRecord case there are some alternatives to caching covered in the first part of this series that you may want to look into first, since introducing caching also increases the complexity and bug attack-surface of your application.
By default, Rails disables caching in development, because you usually want fresh data when you're working on a feature. You can easily toggle caching on and off using the rails dev:cache command.
How it works
Rails provides three methods to deal with the cache: read, write, and fetch. All of them take a cache "key" which is how we look up the value:
> Rails.cache.write("my-cache-key", 123)
> Rails.cache.read("my-cache-key")
=> 123
> Rails.cache.read("key-not-written")
=> nil
read and write are good to know about, but when implementing low-level caching the fetch method is what you'll probably use the most.
fetch provides a nice wrapper around reading and writing. You pass it a key and a block, and if a value is present for that key in the cache it will be returned and the block is not executed. If there is no cached value for that key (or it has expired, more on expiration later) it will execute the block and store the result in the cache for next time.
def cached_result
  Rails.cache.fetch(:cached_result) do
    # Only executed if the cache does not already have a value for this key
    puts "Crunching the numbers..."
    12345
  end
end
> cached_result
Crunching the numbers...
=> 12345
> cached_result
=> 12345
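If it helps to see that contract spelled out, here is a minimal plain-Ruby sketch of what fetch does conceptually. TinyCache is a made-up class for illustration only, not Rails' implementation; the real cache stores also deal with expiry, serialization, and more.

```ruby
# A toy key-value cache, NOT Rails' implementation: just enough to show
# the read / write / fetch contract described above.
class TinyCache
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
  end

  # Return the cached value if the key is present; otherwise run the block,
  # store its result, and return it.
  def fetch(key)
    return @data[key] if @data.key?(key)

    @data[key] = yield
  end
end

cache = TinyCache.new
calls = 0

first  = cache.fetch(:cached_result) { calls += 1; 12345 }
second = cache.fetch(:cached_result) { calls += 1; 12345 }

puts first   # 12345
puts second  # 12345
puts calls   # 1, because the block only ran on the first (cache miss) call
```

The counter makes the point of the wrapper visible: the expensive block runs once, and every later call is served from the store.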
When to Use Low-Level Caching
A great use case for this kind of caching is when you are hitting an external API to get a value that may not change that often. In one client app we had some calculations based on the current futures price of some commodities. Rather than hit the API on every page refresh, we cache the value for a period of time (in our case 10 minutes).
class ExternalApiWrapper
  ...
  def fetch_price
    Rails.cache.fetch([self, :fetch_price], expires_in: 10.minutes) { read_api_price }
  end
end
Keys and expiration
The value you pass to the cache method (read, write, or fetch) is the "cache key", that is, the key in the key-value pair stored in the cache. By the time it hits the cache store this will be a String, but Rails allows us to pass in some other common objects too:
- A string with whatever content you like
- A symbol
- An object that responds to cache_key_with_version or cache_key (such as an ActiveRecord model, we'll dig into these shortly)
- An array with any combination of the above
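To make the list above concrete, here is a rough plain-Ruby model of how such keys could be flattened into a single string. expand_key and FakeRecord are hypothetical names I made up for this sketch; the real expansion lives inside ActiveSupport and handles more cases (namespaces, to_param, and so on).

```ruby
# Simplified model of cache-key expansion. This is an illustration,
# not the ActiveSupport source.
def expand_key(key)
  if key.respond_to?(:cache_key_with_version)
    key.cache_key_with_version
  elsif key.respond_to?(:cache_key)
    key.cache_key
  elsif key.is_a?(Array)
    # Arrays are expanded element by element and joined with slashes.
    key.map { |element| expand_key(element) }.join("/")
  else
    key.to_s
  end
end

# Hypothetical stand-in for an ActiveRecord model.
FakeRecord = Struct.new(:id, :updated_at) do
  def cache_key
    "fake_records/#{id}"
  end

  def cache_key_with_version
    "#{cache_key}-#{updated_at.strftime('%Y%m%d%H%M%S')}"
  end
end

record = FakeRecord.new(17, Time.utc(2020, 3, 4, 10, 44, 55))

puts expand_key("plain string")                # plain string
puts expand_key(:a_symbol)                     # a_symbol
puts expand_key(record)                        # fake_records/17-20200304104455
puts expand_key([record, :test, :one, "two"])  # fake_records/17-20200304104455/test/one/two
```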
A common technique I've used when adding low-level caching to an ActiveRecord model is to pass an array containing self (so the cached value is scoped to the current object) and the name of the method as a symbol, like:
class SomeModel < ApplicationRecord
  def calculated_value
    Rails.cache.fetch([self, :calculated_value]) do
      ...
    end
  end
end
To see what the actual generated cache key will look like you can call the ActiveSupport method directly:
> ActiveSupport::Cache.expand_cache_key([SomeModel.last, :test, :one, "two"])
=> "some_models/17-20200304104455464584/test/one/two"
The blob of numbers here is a combination of the model's id and updated_at timestamp. The id part is so that this cached value is not overwritten by other instances of the model. The updated_at timestamp means that if the model is updated, the key automatically changes, saving us the hassle of manually invalidating the cached value.
Earlier I listed two methods for generating cache keys: cache_key and cache_key_with_version. ActiveRecord::Base implements both. cache_key_with_version takes precedence and includes the updated_at timestamp, as shown above. cache_key, on the other hand, only returns the model name and id:
> SomeModel.last.cache_key
=> "some_models/17"
> SomeModel.last.cache_key_with_version
=> "some_models/17-20200323114436755491"
In older versions of Rails, caching only allowed a cache_key, which in ActiveRecord models would include the timestamp. The change to separate cache_key and cache_key_with_version was made in Rails 5.2 to allow for "recyclable cache keys".
The basic problem being solved is this one: every time a model's updated_at timestamp changes, its cache key changes. This is great for cache invalidation but means the cache is now storing old stale values that we'll never access again (because we'll never generate the old cache keys).
> widget = Widget.find(1)
> old_key = widget.cache_key_with_version
=> "widgets/1-20200304104455464584"
> Rails.cache.fetch(old_key) { widget }
=> <Widget:0x00007fc2fe5da930, id: 1 ...
> widget.touch
> new_key = widget.cache_key_with_version
=> "widgets/1-20200323114436755491"
> Rails.cache.fetch(new_key) { widget }
=> <Widget:0x00007fc2fe5da930, id: 1 ...
> Rails.cache.read(old_key)
=> <Widget:0x00007fc2fe5da930, id: 1 ...
As you can see, the cache is now storing two copies of widget, even though the old one will never be looked up again. Eventually, the cache will hit its memory limit and start dropping old values to free up space. In apps with a lot of cached data this could mean dropping values that we still want cached but are accessed less often.
Recyclable cache keys solve this problem by allowing us to explicitly pass the version to the cache method. The underlying key used in the cache will include just the ID, and the cache store will handle checking if the version we're giving it matches what is stored in the cache:
> old_version = Widget.last.cache_version
=> "20200320201134416105"
> Rails.cache.fetch(Widget.last, version: old_version) { "Test Value" }
=> "Test Value"
> Rails.cache.read("widgets/17")
=> "Test Value"
> Rails.cache.fetch(Widget.last, version: Time.current) { "New Value" }
=> "New Value"
> Rails.cache.read("widgets/17")
=> "New Value"
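Here is a small plain-Ruby sketch of the idea behind recyclable keys, under the assumption that the store keeps the version next to the value and compares it on lookup. VersionedCache is a name I invented for this illustration; it is a conceptual model rather than how any Rails cache store is actually written.

```ruby
# Toy model of a versioned cache entry: the key stays stable and the version
# is stored alongside the value. Not Rails' implementation.
class VersionedCache
  Entry = Struct.new(:value, :version)

  def initialize
    @data = {}
  end

  def fetch(key, version:)
    entry = @data[key]
    # Hit only when an entry exists and its version matches.
    return entry.value if entry && entry.version == version

    # Miss or stale version: recompute and overwrite the same slot.
    value = yield
    @data[key] = Entry.new(value, version)
    value
  end

  def size
    @data.size
  end
end

cache = VersionedCache.new
cache.fetch("widgets/17", version: "20200320201134") { "Test Value" }
hit   = cache.fetch("widgets/17", version: "20200320201134") { "ignored" }
fresh = cache.fetch("widgets/17", version: "20200323114436") { "New Value" }

puts hit         # Test Value
puts fresh       # New Value
puts cache.size  # 1
```

Because the stale entry is overwritten in place, the store never accumulates one dead copy per version, which is exactly the waste described earlier.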
Touching models
There are times when changes to one model require changes to a related model. Say you have Cart and Product models for an e-commerce store, and if the product is updated you need the carts to be updated. This is where you'd specify touch: true on the relationship:
class Cart < ApplicationRecord
  has_many :products
end
class Product < ApplicationRecord
  belongs_to :cart, touch: true
end
This means any change to a Product will automatically update the updated_at timestamp of the Cart it belongs to. This is true no matter which fields on Product are being updated, so be mindful that this introduces some overhead: what used to be a single database call to update a product now also involves updating the related Cart.
If needed, you can also call touch on the model yourself to update the timestamp, which can be very useful for manual cache invalidation via the Rails console, or if you want finer-grained control over which particular rows are being updated.
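The effect of touching on cache keys can be sketched without a database. ToyCart and ToyProduct below are hypothetical plain-Ruby stand-ins for the models above, and the update method only mimics what touch: true does; timestamps are passed in explicitly to keep the example deterministic.

```ruby
# Plain-Ruby stand-ins for the Cart and Product models; no database involved.
class ToyCart
  attr_reader :id, :updated_at

  def initialize(id, updated_at)
    @id = id
    @updated_at = updated_at
  end

  def touch(time)
    @updated_at = time
  end

  def cache_key_with_version
    "carts/#{id}-#{updated_at.strftime('%Y%m%d%H%M%S')}"
  end
end

class ToyProduct
  def initialize(cart, price)
    @cart = cart
    @price = price
  end

  # Mimics `belongs_to :cart, touch: true`: every update also touches the cart.
  def update(price:, at:)
    @price = price
    @cart.touch(at)
  end
end

cart    = ToyCart.new(1, Time.utc(2020, 3, 4, 10, 0, 0))
product = ToyProduct.new(cart, 100)

before = cart.cache_key_with_version
product.update(price: 120, at: Time.utc(2020, 3, 4, 10, 5, 0))
after = cart.cache_key_with_version

puts before  # carts/1-20200304100000
puts after   # carts/1-20200304100500
```

Anything cached under the old cart key is no longer reachable after the product update, which is the invalidation behaviour we want, at the cost of the extra write.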
Time-based Expiration
One of the options you can pass to the cache methods is when you want that key-value entry to be deleted. Personally, I often set this to a low number (or better yet, an environment variable) when deploying a new set of caching code, so that if things need tweaking you don't have to do much manual invalidation before testing again.
Rails.cache.fetch(Product.last, expires_in: 1.day) { ... }
You can also set a default expiration value on the cache store.
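As a rough illustration of how a per-call expiry and a store-wide default interact, here is a plain-Ruby sketch. ExpiringCache, its default_expires_in option, and the injected clock are all assumptions made for this example so that it runs deterministically; Rails' stores track expiry internally in their own way.

```ruby
# Toy cache with time-based expiry. The clock is injected so the example
# is deterministic; this is not how Rails stores entries internally.
class ExpiringCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(default_expires_in: nil, clock: -> { Time.now })
    @default_expires_in = default_expires_in
    @clock = clock
    @data = {}
  end

  # expires_in is in seconds; falls back to the store-wide default.
  def fetch(key, expires_in: @default_expires_in)
    now = @clock.call
    entry = @data[key]
    return entry.value if entry && (entry.expires_at.nil? || entry.expires_at > now)

    value = yield
    @data[key] = Entry.new(value, expires_in && now + expires_in)
    value
  end
end

now = Time.utc(2020, 3, 4, 10, 0, 0)
cache = ExpiringCache.new(default_expires_in: 600, clock: -> { now })

a = cache.fetch(:price) { 100 }  # miss: stored, expires 10 minutes later
now += 599
b = cache.fetch(:price) { 200 }  # still fresh: block is skipped
now += 2
c = cache.fetch(:price) { 300 }  # expired: block runs again

puts a  # 100
puts b  # 100
puts c  # 300
```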
Example Use Cases
I'll be honest here, almost all the apps I've worked on have not needed to use this form of caching, with one important exception. We inherited an application that was, well, to say it was not well architected would be an understatement.
Even after a considerable amount of cleanup, we had two issues to deal with:
- A lot of calculations depended on the "current price" of commodities fetched from an API
- Various levels of nested aggregations like child.map { |c| c.computed_field }.sum, where computed_field itself contained another map{...}.sum
In an ideal world #2 would be resolved by boiling those calculations down and getting the database to do the number crunching. Consultancy work always requires balancing developer cost against client benefit, though, and this would have required a non-trivial number of hours to complete, so instead we targeted the model methods that were causing the main performance issues and cached them.
This then tied into the solution for #1 as well: we added a scheduled job to update the price every 10 minutes. If the price has changed, the relevant models are touched, meaning their cached calculations are invalidated.
As a simple example:
class UpdatePricesJob < ApplicationJob
  def perform
    # find_each loads records in batches; <fetch_api_price> stands in for the API call
    Commodity.find_each { |commodity| commodity.update!(price: <fetch_api_price>) }
  end
end
class Commodity < ApplicationRecord
  belongs_to :invoice, touch: true
end
class Invoice < ApplicationRecord
  has_many :commodities
  def total_value
    Rails.cache.fetch([self, :total_value]) { commodities.map(&:price).sum }
  end
end
Gotchas
Because Rails' low-level caching is designed with ActiveRecord's updated_at timestamp in mind, code that uses this can easily stray into one of two extremes:
- The cached value should change but the model's updated_at did not change (e.g. the model method being cached takes an argument), resulting in a cache invalidation bug.
- Liberal use of touch: true on ActiveRecord associations solves the cache invalidation issues but starts to heavily tax the database instead.
An additional note on #2 is that adding a lot of touch settings to objects can also dramatically increase database log entries. I have seen a production site go down simply because of this issue (i.e. the database server ran out of hard drive space, even though the actual DB load was normal).
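The first gotcha is easy to reproduce in miniature. In the sketch below, STORE and cache_fetch are a hypothetical Hash-backed stand-in for Rails.cache, and the two total_* methods are invented for illustration; the point is only that a method argument which affects the result needs to be part of the cache key.

```ruby
# Hash-backed stand-in for Rails.cache, used only to keep the example runnable.
STORE = {}

def cache_fetch(key)
  STORE.fetch(key) { STORE[key] = yield }
end

# Buggy: the key ignores `rate`, so the first result is returned for every rate.
def total_buggy(invoice_key, amount, rate)
  cache_fetch([invoice_key, :total_buggy]) { amount * rate }
end

# Fixed: the argument is part of the key, so each rate gets its own entry.
def total_fixed(invoice_key, amount, rate)
  cache_fetch([invoice_key, :total_fixed, rate]) { amount * rate }
end

puts total_buggy("invoices/1", 100, 2)  # 200
puts total_buggy("invoices/1", 100, 3)  # 200 (stale: should have been 300)
puts total_fixed("invoices/1", 100, 2)  # 200
puts total_fixed("invoices/1", 100, 3)  # 300
```

The trade-off is more cache entries (one per distinct argument), so this works best when the argument only takes a handful of values.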
When The View Is The Bottleneck
I've mostly talked about caching methods within an ActiveRecord model here, as I believe that's the most common use-case for low-level caching in Rails. Rails.cache can be called from anywhere in your Rails application though, so there's no reason it can't be used inside your business-logic classes as well.
You could even call it inside Rails views, but if you want to cache content for the view layer, Rails has support for that baked in, which is what we'll dive into in the next part of this series.