I remember the first time I saw Rails' ActiveRecord. It was a revelation. This was back in 2005 and I was hand-coding SQL queries for a PHP app. Suddenly, using the database went from being a tedious chore to being easy and - dare I say - fun.
...then I started to notice the performance issues.
ActiveRecord itself wasn't slow. I'd just stopped paying attention to the queries that were actually being run. And it turns out, some of the most idiomatic database queries used in Rails CRUD apps are by default quite poor at scaling up to larger datasets.
In this article we're going to discuss three of the biggest culprits. But first, let's talk about how you can tell if your DB queries are going to scale well.
Measuring Performance
Every DB query is performant if you have a small enough dataset. So to really get a feel for performance, we need to benchmark against a production-sized database. In our examples, we're going to use a table called faults with around 22,000 records.
We're using postgres. In postgres, the way you measure performance is to use explain. For example:
# explain (analyze) select * from faults where id = 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------
Index Scan using faults_pkey on faults (cost=0.29..8.30 rows=1 width=1855) (actual time=0.556..0.556 rows=0 loops=1)
Index Cond: (id = 1)
Total runtime: 0.626 ms
This shows both the estimated cost to perform the query (cost=0.29..8.30 rows=1 width=1855) and the actual time it took to perform it (actual time=0.556..0.556 rows=0 loops=1).
If you prefer a more readable format, you can ask postgres to print the results in YAML.
# explain (analyze, format yaml) select * from faults where id = 1;
QUERY PLAN
--------------------------------------
- Plan: +
Node Type: "Index Scan" +
Scan Direction: "Forward" +
Index Name: "faults_pkey" +
Relation Name: "faults" +
Alias: "faults" +
Startup Cost: 0.29 +
Total Cost: 8.30 +
Plan Rows: 1 +
Plan Width: 1855 +
Actual Startup Time: 0.008 +
Actual Total Time: 0.008 +
Actual Rows: 0 +
Actual Loops: 1 +
Index Cond: "(id = 1)" +
Rows Removed by Index Recheck: 0+
Triggers: +
Total Runtime: 0.036
(1 row)
For now we're only going to focus on "Plan Rows" and "Actual Rows."
- Plan Rows: In the worst case, how many rows will the DB have to loop through to respond to your query?
- Actual Rows: When it executed the query, how many rows did the DB actually loop through?
If "Plan Rows" is 1, like it is above, then the query is probably going to scale well. If "Plan Rows" is equal to the number of rows in the database, that means the query is going to do a "full table scan" and is not going to scale well.
Now that you know how to measure query performance, let's look at some common Rails idioms and see how they stack up.
Counting
It's really common to see code like this in Rails views:
Total Faults <%= Fault.count %>
That results in SQL that looks something like this:
select count(*) from faults;
Let's plug that into explain and see what happens.
# explain (analyze, format yaml) select count(*) from faults;
QUERY PLAN
--------------------------------------
- Plan: +
Node Type: "Aggregate" +
Strategy: "Plain" +
Startup Cost: 1840.31 +
Total Cost: 1840.32 +
Plan Rows: 1 +
Plan Width: 0 +
Actual Startup Time: 24.477 +
Actual Total Time: 24.477 +
Actual Rows: 1 +
Actual Loops: 1 +
Plans: +
- Node Type: "Seq Scan" +
Parent Relationship: "Outer"+
Relation Name: "faults" +
Alias: "faults" +
Startup Cost: 0.00 +
Total Cost: 1784.65 +
Plan Rows: 22265 +
Plan Width: 0 +
Actual Startup Time: 0.311 +
Actual Total Time: 22.839 +
Actual Rows: 22265 +
Actual Loops: 1 +
Triggers: +
Total Runtime: 24.555
(1 row)
Woah! Our simple count query is looping over 22,265 rows — the entire table! In postgres, counts always loop over the entire record set.
You can decrease the size of the record set by adding where conditions to the query. Depending on your requirements, you may get the size low enough that performance is acceptable.
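For example, if all you really need is a recent tally, scoping the count keeps the scanned set small (the 30-day window here is just for illustration):
# Counts only the recent rows instead of walking the whole table.
# An index on created_at lets postgres narrow the scan quickly.
Fault.where("created_at > ?", 30.days.ago).count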
The only other way around this issue is to cache your count values. Rails can do this for you if you set it up:
belongs_to :project, :counter_cache => true
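For that to work, the parent table needs an integer column to hold the cached count; by Rails convention it would be called faults_count here. A minimal migration sketch, assuming a projects table:
class AddFaultsCountToProjects < ActiveRecord::Migration
  def change
    # Rails keeps this column up to date as faults are created and destroyed.
    add_column :projects, :faults_count, :integer, default: 0, null: false
  end
end
After that, project.faults.size reads the cached value instead of issuing a count(*) against the faults table.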
Another alternative is available when checking to see if the query returns any records. Instead of Users.count > 0, try Users.exists?. The resulting query is much more performant. (Thanks to reader Gerry Shaw for pointing this one out to me.)
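exists? works on scoped relations too, so presence checks with conditions get the same benefit. A small sketch (the status column is just for illustration):
# Generates something along the lines of:
#   select 1 as one from faults where status = 'open' limit 1;
# which stops at the first matching row instead of counting them all.
Fault.where(status: "open").exists?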
Sorting
The index page. Almost every app has at least one. You pull the newest 20 records from the database and display them. What could be simpler?
The code to load the records might look a little like this:
@faults = Fault.order(created_at: :desc)
The SQL for that looks like this:
select * from faults order by created_at desc;
So let's analyze it:
# explain (analyze, format yaml) select * from faults order by created_at desc;
QUERY PLAN
--------------------------------------
- Plan: +
Node Type: "Sort" +
Startup Cost: 39162.46 +
Total Cost: 39218.12 +
Plan Rows: 22265 +
Plan Width: 1855 +
Actual Startup Time: 75.928 +
Actual Total Time: 86.460 +
Actual Rows: 22265 +
Actual Loops: 1 +
Sort Key: +
- "created_at" +
Sort Method: "external merge" +
Sort Space Used: 10752 +
Sort Space Type: "Disk" +
Plans: +
- Node Type: "Seq Scan" +
Parent Relationship: "Outer"+
Relation Name: "faults" +
Alias: "faults" +
Startup Cost: 0.00 +
Total Cost: 1784.65 +
Plan Rows: 22265 +
Plan Width: 1855 +
Actual Startup Time: 0.004 +
Actual Total Time: 4.653 +
Actual Rows: 22265 +
Actual Loops: 1 +
Triggers: +
Total Runtime: 102.288
(1 row)
Here we see that the DB is sorting all 22,265 rows every single time you do this query. No bueno!
By default, every "order by" clause in your SQL causes the record set to be sorted right then, in real-time. There's no caching. No magic to save you.
The solution is to use indexes. For simple cases like this one, adding a sorted index to the created_at column will speed up the query quite a bit.
In your Rails migration you could put:
class AddIndexToFaultCreatedAt < ActiveRecord::Migration
  def change
    add_index(:faults, :created_at)
  end
end
Which runs the following SQL:
CREATE INDEX index_faults_on_created_at ON faults USING btree (created_at);
There at the very end, (created_at) specifies the column the index is sorted by. The sort order defaults to ascending.
Now, if we re-run our sort query, we see that it no longer includes a sorting step. It simply reads the pre-sorted data from the index.
# explain (analyze, format yaml) select * from faults order by created_at desc;
QUERY PLAN
----------------------------------------------
- Plan: +
Node Type: "Index Scan" +
Scan Direction: "Backward" +
Index Name: "index_faults_on_created_at"+
Relation Name: "faults" +
Alias: "faults" +
Startup Cost: 0.29 +
Total Cost: 5288.04 +
Plan Rows: 22265 +
Plan Width: 1855 +
Actual Startup Time: 0.023 +
Actual Total Time: 8.778 +
Actual Rows: 22265 +
Actual Loops: 1 +
Triggers: +
Total Runtime: 10.080
(1 row)
If you're sorting by multiple columns, you'll need to create an index that is sorted by multiple columns as well. Here's what that looks like in a Rails migration:
add_index(:faults, [:priority, :created_at], order: { priority: :asc, created_at: :desc })
As you start doing more complex queries, it's a good idea to run them through explain. Do it early and often. You may find that some simple change to the query has made it impossible for postgres to use the index for sorting.
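For instance, to confirm the two-column index above is actually being used for sorting, you could explain the matching query and make sure the plan shows an index scan (by default Rails would name the index index_faults_on_priority_and_created_at) rather than a separate Sort node feeding off a Seq Scan:
# explain (analyze) select * from faults order by priority asc, created_at desc;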
Limits and Offsets
In our index pages we hardly ever include every item in the database. Instead we paginate, showing only 10 or 30 or 50 items at a time. The most common way to do this is by using limit and offset together. In Rails it looks like this:
Fault.limit(10).offset(100)
That produces SQL which looks like this:
select * from faults limit 10 offset 100;
Now if we run explain, we see something odd. The number of rows scanned is 110, equal to the limit plus the offset.
# explain (analyze, format yaml) select * from faults limit 10 offset 100;
QUERY PLAN
--------------------------------------
- Plan: +
Node Type: "Limit" +
...
Plans: +
- Node Type: "Seq Scan" +
Actual Rows: 110 +
...
If you change the offset to 10,000 you'll see that the number of rows scanned jumps to 10010, and the query is 64x slower.
# explain (analyze, format yaml) select * from faults limit 10 offset 10000;
QUERY PLAN
--------------------------------------
- Plan: +
Node Type: "Limit" +
...
Plans: +
- Node Type: "Seq Scan" +
Actual Rows: 10010 +
...
This leads to a disturbing conclusion: when paginating, later pages are slower to load than earlier pages. If we assume 100 items per page in the example above, page 100 is 13x slower than page 1.
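That slowdown is baked into the usual pagination arithmetic: the offset grows linearly with the page number, so the database walks past everything on the earlier pages first. A typical controller sketch (the page and per_page values are assumed here for illustration):
page     = params.fetch(:page, 1).to_i
per_page = 100

# Page 1 scans about 100 rows; page 100 scans about 10,000,
# because offset = per_page * 99 = 9,900.
@faults = Fault.order(created_at: :desc)
               .limit(per_page)
               .offset(per_page * (page - 1))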
So what do you do?
Frankly, I haven't been able to find a perfect solution. I'd start by seeing if I could reduce the size of the dataset so I didn't need hundreds or thousands of pages to begin with.
If you're unable to reduce your record set, your best bet may be to replace offset/limit with where clauses.
# You could use a date range
Fault.where("created_at > ? and created_at < ?", 100.days.ago, 101.days.ago)
# ...or even an id range
Fault.where("id > ? and id < ?", 100, 200)
Conclusion
I hope this article has convinced you that you should really be taking advantage of postgres' explain function to find possible performance issues with your db queries. Even the simplest queries can cause major performance issues, so it pays to check. :)