We use partitioned tables in our primary PostgreSQL database to efficiently expire old data, as deleting a bunch of data from a huge table can cause database performance to suffer. Prior to version 10, PostgreSQL didn't have native support for partitioned tables, so we used the pg_partman extension to implement partitioning. It works by using PostgreSQL's table inheritance to create child tables of the table that is to be partitioned and triggers to insert data into the child tables rather than the parent table. That extension has worked well for us, but it has a downside -- it isn't an option when you are using Amazon RDS, as it isn't supported. Now that that PostgreSQL has support for native partitions, I figured now was the time to see about dropping that extension so we'd have the option of using RDS.

Our partitioning use-case is fairly simple: we partition tables based on time, creating a new partition for every day, week, or month, depending on how many rows we want to store across all partitions and on how long we want to retain the data. All of our partitioned tables have a created_at column that will be used to determine which partition stores each row. For example, we might have a table defined like this:

create table events (
  project_id integer,
  data jsonb,
  created_at timestamp
)
partition by range (created_at);

And if we wanted to have weekly partitions, they would look like this:

create table events_p2019_10_28 partition of events for values from ('2019-10-28') to ('2019-11-04');
create table events_p2019_11_04 partition of events for values from ('2019-11-04') to ('2019-11-11');

With a time-based partitioning scheme, deleting old data is as simple as dropping one of the partitions. The regular maintenance we need to do, then, is create new partitions for date ranges as we approach them and delete old partitions containing data we no longer want. To make that maintenance a little easier, I have created the pg_partition_manager gem. Naturally, it's inspired by my experience with the pg_partman extension, which has served us so well.

Let's take a look at how you'd use this gem, given the events table and partitioning scheme described above. You would create a script or rake task that looks like this:

require "pg_partition_manager"

PgPartitionManager::Time.process([{parent_table: "public.events", period: "week", premake: 1, retain: 3}])

The parent_table is defined as schema.table_name (public, the default schema, is often the only one Rails developers end up using). The period can be day, week, or month. You can choose how many tables you want created in advance (after the current period) with premake, and how many tables you want to keep (prior to the current period) with retain. The gem defaults to pre-creating 4 tables if you don't specify premake, and it defaults to retaining data for 7 days, 4 weeks, and 6 months if you don't specify retain.

Invoke that script/task with a daily cron job, and you're all set -- it will create and drop the tables as needed.

All ActiveRecord queries act just as they would with a non-partitioned table, so there's nothing you need to change in your code. That is, Event.create, Event.where, etc., will work just as they always have, with PostgreSQL putting the data in the right partition for you when you insert it. There is one change you may notice if you have lots of data, though... when you include created_at in your queries, PostgreSQL won't have to scan all the partitions -- just the ones the cover the range you specify in your where clause.

To recap, if you have a lot of time-based data that you want to delete as it expires, use PostgreSQL partitioned tables and the pg_partition_manager gem to make your app happy. :)

Get the Honeybadger newsletter

Each month we share news, best practices, and stories from the DevOps & monitoring community—exclusively for developers like you.
    author photo
    Benjamin Curtis

    Ben has been developing web apps and building startups since '99, and fell in love with Ruby and Rails in 2005. Before co-founding Honeybadger, he launched a couple of his own startups: Catch the Best, to help companies manage the hiring process, and RailsKits, to help Rails developers get a jump start on their projects. Ben's role at Honeybadger ranges from bare-metal to front-end... he keeps the server lights blinking happily, builds a lot of the back-end Rails code, and dips his toes into the front-end code from time to time. When he's not working, Ben likes to hang out with his wife and kids, ride his road bike, and of course hack on open source projects. :)

    More articles by Benjamin Curtis
    An advertisement for Honeybadger that reads 'Turn your logs into events.'

    "Splunk-like querying without having to sell my kidneys? nice"

    That’s a direct quote from someone who just saw Honeybadger Insights. It’s a bit like Papertrail or DataDog—but with just the good parts and a reasonable price tag.

    Best of all, Insights logging is available on our free tier as part of a comprehensive monitoring suite including error tracking, uptime monitoring, status pages, and more.

    Start logging for FREE
    Simple 5-minute setup — No credit card required