A Visual Guide to Using :includes in Rails

If you're new to developing Rails applications, you've probably come across the term N + 1 queries. You probably also know that this is something you want to avoid.

As an engineer who joined Gusto straight out of undergrad with no Rails or full-time industry experience, I was initially overwhelmed by this concept. But, when broken down into bite size chunks, the solution to fixing N + 1 queries is actually quite simple.

N + 1 queries occur when a group of records are loaded from the database in an inefficient way, along with any records associated with them. The examples below dive into how we can solve this issue with :includes and will help demystify how this method works under the hood.

Outlining the Problem

Let’s say we have a model called Employee who has many Forms.


# == Schema Information
# Table name: employees
# id
# name
class Employee < ApplicationRecord
  has_many: :forms

# == Schema Information
# Table name: forms
# id
# employee_id
# kind
class Form < ApplicationRecord
  belongs_to: :employee

And we have 5 total employee records and forms that we want to load because we want to do some sort of mutation.


Employee.map { |employee| employee.forms }.flatten

The SQL for this command looks like:

> SELECT `employees`.* FROM `employees` ORDER BY `employees`.`id`
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` = 1
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` = 2
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` = 3
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` = 4
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` = 5

There are 6 total hits to the database being made since we load employees in the first query and then make 5 additional queries to grab each employee's forms. In other words, N + 1 SQL selects occur where N = 5.


Making queries ⚡️ with :includes

Rails provides an ActiveRecord method called :includes which loads associated records in advance and limits the number of SQL queries made to the database. This technique is known as "eager loading" and in many cases will improve performance by a significant amount.

Depending on what your query is, :includes will use either the ActiveRecord method :preload or :eager_load.

When does :includes use :preload?

In most cases :includes will default to use the method :preload which will fire 2 queries:

  1. Load all records tied to the leading model
  2. Load records associated with the leading model based off the foreign key on the associated model or the leading model

So, if we introduced :preload to our query we would produce only 2 SQL selects where forms would be loaded, based off the foreign key Form#employee_id.

Employee.preload(:forms).map { |employee| employee.forms }.flatten

> SELECT `employees`.* FROM `employees`
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` IN (1, 2, 3, 4, 5)


The SQL for this example would look the exact same if we were to replace :preload with :includes.

Employee.includes(:forms).map { |employee| employee.forms }.flatten

> SELECT `employees`.* FROM `employees`
> SELECT `forms`.* FROM `forms` WHERE `forms`.`employee_id` IN (1, 2, 3, 4, 5)

When does :includes use :eager_load?

:includes will default to use :preload unless you reference the association being loaded in a subsequent clause, such as :where or :order.

Employee.includes(:forms).where('forms.kind = "health"')

In this case, :includes will use the method :eager_load which will produce 1 query that uses left outer joins to build an intermediary table which is then used to construct the model output.

> SELECT `employees`.`id` AS t0_r0, `employees`.`name` AS t0_r1, `forms`.`id` AS t1_r0, `forms`.`employee_id` AS t1_r1, `forms`.`kind` AS t1_r2 LEFT OUTER JOIN `forms` ON `forms`.`employee_id` = `employees`.`id` WHERE (forms.kind = "health")


The SQL for this example would look the exact same if we were to replace :eager_load with :includes.

Employee.eager_load(:forms).where('forms.kind = "health"')

> SELECT `employees`.`id` AS t0_r0, `employees`.`name` AS t0_r1, `forms`.`id` AS t1_r0, `forms`.`employee_id` AS t1_r1, `forms`.`kind` AS t1_r2 LEFT OUTER JOIN `forms` ON `forms`.`employee_id` = `employees`.`id` WHERE (forms.kind = "health")

If you replaced :preload with :includes, however, the query would fail to execute.

Can I use :includes with Form as the leading model instead?

If the query were reversed, where forms were loaded first and we wanted to efficiently load the employee association, we can still use :includes. The query will load employees based off the collection of Employee#ids, which is referenced from Form#employee_id.


> SELECT `forms`.* FROM `forms`
> SELECT `employees`.* FROM `employees` WHERE `employees`.`id` IN (1, 2, 3, 4, 5)


Can I eager load nested associations with :includes?

Yes. Here are a few examples of different scenarios you could use if the Form model was modified to have additional associations such as has_one: :signer and has_one: issuer, which has_one: :address.

Employee.includes(forms: :issuer)

Employee.includes(forms: { issuer: :address })

Employee.includes(forms: [{ issuer: :address }, :signer])

Benchmarking—Is using :includes always faster?

How big is the performance gain by using :includes in your code? And since :includes invokes either :preload or :eager_load, which method ends up being faster?

I tested the performance of the same queries listed in the examples above in a local instance of Gusto's database to see the difference. Each query was run 3 times with caching disabled and then averaged in the table below.

Employee.first(n).map { |employee| employee.forms }.flatten

Employee.preload(:forms).first(n).map { |employee| employee.forms }.flatten

Employee.eager_load(:forms).first(n).map { |employee| employee.forms }.flatten


The data shows that using :includes will improve the performance significantly when a :preload is invoked, but has the opposite effect for :eager_load in most cases. :eager_load constructs a complex query to gather information about the associations being loaded, thus it makes sense that this option would be slower (although even I was a bit shocked at the dramatic decrease).

Measuring query performance in your own app is the best way to ensure you are using the most efficient methods available for your use cases. As a general rule, though, getting in the habit of using :includes will lead to performant code that creates the best experience for other developers in your codebase - and of course, customers interacting with your apps.

Thank you to Quan Nguyen, Tori Huang, Jim Liu, and the Gusto Engineering Blog team for editing and feedback on earlier drafts.