includes is the recommended way to load associations of your records eagerly in Rails. In fact, the Ruby on Rails guide for eager loading only mentions includes. However, there are other ways, and I want to argue that you should avoid includes.
Here's why:
- includes makes it easy to introduce an odd bug
- preload takes the same arguments as includes, but can't introduce the bug
- When needed, eager_load also takes the same arguments, can introduce the bug, but makes it explicit
What bug? Let's show the introduction of an unexpected bug:
Say you have these records:
p = Post.create!
p.comments.create!(content: 'hello', spam: true)
p.comments.create!(content: 'world')
You want posts that have at least one comment marked as spam.
# distinct because joins can make duplicates
# I dislike using joins for this.
posts = Post.joins(:comments).distinct
.where(comments: {spam: true})
How many comments in this array?
posts.first.comments.to_a.size
2, nothing special there.
You will need to display the comments of the posts, so you includes them.
How many comments are in the array?
posts.includes(:comments).first.comments.to_a
Only 1, the one with `spam = true`
What's the count?
posts.includes(:comments).first.comments.count
What about size?
posts.includes(:comments).first.comments.size
To be clear, the joins isn't needed for this to happen. It was just to show a progression.
Post.includes(:comments)
.where(comments: {spam: true})
.first.comments.size
# => 1
Personally, I find that unexpected.
Imagine if the post is passed to a view or a helper, which uses the comments. It would only print the comments that matched the condition. Now if you try to debug from that helper, you would see that post.comments only has 1 comment.
Hopefully, you know that this is how includes (and eager_load, see below) behaves, so you may look up where the post comes from and figure it out. Good luck otherwise.
This is considered a feature. The consequences of doing conditions on eager loaded association are not in the guide, but they are in the middle of this section of the documentation.
I sometimes see this called "conditional eager loading", and the bug is doing it accidentally.
I consider the whole feature a maintenance burden. At least the guide doesn't recommend using it.
TL;DR: my recommendations
Things will get more technical in the next sections, so my condensed and straightforward recommendations are:
- For new code, if you are only doing eager loading, use preload instead of includes. It does the same eager loading as includes, and takes the same arguments, but it ignores the conditions in the query. With it, things work as expected:
posts = Post.joins(:comments).distinct
.where(comments: {spam: true})
posts.preload(:comments).first.comments.to_a.size #=> 2
posts.preload(:comments).first.comments.count #=> 2
posts.preload(:comments).first.comments.size #=> 2
- If you need to order by an association, then eager_load is basically the only simple way to do so.
- If you need a condition (a where) which uses an association, avoid includes, joins and eager_load.
  Instead, I recommend my gem: activerecord_where_assoc. Here's an introduction to it.
  It's made for this purpose, and will support many more use cases, such as:
  - Recursive associations (parent/child)
  - Polymorphic belongs_to
  - Negative conditions (ex: posts without comments marked as spam)
  - Multiple conditions on different records of the same association
  Alternatively, there's another gem for this: where_exists
# Same as before, posts that have at least one comment marked as spam
Post.where_assoc_exists(:comments, spam: true)
- If includes seems to work somewhere that preload doesn't, you're probably doing a condition on an association or ordering by an association. See the previous points for this.
- For existing code, you can't mindlessly change all includes to preload, because some of it may rely on includes adding a JOIN to the query (the eager_load way), which happens when the query refers to the table of the included associations. So while it would be better to change everything to preload and sometimes eager_load, every such change must be tested.
- If you see an includes with a references, then that's just a call to eager_load. At this point, just use eager_load to make your code shorter.
So don't risk includes doing the wrong thing. preload means simple eager loading without the booby trap; you should use it. Treat eager_load as a warning sign that this could be doing conditional eager loading and be careful around it.
Down the rabbit hole
If you want to understand why I make those recommendations, we'll have to get technical...
Eager loading means loading associations of multiple records before they are needed. This is done to reduce the number of queries executed, making execution faster.
There are actually 3 methods for eager loading in Rails:
- preload: Executes one extra query per association being eager loaded. Same as includes usually does.
- eager_load: Adds a JOIN to the SQL query and loads the association without doing an extra query. This also enables adding conditions on the table, which is the cause of the conditional eager loading bug from the introduction.
- includes: Picks between preload and eager_load based on whether there is a reference, in the query, to the table of an association that was passed to includes. This can be from where or from joins.
  You may also specify an association with references to force the eager_load path, which is needed when your conditions are specified with a String instead of a Hash (which, again, causes conditional eager loading).
So out of the 3 methods, only one of them cannot trigger conditional eager loading: preload. It only does full eager loading, always the same way.
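To make the choice made by includes more concrete, here is a tiny plain-Ruby sketch of the rule described above. It is only a model of the described behavior, not ActiveRecord's actual code: includes_strategy and its parameters are hypothetical names, and tables are represented by plain symbols.

```ruby
# Hypothetical helper modelling the rule described above. This is NOT Rails code.
#   included:        associations passed to includes
#   tables_in_query: tables that where/joins refer to (as detected from Hash conditions)
#   references:      tables named explicitly with references
def includes_strategy(included:, tables_in_query: [], references: [])
  mentioned = tables_in_query + references
  (included & mentioned).empty? ? :preload : :eager_load
end

puts includes_strategy(included: [:comments])                               # preload
puts includes_strategy(included: [:comments], tables_in_query: [:comments]) # eager_load
puts includes_strategy(included: [:comments], references: [:comments])      # eager_load
puts includes_strategy(included: [:comments], tables_in_query: [:authors])  # preload
```

In this model, preload never appears as a risky branch: it has a single behavior, which is the whole point of preferring it.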
When is eager_load needed?
The main reason to use eager_load, that I have no alternative for, is ordering by an association's field.
# Ordering posts by created_at of last comment
Post.eager_load(:comments).order("comments.created_at DESC")
Maybe some use it to reduce the number of queries when they do eager loading. I don't think it really saves much, and there is a risk of slowing things down by making queries that are heavier.
Some may use it to actually do conditional eager loading. I still heavily disagree with that use case.
I've had to edit code that used this "feature" once...
You look at a method and it looks wrong; it can't be doing what it should be doing. It's using every project.users, not just those we want! When I opened an interactive console there (binding.pry or byebug), I saw that users were missing from project.users.
Since I knew of this "feature", I started looking and, as expected, a condition on an includes was found... 3 method calls away from where the association was used, not a single comment to explain what is going on anywhere.
You should avoid code that looks wrong. Code that uses conditional eager loading looks wrong. In our case the overall module was already something that we wanted to rewrite from scratch, so this was just another reason to do so.
Other than ordering, I mostly see eager_load used to do a condition (a where) which uses an association. Let's dig into these.
where on an association with eager_load
It's a somewhat frequent need and there are many questions about this on Stack Overflow.
The bug from the introduction, accidentally doing conditional eager loading, started with such a need: "I want the posts that have comments marked as spam".
You may see a recommendation to use includes, and then have a condition on its table. This actually uses the eager_load path.
It looks like this:
# Please stop doing this :(
Post.includes(:comments).where(comments: {spam: true})
# which is equivalent to this; don't do this either
Post.eager_load(:comments).where(comments: {spam: true})
Again, this does conditional eager loading, which isn't what we asked for.
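To see why the condition leaks into the loaded association, here is a toy in-memory model in plain Ruby. It is a sketch, not how ActiveRecord is implemented: the Comment struct, preload_style and eager_load_style are hypothetical names, and the data mirrors the post with two comments from the introduction.

```ruby
# Toy model only; nothing here is ActiveRecord.
Comment = Struct.new(:id, :post_id, :spam)

POST_IDS = [1].freeze
COMMENTS = [
  Comment.new(1, 1, true),  # 'hello', marked as spam
  Comment.new(2, 1, false)  # 'world'
].freeze

# preload-style: pick the posts first, then do a separate, unfiltered lookup
# of every comment that belongs to the posts that were kept.
def preload_style(post_ids, comments)
  kept = post_ids.select { |pid| comments.any? { |c| c.post_id == pid && c.spam } }
  kept.to_h { |pid| [pid, comments.select { |c| c.post_id == pid }] }
end

# eager_load-style: build joined (post, comment) rows, apply the condition to
# those rows, then rebuild each post's comments from the surviving rows only.
def eager_load_style(post_ids, comments)
  rows = post_ids.flat_map do |pid|
    comments.select { |c| c.post_id == pid }.map { |c| [pid, c] }
  end
  rows.select { |_pid, c| c.spam }
      .group_by { |pid, _c| pid }
      .transform_values { |pairs| pairs.map { |_pid, c| c } }
end

puts preload_style(POST_IDS, COMMENTS)[1].size     # 2
puts eager_load_style(POST_IDS, COMMENTS)[1].size  # 1
```

In the second method, the same filtered rows are used both to select the posts and to fill the association, so the non-spam comment has nowhere to come from.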
To be clear, the where on an association with includes / eager_load can be safe. But only if the association is a belongs_to. When it is, there are only 2 possibilities: either load the record and the associated belongs_to records, or don't load either. No conditional eager loading is possible.
But even when it's safe, there are risks:
- Using includes / eager_load increases the chance for a mistake, where you or someone else just adds another association to the existing eager loading call.
- Every time a reader sees includes / eager_load, he may wonder if it is safe, or if there could be accidental conditional eager loading.
And as a tool, this isn't so great:
- If you don't need the associated records, then eager loading them is wasteful.
- Doesn't handle recursive associations (ex: parent/children)
- Doesn't compose well
- Looks potentially wrong when you know of the conditional eager loading "feature"
where on an association with joins
The next option is to use joins. It also has downsides.
It looks like this:
# Please stop doing this :(
Post.joins(:comments).where(comments: {spam: true})
# and stop doing this
Post.joins(:comments).distinct.where(comments: {spam: true})
Using joins like this is better than includes / eager_load since at least, there is no risk of conditionally loading an association. But there are still problems with it:
- Doesn't handle recursive associations (ex: parent/children)
- Requires a distinct to avoid duplicated records when used with has_many associations.
  This can be unexpected if you're doing a more complex query than just fetching records.
- Doesn't compose well
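The duplicate problem can be pictured with a small plain-Ruby model of what a JOIN followed by a WHERE returns. This is only an illustration under simplifying assumptions: the Comment struct, the sample data and join_rows are hypothetical, and posts are reduced to their ids.

```ruby
# Toy model only; nothing here is ActiveRecord.
Comment = Struct.new(:post_id, :spam)

POST_IDS = [1, 2].freeze
COMMENTS = [
  Comment.new(1, true),
  Comment.new(1, true),  # a second spam comment on the same post
  Comment.new(2, false)
].freeze

# JOIN + WHERE: one result row per matching (post, comment) pair,
# so a post appears once for each of its matching comments.
def join_rows(post_ids, comments)
  post_ids.flat_map do |pid|
    comments.select { |c| c.post_id == pid && c.spam }.map { pid }
  end
end

puts join_rows(POST_IDS, COMMENTS).inspect       # [1, 1]
puts join_rows(POST_IDS, COMMENTS).uniq.inspect  # [1]
```

The uniq call plays the role of distinct here: it is a second step that has to be remembered every time the join is used.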
where on an association with Arel
Truth is, this need for a where on an association isn't something that ActiveRecord supports well. So leaving the ActiveRecord-only solutions, you can do an actual EXISTS query with Arel. EXISTS is the SQL tool that is meant to do this type of condition, not JOIN.
It looks like this:
# An OK way, but error prone
Post.where(Comment.where("posts.id = comments.post_id").where(spam: true).arel.exists)
This composes much better with other tools because all it does is add a single WHERE clause to the query. It works as you would expect with or, not, and with other conditions on the same association.
But there still are new downsides:
- You must manually write the condition to link the posts to the comments. It's easy to forget it, and I've seen StackOverflow answers that forgot to do so.
  You won't get any error for forgetting, your query will just be wrong, which may not even be obvious if all you have is a little test data. Bonus: This can get extra tedious for polymorphic associations, where you also need to do this check: foos.owner_type = #{Bar.base_class.name}.
- If a condition was given when defining the association, you must also manually rewrite it.
- Only the models are named in the code, not the association of interest. This makes the intent less clear, especially when non-trivial associations exist.
- Extra work to handle recursive associations (ex: parent/children)
- Quite a bit longer to write, and this is a short example.
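The first downside in the list above, forgetting the condition that links posts to comments, is easy to picture with a plain-Ruby model of EXISTS. This is a sketch under simplifying assumptions: the Comment struct, the sample data and both method names are hypothetical, and posts are reduced to their ids.

```ruby
# Toy model only; nothing here is ActiveRecord or Arel.
Comment = Struct.new(:post_id, :spam)

POST_IDS = [1, 2].freeze
COMMENTS = [
  Comment.new(1, true),
  Comment.new(2, false)
].freeze

# Correlated EXISTS: the inner lookup is tied to the current post.
def exists_with_link(post_ids, comments)
  post_ids.select { |pid| comments.any? { |c| c.post_id == pid && c.spam } }
end

# Link condition forgotten: the inner lookup no longer depends on the post,
# so it is true for every post as soon as any spam comment exists anywhere.
def exists_without_link(post_ids, comments)
  post_ids.select { |_pid| comments.any? { |c| c.spam } }
end

puts exists_with_link(POST_IDS, COMMENTS).inspect     # [1]
puts exists_without_link(POST_IDS, COMMENTS).inspect  # [1, 2]
```

With a single post in the test data, both versions would return the same result, which is why the mistake can go unnoticed for a while.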
Other than writing the whole condition manually, which would have all the problems of the Arel way, but be more verbose and more error-prone, I think we're out of built-in ways.
where on an association with activerecord_where_assoc
What I recommend for conditions based on associations is a gem I made just for this purpose: activerecord_where_assoc. It looks like this:
# Please consider doing this:
Post.where_assoc_exists(:comments, spam: true)
# Or using a scope such as is_spam:
Post.where_assoc_exists(:comments) { is_spam }
The query it generates is the same as the Arel example, with the same benefits and more. See for yourself:
- It just adds a single where condition, so it composes well and works with or and with other conditions on the same association.
- Handles recursive associations automatically (ex: parent/children)
- Handles polymorphic belongs_to (includes and joins would simply refuse)
- Easy to do a NOT of the condition (i.e. where no comment is marked as spam) with where_assoc_not_exists.
- Composes with other such queries, even on the same association, even with negations
- Unlike Arel, this uses the association's name, so the intent is clearer.
So if you need to do this kind of condition, here are some references for my gem:
- Introduction to activerecord_where_assoc.
- The problems of the other ways of doing such conditions.
- Multiple example usages.
There's simply no way I could find to use builtin tools to have this query be clear, succinct and not booby trapped. Either live with the booby traps, write your own methods to do this cleanly, or use one of the gems written for this purpose:
Seriously, try any of them, it's liberating how simple this once complex task becomes.
But includes is everywhere
It is! Let's explore the reasons I can think of.
includes is the "smart" function out of the 3, it will pick the "right" strategy when needed.
Marketing-wise, this sounds like a good thing... Until you learn that the alternate path, eager_load, is not always what you want and it can cause bugs due to conditional eager loading.
For a long time, includes (and eager_load) were the only way to do a LEFT JOIN
The method left_joins was added in Rails 5.0. Before that, if you wanted one, you had to either do includes / eager_load, or write the whole "LEFT JOIN" yourself like this: joins("LEFT JOIN comments ON comments.post_id = posts.id"). The includes shortcut was often suggested.
includes has always been recommended, so most are familiar with it, and most recommend it.
Everything is against preload, even its documentation makes preload sound like an alias for includes, and the Rails guide only mentions includes for eager loading data.
I think not enough people were both harmed by includes and aware that you can just specify preload and eager_load for that knowledge to spread.
Recap
Conditional eager loading:
- includes and eager_load can accidentally eager load only part of an association, a good source of bugs.
- Doing conditional eager loading voluntarily can be a maintenance burden
- If you do want conditional eager loading, using eager_load makes it a bit more obvious.
Conditions based on associations:
- Using includes and eager_load for conditions based on associations can do conditional eager loading at the same time, and you will get bitten by the bugs it can cause.
- If you don't need to load the association, eager loading it is wasteful
- Using specialized gems to do conditions based on association is safer, clearer and easier.
Order based on association:
- includes and eager_load are the only simple way.
- Using eager_load is explicit about the use case, and you don't need to also call references.
Regular old eager loading:
- Just use preload
If you want to run the examples from this post, here is a self-contained ruby script.