You should avoid `includes` in Rails

includes is the recommended way to load associations of your records eagerly. In fact, the Rails guide for eager loading only mentions includes. However, there are other ways, and I want to argue that you should avoid includes.

Here's why:

  • includes makes it easy to introduce an odd bug
  • preload takes the same arguments as includes, but can't introduce the bug
  • When needed, eager_load also takes the same arguments, can introduce the bug, but makes it explicit

What bug? Let's walk through how it gets introduced:

Say you have these records:

p = Post.create!
p.comments.create!(content: 'hello', spam: true)
p.comments.create!(content: 'world')

You want posts that have at least one comment marked as spam.

# distinct because joins can make duplicates
# I dislike using joins for this.
posts = Post.joins(:comments).distinct
            .where(comments: {spam: true})

How many comments does the first post have?

posts.first.comments.to_a.size

2, nothing special there.

You will need to display the comments of the posts, so you includes them.

How many comments are in the array?

posts.includes(:comments).first.comments.to_a

Only 1, the one with `spam = true`

What's the count?

posts.includes(:comments).first.comments.count

Back to 2!

What about size?

posts.includes(:comments).first.comments.size

Back to 1!
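The flip-flop between 1 and 2 comes from where each method gets its answer: count always runs a COUNT query against the database, while to_a and size reuse the records already in memory once the association is loaded. Here is a plain-Ruby toy model of that behaviour. It is only a sketch, not ActiveRecord's actual code, and FakeAssociation and its fields are hypothetical names.

```ruby
# Toy model of a has_many association; not ActiveRecord's implementation.
# FakeAssociation, rows_in_db and loaded_target are hypothetical names.
class FakeAssociation
  def initialize(rows_in_db, loaded_target = nil)
    @rows_in_db = rows_in_db       # what the table really contains
    @loaded_target = loaded_target # what eager loading put in memory (or nil)
  end

  # count always goes back to the database.
  def count
    @rows_in_db.size
  end

  # size trusts the in-memory records when the association is loaded.
  def size
    @loaded_target ? @loaded_target.size : count
  end

  # to_a only reads the database when nothing is in memory yet.
  def to_a
    @loaded_target ||= @rows_in_db.dup
  end
end

ALL_COMMENTS = [
  { content: "hello", spam: true },
  { content: "world", spam: false }
].freeze

# Conditional eager loading only put the spam comment in memory.
PARTIALLY_LOADED = FakeAssociation.new(ALL_COMMENTS, ALL_COMMENTS.select { |c| c[:spam] })
# No eager loading at all: nothing in memory yet.
NOT_LOADED = FakeAssociation.new(ALL_COMMENTS)

puts PARTIALLY_LOADED.to_a.size # 1
puts PARTIALLY_LOADED.count     # 2
puts PARTIALLY_LOADED.size      # 1
puts NOT_LOADED.size            # 2
```

In this model the database never lies; the surprise only comes from the partially filled in-memory copy.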

To be clear, the joins isn't needed for this to happen.

It was just to show a progression.

Post.includes(:comments)
    .where(comments: {spam: true})
    .first.comments.size
# => 1

Personally, I find that unexpected.

Imagine if the post is passed to a view or a helper, which uses the comments. It would only print the comments that matched the condition. Now if you try to debug from that helper, you would see that post.comments only has 1 comment.

Hopefully, you know that this is how includes (and eager_load, see below) behaves, so you may look up where the post comes from and figure it out. Good luck otherwise.

This is considered a feature. The consequences of doing conditions on eager loaded association are not in the guide, but they are in the middle of this section of the documentation.

I sometimes see this called "conditional eager loading", and the bug is doing it accidentally.

I consider the whole feature a maintenance burden. At least the guide doesn't recommend using it.

TL;DR: my recommendations

Things will get more technical in the next sections, so my condensed and straightforward recommendations are:

  • For new code, if you are only doing eager loading, use preload instead of includes. It does the same eager loading as includes, and takes the same arguments, but it ignores the conditions in the query. With it, things work as expected:
posts = Post.joins(:comments).distinct
            .where(comments: {spam: true})
posts.preload(:comments).first.comments.to_a.size #=> 2
posts.preload(:comments).first.comments.count #=> 2
posts.preload(:comments).first.comments.size #=> 2
  • If you need to order by an association, then eager_load is basically the only simple way to do so.
  • If you need a condition (a where) which uses an association, avoid includes, joins and eager_load.
    Instead, I recommend my gem: activerecord_where_assoc. Here's an introduction to it.
    It's made for this purpose, and will support many more use cases, such as:

    • Recursive associations (parent/child)
    • Polymorphic belongs_to
    • Negative conditions (ex: posts without comments marked as spam)
    • Multiple conditions on different records of the same association

    Alternatively, there's another gem for this: where_exists

# Same as before, posts that have at least one comment marked as spam
Post.where_assoc_exists(:comments, spam: true)
  • If includes seems to work somewhere that preload doesn't, you're probably doing a condition on an association or ordering by an association. See the previous points for this.

  • For existing code, you can't mindlessly change all includes to preload, because some of it may rely on includes adding a JOIN to the query (the eager_load way), which happens when the query refers to the table of the included associations. So while it would be better to change everything to preload and sometimes eager_load, every such change must be tested.

  • If you see an includes with a references, then that's just a call to eager_load. At this point, just use eager_load to make your code shorter.

So don't risk includes doing the wrong thing. preload means simple eager loading without the booby trap; you should use it. Treat eager_load as a warning sign that this could be doing conditional eager loading and be careful around it.

Down the rabbit hole

If you want to understand why I make those recommendations, we'll have to get technical...

Eager loading means loading associations of multiple records before they are needed. This is done to reduce the number of queries executed, making execution faster.

There are actually 3 methods for eager loading in Rails:

  • preload: Executes one extra query per association being eager loaded. Same as includes usually does.
  • eager_load: Adds a JOIN to the SQL query and loads the association without doing an extra query. This also enables adding conditions on the table, which is the cause of the conditional eager loading bug from the introduction.
  • includes: Picks between preload and eager_load based on if there is a reference, in the query, to the table of an association that was passed to includes. This can be from where or from joins.
    You may also specify an association with references to force the eager_load path, which is needed when your conditions are specified with a String instead of a Hash (which, again, causes conditional eager loading).

So out of the 3 methods, only one of them cannot trigger conditional eager loading: preload. It only does full eager loading, always the same way.
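To make the difference concrete, here is a plain-Ruby model of the two strategies working on arrays of hashes. It is a simplified sketch and not how ActiveRecord is implemented; every name in it (EL_POSTS, preload_comments, eager_load_comments, includes_strategy) is hypothetical, and the includes rule is reduced to "does the query reference the association's table".

```ruby
# Plain-Ruby model of the loading strategies; not ActiveRecord's code.
# All names below are hypothetical and only serve the illustration.
EL_POSTS = [{ id: 1 }].freeze
EL_COMMENTS = [
  { id: 10, post_id: 1, content: "hello", spam: true },
  { id: 11, post_id: 1, content: "world", spam: false }
].freeze

# preload: a second, separate query fetches every comment of the posts found.
# Conditions of the main query never reach it.
def preload_comments(posts)
  ids = posts.map { |post| post[:id] }
  by_post = EL_COMMENTS.select { |c| ids.include?(c[:post_id]) }
                       .group_by { |c| c[:post_id] }
  posts.map { |post| post.merge(comments: by_post.fetch(post[:id], [])) }
end

# eager_load: one query with a LEFT JOIN. The WHERE filters the joined rows,
# and the association is rebuilt from whatever rows survive.
def eager_load_comments(posts, &where)
  rows = posts.flat_map do |post|
    matches = EL_COMMENTS.select { |c| c[:post_id] == post[:id] }
    matches.empty? ? [[post, nil]] : matches.map { |c| [post, c] }
  end
  rows = rows.select { |_post, comment| where.call(comment) } if where
  rows.group_by { |post, _comment| post }
      .map { |post, pairs| post.merge(comments: pairs.map(&:last).compact) }
end

# includes: uses the JOIN strategy when the query references the table of
# an included association, and the separate-query strategy otherwise.
def includes_strategy(included_tables, referenced_tables)
  (included_tables & referenced_tables).empty? ? :preload : :eager_load
end

SPAM_ONLY = ->(comment) { comment && comment[:spam] }

PRELOADED = preload_comments(EL_POSTS)
EAGER_LOADED = eager_load_comments(EL_POSTS, &SPAM_ONLY)

puts PRELOADED.first[:comments].size               # 2
puts EAGER_LOADED.first[:comments].size            # 1
puts includes_strategy([:comments], [])            # preload
puts includes_strategy([:comments], [:comments])   # eager_load
```

The point of the model: with the JOIN strategy the condition and the loading share the same rows, so filtering one filters the other.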

When is eager_load needed?

The main reason to use eager_load, and the one I have no alternative for, is ordering by an association's field.

# Ordering posts by created_at of last comment
Post.eager_load(:comments).order("comments.created_at DESC")

Maybe some use it to reduce the number of queries when they do eager loading. I don't think it really saves much, and there is a risk of slowing things down by making queries that are heavier.
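A rough way to see the "heavier queries" risk is to count the rows each strategy brings back. This is back-of-the-envelope arithmetic with made-up numbers, assuming every post has the same number of associated records; rows_with_preload and rows_with_eager_load are hypothetical helpers, and tags is a hypothetical second has_many.

```ruby
# Back-of-the-envelope row counts; the numbers are made up for illustration.

# preload: one query for the posts, then one query per association.
def rows_with_preload(posts, per_post_counts)
  posts + per_post_counts.sum { |n| posts * n }
end

# eager_load: a single JOINed query. Each has_many multiplies the rows
# (a LEFT JOIN still keeps one row when there is no match), and every row
# repeats the columns of the post.
def rows_with_eager_load(posts, per_post_counts)
  posts * per_post_counts.map { |n| [n, 1].max }.reduce(1, :*)
end

# 100 posts with 50 comments each
puts rows_with_preload(100, [50])        # 5100
puts rows_with_eager_load(100, [50])     # 5000

# 100 posts with 50 comments and 20 tags each
puts rows_with_preload(100, [50, 20])    # 7100
puts rows_with_eager_load(100, [50, 20]) # 100000
```

With one association the JOIN saves a query for about the same number of (wider) rows; with two has_many associations the rows multiply.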

Some may use it to actually do conditional eager loading. I still heavily disagree with that use case.

I've had to edit code that used this "feature" once...

You look at a method and it looks wrong; it can't be doing what it should be doing. It's using every project.users, not just those we want! When I opened an interactive console there (binding.pry or byebug), I saw that users were missing from project.users.

Since I knew of this "feature", I started looking and, as expected, a condition on an includes was found... 3 method calls away from where the association was used, not a single comment to explain what is going on anywhere.

You should avoid code that looks wrong. Code that uses conditional eager loading looks wrong. In our case the overall module was already something that we wanted to rewrite from scratch, so this was just another reason to do so.

Other than ordering, I mostly see eager_load used to do a condition (a where) which uses an association. Let's dig into these.

where on an association with eager_load

It's a somewhat frequent need and there are many questions about this on Stack Overflow.

The bug from the introduction, accidentally doing conditional eager loading, started with such a need: "I want the posts that have comments marked as spam".

You may see a recommendation to use includes, and then have a condition on its table. This actually uses the eager_load path.

It looks like this:

# Please stop doing this :(
Post.includes(:comments).where(comments: {spam: true})
# which is equivalent to this; don't do this either
Post.eager_load(:comments).where(comments: {spam: true})

Again, this does conditional eager loading, which isn't what we asked for.

To be clear, the where on an association with includes / eager_load can be safe. But only if the association is a belongs_to. When it is, there are only 2 possibilities: either load the record and the associated belongs_to records, or don't load either. No conditional eager loading is possible.

But even when it's safe, there are risks:

  • Using includes / eager_load increases the chance for a mistake, where you or someone else just add another association to the existing eager loading call.
  • Every time a reader sees includes / eager_load, they may wonder if it is safe, or if there could be accidental conditional eager loading.

And as a tool, this isn't so great:

  • If you don't need the associated records, then eager loading them is wasteful.
  • Doesn't handle recursive associations (ex: parent/children)
  • Doesn't compose well
  • Looks potentially wrong when you know of the conditional eager loading "feature"

where on an association with joins

The next option is to use joins. It also has downsides.

It looks like this:

# Please stop doing this :(
Post.joins(:comments).where(comments: {spam: true})
# and stop doing this
Post.joins(:comments).distinct.where(comments: {spam: true})

Using joins like this is better than includes / eager_load since, at least, there is no risk of conditionally loading an association. But there are still problems with it:

  • Doesn't handle recursive associations (ex: parent/children)
  • Requires a distinct to avoid duplicated records when used with has_many associations.
    This can be unexpected if you're doing a more complex query than just fetching records.
  • Doesn't compose well
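The duplicates come straight from how a JOIN works: it returns one row per matching pair. Here is a plain-Ruby model of that, as a sketch only (it is not ActiveRecord, and DUP_POSTS, DUP_COMMENTS and joined_spam_posts are hypothetical names).

```ruby
# Plain-Ruby model of INNER JOIN + WHERE; not ActiveRecord.
DUP_POSTS = [{ id: 1 }, { id: 2 }].freeze
DUP_COMMENTS = [
  { post_id: 1, spam: true },
  { post_id: 1, spam: true },
  { post_id: 2, spam: false }
].freeze

# One result row per (post, matching comment) pair.
def joined_spam_posts(posts, comments)
  posts.flat_map do |post|
    comments.select { |c| c[:post_id] == post[:id] && c[:spam] }.map { post }
  end
end

JOINED = joined_spam_posts(DUP_POSTS, DUP_COMMENTS)

puts JOINED.size      # 2, post 1 appears once per spam comment
puts JOINED.uniq.size # 1, what distinct gives back
```

Anything computed on the joined rows before deduplication, such as a count or a sum, sees post 1 twice.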

where on an association with Arel

Truth is, this need for a where on an association isn't something that ActiveRecord supports well. So leaving the ActiveRecord only solutions, you can do an actual EXISTS query with Arel. EXISTS is the SQL tool that is meant to do this type of condition, not JOIN.

It looks like this:

# An OK way, but error prone
Post.where(Comment.where("posts.id = comments.post_id").where(spam: true).arel.exists)

This composes much better with other tools because all it does is add a single WHERE clause to the query. It works as you would expect with or, with not, and with other conditions on the same association.
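To see why EXISTS composes, it helps to think of it as a yes/no test evaluated per post. The sketch below models that in plain Ruby; it is not Arel or ActiveRecord, and EX_POSTS, EX_COMMENTS, comment_exists? and the flagged column are hypothetical.

```ruby
# Plain-Ruby model of WHERE EXISTS (...); not Arel or ActiveRecord.
EX_POSTS = [{ id: 1 }, { id: 2 }, { id: 3 }].freeze
EX_COMMENTS = [
  { post_id: 1, spam: true,  flagged: false },
  { post_id: 1, spam: false, flagged: true },
  { post_id: 2, spam: false, flagged: false }
].freeze

# EXISTS is a boolean per post: it never adds or multiplies rows.
def comment_exists?(post, &condition)
  EX_COMMENTS.any? { |c| c[:post_id] == post[:id] && condition.call(c) }
end

# Two conditions on the same association, possibly met by different comments.
SPAM_AND_FLAGGED = EX_POSTS.select do |post|
  comment_exists?(post) { |c| c[:spam] } && comment_exists?(post) { |c| c[:flagged] }
end

# Negation: posts without any spam comment.
WITHOUT_SPAM = EX_POSTS.reject { |post| comment_exists?(post) { |c| c[:spam] } }

# or: posts with a spam comment, or with no comment at all.
SPAM_OR_EMPTY = EX_POSTS.select do |post|
  comment_exists?(post) { |c| c[:spam] } || !comment_exists?(post) { true }
end

p SPAM_AND_FLAGGED.map { |post| post[:id] } # [1]
p WITHOUT_SPAM.map { |post| post[:id] }     # [2, 3]
p SPAM_OR_EMPTY.map { |post| post[:id] }    # [1, 3]
```

Because each test is just a boolean, combining them is ordinary boolean logic, and no distinct is needed.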

But there still are new downsides:

  • You must manually write the condition that links the posts to the comments. It's easy to forget it, and I've seen Stack Overflow answers that forgot to do so.
    You won't get any error for forgetting; your query will just be wrong, which may not even be obvious if all you have is a little test data. Bonus: This can get extra tedious for polymorphic associations, where you also need to add this check: foos.owner_type = #{Bar.base_class.name}.
  • If a condition was given when defining the association, you must also manually rewrite it.
  • Only the models are named in the code, not the association of interest. This makes the intent less clear, especially when non-trivial associations exist.
  • Extra work to handle recursive associations (ex: parent/children)
  • Quite a bit longer to write, and this is a short example.
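The first downside deserves a closer look, because the failure is silent. Below is a plain-Ruby model of an EXISTS with and without the linking condition. It is a sketch under simplified assumptions, not generated SQL, and all names in it are hypothetical.

```ruby
# Plain-Ruby model of a correlated vs. uncorrelated EXISTS.
LINK_POSTS = [{ id: 1 }, { id: 2 }].freeze
LINK_COMMENTS = [
  { post_id: 1, spam: true },
  { post_id: 2, spam: false }
].freeze

# Correct: the subquery is tied to the outer post
# (the "posts.id = comments.post_id" part).
def posts_with_spam(posts, comments)
  posts.select { |post| comments.any? { |c| c[:post_id] == post[:id] && c[:spam] } }
end

# Bug: the linking condition was forgotten, so the subquery only asks
# "is there any spam comment at all?" and answers the same for every post.
def posts_with_spam_unlinked(posts, comments)
  posts.select { |_post| comments.any? { |c| c[:spam] } }
end

p posts_with_spam(LINK_POSTS, LINK_COMMENTS).map { |post| post[:id] }          # [1]
p posts_with_spam_unlinked(LINK_POSTS, LINK_COMMENTS).map { |post| post[:id] } # [1, 2]
```

If the test data only contains posts that do have a spam comment, both versions return the same result, which is how the mistake slips through.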

Other than writing the whole condition manually, which would have all the problems of the Arel way, but be more verbose and more error-prone, I think we're out of built-in ways.

where on an association with activerecord_where_assoc

What I recommend for conditions based on associations is a gem I made just for this purpose: activerecord_where_assoc. It looks like this:

# Please consider doing this:
Post.where_assoc_exists(:comments, spam: true)
# Or using a scope such as is_spam:
Post.where_assoc_exists(:comments) { is_spam }

The query it generates is the same as the Arel example, with the same benefits and more. See for yourself:

  • It just adds a single where condition, so it composes well and works with or and with other conditions on the same association.
  • Handles recursive associations automatically (ex: parent/children)
  • Handles polymorphic belongs_to (includes and joins would simply refuse)
  • Easy to do a NOT of the condition (i.e. where no comment is marked as spam) with where_assoc_not_exists.
  • Composes with other such queries, even on the same association, even with negations
  • Unlike Arel, this uses the association's name, so the intent is clearer.

So if you need to do this kind of condition, here are some references for my gem:

There's simply no way I could find to use built-in tools to have this query be clear, succinct and not booby trapped. Either live with the booby traps, write your own methods to do this cleanly, or use one of the gems written for this purpose:

Seriously, try any of them, it's liberating how simple this once complex task becomes.

But includes is everywhere

It is! Let's explore the reasons I can think of.

includes is the "smart" function out of the 3, it will pick the "right" strategy when needed.

Marketing-wise, this sounds like a good thing... Until you learn that the alternate path, eager_load, is not always what you want and it can cause bugs due to conditional eager loading.

For a long time, includes (and eager_load) were the only way to do a LEFT JOIN

The method left_joins was added in Rails 5.0. Before that, if you wanted one, you had to either do includes / eager_load, or write the whole "LEFT JOIN" yourself like this: joins("LEFT JOIN comments ON comments.post_id = posts.id"). The includes shortcut was often suggested.

includes has always been recommended, so most are familiar with it, and most recommend it.

Everything is against preload: even its documentation makes preload sound like an alias for includes, and the Rails guide only mentions includes for eager loading data.

I think not enough people were both harmed by includes and aware that you can just specify preload and eager_load for that knowledge to spread.

Recap

Conditional eager loading:

  • includes and eager_load can accidentally eager load only part of an association, a good source of bugs.
  • Doing conditional eager loading voluntarily can be a maintenance burden.
  • If you do want conditional eager loading, using eager_load makes it a bit more obvious.

Conditions based on associations:

  • Using includes and eager_load for conditions based on associations can do conditional eager loading at the same time; you will get bitten by the bugs it can cause.
  • If you don't need to load the association, eager loading it is wasteful.
  • Using specialized gems to do conditions based on association is safer, clearer and easier.

Order based on association:

  • includes and eager_load are the only simple way.
  • Using eager_load is explicit about the use case, and you don't need to also call references.

Regular old eager loading:

  • Just use preload

If you want to run the examples from this post, here is a self-contained ruby script.