Low hanging fruits for better SQL performance in Rails



As your app starts to get more traffic you might notice it getting more sluggish. To fix this most people start by looking at the memory and CPU utilization on their server. And usually that is a bottleneck. But this usually solves the problem for a short while before it starts to get sluggish again. One area that is often overlooked is the SQL performance and the number of queries that your are performing per request.

Here I’m going to cover a few low hanging fruits that are very commonly forgotten or overlooked. This is especially true of more inexperienced developers who haven’t had the chance to work on applications that are deployed at scale and I’ve certainly been guilty of all of them.

Use Indexes on columns for constant lookup times

This is not something that is commonly taught or mentioned in the books but lack of indexes on columns is one of the biggest reasons certain queries take seconds instead of milliseconds to execute. If you don’t know what indexes are, I would suggest reading about them first and then coming back and reading this article.

A User model is an extremely good example of where an Index should be applied. So you have a User model with a column email. This is a really common use case and most applications use either a user_name field or a email field for authentication.

So lets take a look at how your controller will probably look.

class UsersController < ApplicationController
  def sign_in
    @user = User.where("email = ?", params[:email])
    if @user.password == params[:password]
      redirect_to secret_page
    else
      flash[:error] = "your email/password seems to be invalid"
      redirect_to my_sign_in_page
    end
  end
end

Now the code looks perfectly fine and there is nothing wrong with it. However this going to result in a slow query. You might notice this problem when your database has a few thousand records. But your query execution time will start going up a little by little as you add more users.

The thing that wrong here is that the email field does not have an index on it. As a result when you are executing a query mysql (or Postgres) has to do whats called a full-index scan. What that means is that it has to go though each record in the table, line by line and see if it matches your input email.

We can easily fix this by creating an Index on the email columns. Start by creating a new migration and then adding these lines.

add_index :users, :email

Using an index keeps lookup times constant irrespective of the table size.

A more advanced case on this is when you are using an username and email field together and you have an query such as this

class UsersController < ApplicationController
  def sign_in
    @user = User.where("username = ? OR email = ?", params[:username], params[:email])
    if @user.password == params[:password]
      redirect_to secret_page
    else
      flash[:error] = "your email/password seems to be invalid"
      redirect_to my_sign_in_page
    end
  end
end

What happens is that you are querying both columns at the same time. For this there is a complex version of the index called a compound index, it allows you to create 1 index with data from multiple columns. To create an compound index, use the following code in a migration. The array can contain any number of fields that are being used in the query.

  add_index :users,[:email, :username]

A few caveats to keep in mind. Indexes do have an impact on insert (write) speed. The more indexes you have on an table, the slower the inserts are going to be as the database has to balance the index tree. So when creating an index, make sure that you actually need it and try to use as few as possible.

When using compound indexes be sure that the query is actually using all the fields at the SAME time. If your query is only using 1 field in the compound index, the index will not be used and thus it will resolve to doing a full-index scan. Compound keys also have the highest impact on Insert speed so use them as sparingly as possible.

Indexes on Polymorphic Relationships and Foreign Keys

Another area most programmers forget to add indexes are on polymorphic relationships and foreign keys. If you have any classes that are polymorphic don’t forget to add indexes to their columns

# polymorphic example from
# http://guides.rubyonrails.org/association_basics.html#the-has-many-through-association

class Picture < ActiveRecord::Base
  belongs_to :imageable, polymorphic: true
end

class Employee < ActiveRecord::Base
  has_many :pictures, as: :imageable
end

class Product < ActiveRecord::Base
  has_many :pictures, as: :imageable
end

To get the best performance, add an index on the imageable_id. If you have any polymorphic relationships make sure that you have an index on the *_id field.

add_index :pictures, :imageable_id

Foreign and join tables also are places people forget to apply indexes. Take a few hours to go though your database to find tables where you can get quick wins by applying indexes. If you are using a Mac along with MySQL, I recommend Sequel Pro for browsing through your database.

Select only the Columns that are Needed

One ActiveRecord method I see not used often enough is select. Lets Take an example of an User model again. Its has username, email, password, created_at and possibly a number of other fields in the model.

Lets assume you have an admin panel of sorts and you want to view all email ids of your users in your application. So in your controller you would write something like this

class UsersController < ApplicationController
  def index
    @users = User.all
  end
end

While its perfectly good and reasonable code, it can be made better. See the thing is that you only need the email field, but because we haven’t told that to ActiveRecord, it assumes that we want to all the columns. So what ends up happening is that you’ve loaded 3x more data than you actually needed and hence consuming 3x more RAM on your server. This might look innocuous enough right now, but it starts to add up as the number of fields on your database increase.

The better way to do this would be using the select command and specifying the fields that you want to use.

class UsersController < ApplicationController
  def index
    @users = User.select('email').all
  end
end

The select command loads only the columns that you want. I’ve seen performance gain of 2x - 5x while doing this on the expensive queries.

Use find_each to batch load data

Most applications need to do things in the background, maybe its send a bunch of emails, generate some reports, maybe its send data to APIs etc. The code that goes into these is usually an after thought. While doing tasks in the background make sure that you are not loading in entire tables at once. This could impact the performance of other queries severely. Instead execute your tasks by loading in your data in batches of 100s or 1000s. The ActiveRecord find_each command is made for just this. By default the find_each command will load records from your database in batches of 1000, although you also pass in a custom number for when tables have a very large number of columns.

# http://api.rubyonrails.org/classes/ActiveRecord/Batches.html#method-i-find_each
Person.find_each do |person|
  person.do_awesome_stuff
end

Person.where("age > 21").find_each do |person|
  person.party_all_night!
end

Eager Loading and removing N+1 queries

The topic of eliminating N+1 queries is an article by itself but something that I wanted to mention. Removing N+1 queries can result in performance gain in the magnitude of 10x. Simply because at times you may find yourself running 100 queries on a single page. So you can imagine the amount of time that you save by eliminating these. The bullet gem is a nifty little tool shows when your query is resulting in an N+1 call. Install the gem and start going page by page to find the worst offenders.

Conclusion

Most methods that i’ve listed above are quite simple, straightforward and quick to implement. Unlike the more complex methods such as caching these don’t require any architectural changes. By making these small changes you not only make your users happy by loading your pages faster, but also save money by getting the most out of your existing server infrastructure.

Are there are some methods that you’ve used or know about that are commonly overlooked by most programmers. Tell everyone about them by mentioning them in the comments below.

comments powered by Disqus