Sunday, September 8, 2024
HomeRuby On RailsHow one can add index to a giant desk of your Rails...

How one can add index to a giant desk of your Rails app


When your software is profitable, a number of the tables can develop fairly massive — I’m taking a look at you customers desk. When you’re curious sufficient, you periodically examine how your database performs. If any sluggish question pops up within the metrics, there’s a fantastic probability that some index is lacking.

State of the DB engines

Whereas most fashionable database engines can create indexes in an asynchronous, non–blocking method, it’s good to get conversant in all of the exceptions from this rule. I extremely suggest studying the documentation of PostgreSQL, MySQL. I’m unsure about SQLite right here because the documentation doesn’t clearly state that. Nevertheless, my fast chit–chat with LLM might offer you some insights.

What’s the issue then?

As you already know CREATE INDEX assertion will likely be dealt with asynchronously by the database if acceptable algorithm is used. Because of this no reads, writes and replace will likely be blocked.

Usually, for the Rails software, you’ll run a migration by way of Ruby course of throughout deployment, utilizing all of the goodies from ActiveRecord::Migration class and its environment.

Let’s think about the database schema like that:

ActiveRecord::Schema[7.1].outline(model: 2024_06_13_121701) do
  create_table "customers", drive: :cascade do |t|
    t.string "electronic mail"
    t.boolean "energetic"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  finish
finish

Think about you need to shortly discover all these customers which didn’t activate the account:

Consumer.the place(energetic: false)

When you have sufficient customers, talking of dozens or tons of of tens of millions, doing full desk scan might merely kill the database efficiency. Full desk scan occurs when database has no index to make use of and have to examine each row whether or not it meets the standards.

I’ll keep on with PostreSQL within the examples if not said in any other case.

Apparent answer

Let’s add the index then:

➜  trololo git:(grasp) bin/rails g migration AddIndexOnActiveUsers
      invoke  active_record
      create    db/migrate/20240613121751_add_index_on_active_users.rb

Implementation:

# PostgreSQL
class AddIndexOnActiveUsers < ActiveRecord::Migration[7.1]
  disable_ddl_transaction!

  def change
    add_index :customers, :energetic, algorithm: :concurrently
  finish
finish

# MySQL
class AddIndexOnActiveUsers < ActiveRecord::Migration[7.1]
  def change
    add_index :customers, :energetic, algorithm: :inplace
  finish
finish

Let’s run it:

➜  trololo git:(grasp) bin/rails db:migrate
== 20240613121751 AddIndexOnActiveUsers: migrating ============================
-- add_index(:customers, :energetic)
   -> 0.0013s
== 20240613121751 AddIndexOnActiveUsers: migrated (0.0013s) ===================

Okay, it was fairly quick, proper? That’s appropriate on your dev machine, not essentially for the manufacturing setup. Now amplify this time 0.0013s by 6 or 7 orders of magnitude in case your desk is sufficiently big:

➜  trololo git:(grasp) bin/rails db:migrate
== 20240613121751 AddIndexOnActiveUsers: migrating ============================
-- add_index(:customers, :energetic)
   -> 13000.4928s
== 20240613121751 AddIndexOnActiveUsers: migrated (0.0013s) ===================

I’ll do the maths for you: 1300.4928s means 21 minutes 40.49 seconds. However it may be even longer — don’t ask me how I discovered about this.

Whereas the method of migration will find yourself finally, the migration blocking some other deployments to your software throughout this time could also be unacceptable for varied causes:

  • you can not launch some other change to manufacturing till migration completes
  • one thing different goes improper and it is advisable quickly deploy a hotfix, however you possibly can’t because the deployment is blocked by lengthy working migration™
  • course of supervisor on a deployment machine might count on output inside, e.g. 5 minutes. Lengthy working migration will get killed in such situation.

What to do then?

Merely skip the migration physique for RAILS_ENV=manufacturing:

class AddIndexOnActiveUsers < ActiveRecord::Migration[7.1]
  disable_ddl_transaction!

  def change
    add_index :customers, :energetic, algorithm: :concurrently, if_not_exists: true until Rails.env.manufacturing?
  finish
finish

The migration will likely be executed on manufacturing setting, however could have no impact in your database. schema.rb or construction.sql (relying on what you employ) will likely be aligned. Adequate entry may also seem in schema_migrations desk. All of the builders could have index added on their native databases, check setting will likely be aligned too.

However Hey, the place’s my index on manufacturing?! you may ask. And that’s a fairly legitimate query. What you’ll want is a solution to run a rails runner or rake job on manufacturing. Relying on the capabilities you’ve, you may select to:

  • Run it inside bin/rails console:
# PostgreSQL
ActiveRecord::Migration.add_index :customers, :energetic, algorithm: :concurrently, if_not_exists: true

# MySQL
ActiveRecord::Migration.add_index :customers, :energetic, algorithm: :inplace, if_not_exists: true
  • Do the identical by way of bin/rails runner:
# PostgreSQL
bin/rails r "ActiveRecord::Migration.add_index :customers, :energetic, algorithm: :concurrently, if_not_exists: true"

#MySQL
bin/rails r "ActiveRecord::Migration.add_index :customers, :energetic, algorithm: :inplace, if_not_exists: true"
  • Final, however not least, implement a Rake job. It has the benefit that it needs to be dedicated to the repository so that you don’t lose the historical past what’s occurred:
# PostgreSQL
namespace :indexes do
  job add_index_on_active_users: :setting do
    ActiveRecord::Migration.add_index :customers, :energetic, algorithm: :concurrently, if_not_exists: true
  finish
finish

# MySQL
namespace :indexes do
  job add_index_on_active_users: :setting do
    ActiveRecord::Migration.add_index :customers, :energetic, algorithm: :inplace, if_not_exists: true
  finish
finish

Execute it with bin/rails indexes:add_index_on_active_users.

For the final choice it’s additionally simple to reinforce it with logging to simply determine execution in Grafana, Datadog or some other software you employ on your logs.

namespace :indexes do
  job add_index_on_active_users: :setting do
    Rails.logger.data("job indexes:add_index_on_active_users began")

    ActiveRecord::Migration.add_index :customers, :energetic, algorithm: :concurrently, if_not_exists: true

    Rails.logger.data("job indexes:add_index_on_active_users completed”)
  finish
finish

Tiny particulars

When you’re conscious sufficient, you in all probability noticed if_not_exists: true flag. We like idempotence and that’s the rationale. If anybody runs this job once more, nothing will occur. When you favor to see ActiveRecord::StatementInvalid as a substitute, be at liberty to skip it.

As talked about within the preface, to make use of acceptable algorithm for index creation algorithm: :concurrently for PostgreSQL and algorithm: inplace for MySQL needs to be specified.

There’s one other quirk for PostgreSQL:

One other distinction is {that a} common CREATE INDEX command will be carried out inside a transaction block, however CREATE INDEX CONCURRENTLY can not.

Every Rails migration execution is wrapped inside transaction. To disable this conduct, it is advisable use disable_ddl_transaction! methodology inside your migration. In any other case, following error will pop up:

bin/rails aborted!
StandardError: An error has occurred, this and all later migrations canceled: (StandardError)

PG::ActiveSqlTransaction: ERROR:  CREATE INDEX CONCURRENTLY can not run inside a transaction block
/Customers/fidel/code/fidel/trololo/db/migrate/20240613121751_add_index_on_active_users.rb:5:in `change'

Attributable to:
ActiveRecord::StatementInvalid: PG::ActiveSqlTransaction: ERROR:  CREATE INDEX CONCURRENTLY can not run inside a transaction block (ActiveRecord::StatementInvalid)
/Customers/fidel/code/fidel/trololo/db/migrate/20240613121751_add_index_on_active_users.rb:5:in `change'

Attributable to:
PG::ActiveSqlTransaction: ERROR:  CREATE INDEX CONCURRENTLY can not run inside a transaction block (PG::ActiveSqlTransaction)
/Customers/fidel/code/fidel/trololo/db/migrate/20240613121751_add_index_on_active_users.rb:5:in `change'
Duties: TOP => db:migrate
(See full hint by working job with --trace)

Nevertheless, it’s not an issue for our customized script or Rake job run on manufacturing setting.



RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments