Wednesday, May 8, 2024
HomeRuby On RailsSafely Including NOT NULL Columns to Your Database Tables — Infrastructure (2022)

Safely Including NOT NULL Columns to Your Database Tables — Infrastructure (2022)


The Database Migrations crew is in control of creating and sustaining all the inner instruments for Shopify groups to hold out schema modifications safely and with minimal downtime. One among our newest investigations concerned utilizing the favored Massive Hadron Migrator (LHM) gem to carry out schema modifications in MySQL databases, after safety-checking that the migrations will be carried out with out shedding any knowledge. Particularly, when including a NOT NULL columns to an present desk.

On this put up, I will share what we discovered and our suggestions for doing so to your personal database tables.

First, it’s vital to grasp how schema modifications security is outlined.

For this function, we must always concentrate on the process LHM makes use of to execute migrations, because it makes use of the shadow-table mechanism to make sure there’s minimal downtime whereas the migrations are being carried out. In a nutshell, LHM creates a brand new desk (often called shadow desk) with the schema change utilized, and units up triggers on the unique desk to populate any knowledge associated operations (INSERT, UPDATE and DELETE) to the shadow desk. Then, it begins copying data, in batches, from the unique desk to the shadow one. When all data have been copied to the shadow desk, LHM robotically renames the 2 tables after which drop the triggers from the unique one.

This process ensures there’s minimal downtime (see MySQL rename desk limitations), whereas migrations are going down, but it surely introduces a brand new set of potential issues, as we have to be sure that the batched insertions don’t drop data within the course of, which can occur relying on the schema change utilized.

Due to this fact, schema modifications are recognized as secure if:

  • After the migration has began: INSERT, UPDATE and DELETE operations concentrating on the unique desk can populate knowledge to the shadow one (by way of MySQL triggers), with out crashing. This is called backward compatibility.
  • As soon as the migration has completed: the variety of data within the shadow desk should be equal to the variety of data within the unique desk (by way of MySQL triggers and LHM batched insertions).

As acknowledged within the title, this investigation focuses on one particular set of schema modifications, these including  NOT NULL constrained columns to a desk. Contemplating how LHM works internally, one can solely guess what values are going to be populated to the newly added column, for all of the unique desk data, which lack a price for that column within the first place.

Due to this fact, the investigation of those schema modifications security will take into account the next elements:

  • The inclusion of a DEFAULT worth for the brand new column definition, in the identical migration.
  • The inclusion of a UNIQUE INDEX for the brand new column, in the identical migration.
  • The mode the MySQL occasion is configured to (strict or non-strict).

With a purpose to perform the investigation, a number of steps are outlined to simulate how LHM will perform the migrations whereas iterating on the thought of elements. For demonstration functions, let’s assume the migration provides a NOT NULL column known as “last_name” to a desk named “customers” .

1. Initialization: The MySQL mode is ready and the unique desk created.

2. Desk creation: Simulate how LHM would apply the migration.

3. Triggers definition: Simulate how LHM would arrange MySQL triggers.

As soon as the experiment setup is outlined, we are able to execute data-related SQL operations to verify how MySQL triggers will populate them from the unique to the shadow desk. These statements are depending on the SQL operation we’re testing (INSERT, UPDATE and DELETE).

1. Populate preliminary knowledge: Create preliminary data to execute the investigation queries on.

2. Activate the SQL triggers: Run particular queries to activate the LHM simulated triggers.

3. Examine the outcomes: Execute a SELECT question each within the unique and within the shadow desk, with a view to evaluate their data (each in size and content material).

Concatenating all of the Experiment Setup part steps alongside these ones, we are able to decide the security of including NOT NULL column schema modifications, relying on the elements acknowledged in part 2 ( DEFAULT clause, UNIQUE INDEX existence, and MySQL mode). A whole back-to-back experiment, for INSERT operations, would appear to be this:

Screenshot showing data loss in a shadow table.
Knowledge loss is proven by lacking data within the shadow desk on the suitable.

Because the variety of data between the unique and shadow tables is totally different, we conclude that performing INSERT operations, when there’s a NOT NULL, DEFAULT outlined schema-change, that additionally introduces a UNIQUE INDEX on that column, can produce knowledge loss when the MySQL occasion is configured with an strict mode.

Just like how we constructed an experiment case within the earlier part, we are able to iterate on the experiment elements (operation kind, DEFAULT worth inclusion, UNIQUE INDEX presence, and MySQL mode) to construct a matrix of schema change security for all of the ensuing mixtures.

As a reminder, schema change security is decided by answering two questions:

  • Will the migration be backwards appropriate? In different phrases, whether or not INSERT, UPDATE and DELETE operations concentrating on the unique desk can populate knowledge to the shadow one (by way of MySQL triggers), with out crashing.
  • Will the migration introduce knowledge loss? In different phrases, whether or not the variety of data within the shadow desk equals the one within the unique desk, as soon as the migration has completed.

Operation

Column spec

Schema change

MySQL mode

Backward appropriate?

Knowledge loss?

INSERT

NOT NULL, with DEFAULT worth

Not contains UNIQUE INDEX

STRICT_ALL_TABLES

Sure

No

NO_ENGINE_SUBSTITUTION

Sure

No

Contains UNIQUE INDEX

STRICT_ALL_TABLES

Sure

Sure

NO_ENGINE_SUBSTITUTION

Sure

Sure

NOT NULL with out DEFAULT worth

Not contains UNIQUE INDEX

STRICT_ALL_TABLES

No

NO_ENGINE_SUBSTITUTION

Sure

No*

Contains UNIQUE INDEX

STRICT_ALL_TABLES

No

NO_ENGINE_SUBSTITUTION

Sure

Sure

Operation

Column spec

Schema change

MySQL mode

Backward appropriate?

Knowledge loss?

UPDATE

NOT NULL with DEFAULT worth

Not contains UNIQUE INDEX

STRICT_ALL_TABLES

Sure

No

NO_ENGINE_SUBSTITUTION

Sure

No

Contains UNIQUE INDEX

STRICT_ALL_TABLES

Sure

Sure

NO_ENGINE_SUBSTITUTION

Sure

Sure

NOT NULL with out DEFAULT worth

Not contains UNIQUE INDEX

STRICT_ALL_TABLES

No

NO_ENGINE_SUBSTITUTION

Sure

No*

Contains UNIQUE INDEX

STRICT_ALL_TABLES

No

NO_ENGINE_SUBSTITUTION

Sure

Sure

Operation

Column spec

Schema change

MySQL mode

Backward appropriate?

Knowledge loss?

DELETE

NOT NULL with DEFAULT worth

Not contains UNIQUE INDEX

STRICT_ALL_TABLES

Sure

No

NO_ENGINE_SUBSTITUTION

Sure

No

Contains UNIQUE INDEX

STRICT_ALL_TABLES

Sure

No

NO_ENGINE_SUBSTITUTION

Sure

No

NOT NULL with out DEFAULT worth

Not contains UNIQUE INDEX

STRICT_ALL_TABLES

No

NO_ENGINE_SUBSTITUTION

Sure

No*

Contains UNIQUE INDEX

STRICT_ALL_TABLES

No

NO_ENGINE_SUBSTITUTION

Sure

No*


*
The variety of data within the shadow desk matches the one within the unique desk, however an implicit DEFAULT worth is chosen for the brand new column. For this experiment, the worth was the empty string (“”), however it’s going to differ relying on the info kind (verify MySQL implicit defaults).

Contemplating the matrix of instances from earlier part:

  1. Keep away from including a NOT NULL column with out a DEFAULT worth.Schema modifications introducing NOT NULL columns should outline a DEFAULT worth to keep away from sudden outcomes when the migrations are going down.

    Within the worst case (when the MySQL occasion is configured with a strict mode), the desk affected by the migration will break compatibility for present purposes, as beforehand used data-related operations (INSERT / UPDATE / DELETE) couldn’t be populated to the shadow desk, after the migrations begins.

    In the very best case (when the MySQL occasion is configured with a non-strict mode), the data populated from the unique to the shadow desk, both by the MySQL triggers or LHM batched insertions, will obtain an implicit DEFAULT worth for the brand new column, which might be undesirable.

  2.  Be extraordinarily cautious when including a UNIQUE INDEX.The introduction of UNIQUE indexes in schema-changes engines that use the shadow desk mechanism to hold out their migrations proved to be harmful as it might probably result in knowledge loss when there are duplicate values, for the index lined columns, previous to the migrations.

    It’s advisable that builders verify for the existence of duplicates earlier than they add a UNIQUE index on a set of columns.

This investigation was accomplished because of the suggestions supplied by all of the DB Migrations crew members: Bastian Bartmann, Sergey Fedorov, Anya Zenkina, Xiaoli Liang; and the devoted steerage of Shuhao Wu.

Sinclert Pérez is a Manufacturing Engineer at Shopify. You’ll be able to join with him on Twitter and Github.

If constructing programs from the bottom as much as remedy real-world issues pursuits you, our Engineering weblog has tales about different challenges now we have encountered. Go to our Engineering profession web page to search out out about our open positions. Be part of our distant crew and work (nearly) wherever. Study how we’re hiring to design the long run collectively—a future that’s digital by design.



RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments