The Database Migrations crew is in control of creating and sustaining all the inner instruments for Shopify groups to hold out schema modifications safely and with minimal downtime. One among our newest investigations concerned utilizing the favored Massive Hadron Migrator (LHM) gem to carry out schema modifications in MySQL databases, after safety-checking that the migrations will be carried out with out shedding any knowledge. Particularly, when including a NOT NULL
columns to an present desk.
On this put up, I will share what we discovered and our suggestions for doing so to your personal database tables.
First, it’s vital to grasp how schema modifications security is outlined.
For this function, we must always concentrate on the process LHM makes use of to execute migrations, because it makes use of the shadow-table mechanism to make sure there’s minimal downtime whereas the migrations are being carried out. In a nutshell, LHM creates a brand new desk (often called shadow desk) with the schema change utilized, and units up triggers on the unique desk to populate any knowledge associated operations (INSERT
, UPDATE
and DELETE
) to the shadow desk. Then, it begins copying data, in batches, from the unique desk to the shadow one. When all data have been copied to the shadow desk, LHM robotically renames the 2 tables after which drop the triggers from the unique one.
This process ensures there’s minimal downtime (see MySQL rename desk limitations), whereas migrations are going down, but it surely introduces a brand new set of potential issues, as we have to be sure that the batched insertions don’t drop data within the course of, which can occur relying on the schema change utilized.
Due to this fact, schema modifications are recognized as secure if:
- After the migration has began:
INSERT
,UPDATE
andDELETE
operations concentrating on the unique desk can populate knowledge to the shadow one (by way of MySQL triggers), with out crashing. This is called backward compatibility. - As soon as the migration has completed: the variety of data within the shadow desk should be equal to the variety of data within the unique desk (by way of MySQL triggers and LHM batched insertions).
As acknowledged within the title, this investigation focuses on one particular set of schema modifications, these including NOT NULL
constrained columns to a desk. Contemplating how LHM works internally, one can solely guess what values are going to be populated to the newly added column, for all of the unique desk data, which lack a price for that column within the first place.
Due to this fact, the investigation of those schema modifications security will take into account the next elements:
- The inclusion of a
DEFAULT
worth for the brand new column definition, in the identical migration. - The inclusion of a
UNIQUE INDEX
for the brand new column, in the identical migration. - The mode the MySQL occasion is configured to (strict or non-strict).
With a purpose to perform the investigation, a number of steps are outlined to simulate how LHM will perform the migrations whereas iterating on the thought of elements. For demonstration functions, let’s assume the migration provides a NOT NULL
column known as “last_name” to a desk named “customers” .
1. Initialization: The MySQL mode is ready and the unique desk created.
2. Desk creation: Simulate how LHM would apply the migration.
3. Triggers definition: Simulate how LHM would arrange MySQL triggers.
As soon as the experiment setup is outlined, we are able to execute data-related SQL operations to verify how MySQL triggers will populate them from the unique to the shadow desk. These statements are depending on the SQL operation we’re testing (INSERT
, UPDATE
and DELETE
).
1. Populate preliminary knowledge: Create preliminary data to execute the investigation queries on.
2. Activate the SQL triggers: Run particular queries to activate the LHM simulated triggers.
3. Examine the outcomes: Execute a SELECT
question each within the unique and within the shadow desk, with a view to evaluate their data (each in size and content material).
Concatenating all of the Experiment Setup part steps alongside these ones, we are able to decide the security of including NOT NULL
column schema modifications, relying on the elements acknowledged in part 2 ( DEFAULT
clause, UNIQUE INDEX
existence, and MySQL mode). A whole back-to-back experiment, for INSERT
operations, would appear to be this:
Because the variety of data between the unique and shadow tables is totally different, we conclude that performing INSERT operations, when there’s a NOT NULL
, DEFAULT
outlined schema-change, that additionally introduces a UNIQUE INDEX
on that column, can produce knowledge loss when the MySQL occasion is configured with an strict mode.
Just like how we constructed an experiment case within the earlier part, we are able to iterate on the experiment elements (operation kind, DEFAULT
worth inclusion, UNIQUE INDEX
presence, and MySQL mode) to construct a matrix of schema change security for all of the ensuing mixtures.
As a reminder, schema change security is decided by answering two questions:
-
Will the migration be backwards appropriate? In different phrases, whether or not
INSERT
,UPDATE
andDELETE
operations concentrating on the unique desk can populate knowledge to the shadow one (by way of MySQL triggers), with out crashing. - Will the migration introduce knowledge loss? In different phrases, whether or not the variety of data within the shadow desk equals the one within the unique desk, as soon as the migration has completed.
Operation |
Column spec |
Schema change |
MySQL mode |
Backward appropriate? |
Knowledge loss? |
INSERT |
NOT NULL, with DEFAULT worth |
Not contains UNIQUE INDEX |
STRICT_ALL_TABLES |
Sure |
No |
NO_ENGINE_SUBSTITUTION |
Sure |
No |
|||
Contains UNIQUE INDEX |
STRICT_ALL_TABLES |
Sure |
Sure |
||
NO_ENGINE_SUBSTITUTION |
Sure |
Sure |
|||
NOT NULL with out DEFAULT worth |
Not contains UNIQUE INDEX |
STRICT_ALL_TABLES |
No |
– |
|
NO_ENGINE_SUBSTITUTION |
Sure |
No* |
|||
Contains UNIQUE INDEX |
STRICT_ALL_TABLES |
No |
– |
||
NO_ENGINE_SUBSTITUTION |
Sure |
Sure |
Operation |
Column spec |
Schema change |
MySQL mode |
Backward appropriate? |
Knowledge loss? |
UPDATE |
NOT NULL with DEFAULT worth |
Not contains UNIQUE INDEX |
STRICT_ALL_TABLES |
Sure |
No |
NO_ENGINE_SUBSTITUTION |
Sure |
No |
|||
Contains UNIQUE INDEX |
STRICT_ALL_TABLES |
Sure |
Sure |
||
NO_ENGINE_SUBSTITUTION |
Sure |
Sure |
|||
NOT NULL with out DEFAULT worth |
Not contains UNIQUE INDEX |
STRICT_ALL_TABLES |
No |
– |
|
NO_ENGINE_SUBSTITUTION |
Sure |
No* |
|||
Contains UNIQUE INDEX |
STRICT_ALL_TABLES |
No |
– |
||
NO_ENGINE_SUBSTITUTION |
Sure |
Sure |
Operation |
Column spec |
Schema change |
MySQL mode |
Backward appropriate? |
Knowledge loss? |
DELETE |
NOT NULL with DEFAULT worth |
Not contains UNIQUE INDEX |
STRICT_ALL_TABLES |
Sure |
No |
NO_ENGINE_SUBSTITUTION |
Sure |
No |
|||
Contains UNIQUE INDEX |
STRICT_ALL_TABLES |
Sure |
No |
||
NO_ENGINE_SUBSTITUTION |
Sure |
No |
|||
NOT NULL with out DEFAULT worth |
Not contains UNIQUE INDEX |
STRICT_ALL_TABLES |
No |
– |
|
NO_ENGINE_SUBSTITUTION |
Sure |
No* |
|||
Contains UNIQUE INDEX |
STRICT_ALL_TABLES |
No |
– |
||
NO_ENGINE_SUBSTITUTION |
Sure |
No* |
* The variety of data within the shadow desk matches the one within the unique desk, however an implicit DEFAULT
worth is chosen for the brand new column. For this experiment, the worth was the empty string (“”), however it’s going to differ relying on the info kind (verify MySQL implicit defaults).
Contemplating the matrix of instances from earlier part:
-
Keep away from including a NOT NULL column with out a DEFAULT worth.Schema modifications introducing
NOT NULL
columns should outline aDEFAULT
worth to keep away from sudden outcomes when the migrations are going down.Within the worst case (when the MySQL occasion is configured with a strict mode), the desk affected by the migration will break compatibility for present purposes, as beforehand used data-related operations (
INSERT
/UPDATE
/DELETE
) couldn’t be populated to the shadow desk, after the migrations begins.In the very best case (when the MySQL occasion is configured with a non-strict mode), the data populated from the unique to the shadow desk, both by the MySQL triggers or LHM batched insertions, will obtain an implicit
DEFAULT
worth for the brand new column, which might be undesirable. -
Be extraordinarily cautious when including a UNIQUE INDEX.The introduction of
UNIQUE
indexes in schema-changes engines that use the shadow desk mechanism to hold out their migrations proved to be harmful as it might probably result in knowledge loss when there are duplicate values, for the index lined columns, previous to the migrations.It’s advisable that builders verify for the existence of duplicates earlier than they add a
UNIQUE
index on a set of columns.
This investigation was accomplished because of the suggestions supplied by all of the DB Migrations crew members: Bastian Bartmann, Sergey Fedorov, Anya Zenkina, Xiaoli Liang; and the devoted steerage of Shuhao Wu.
Sinclert Pérez is a Manufacturing Engineer at Shopify. You’ll be able to join with him on Twitter and Github.
If constructing programs from the bottom as much as remedy real-world issues pursuits you, our Engineering weblog has tales about different challenges now we have encountered. Go to our Engineering profession web page to search out out about our open positions. Be part of our distant crew and work (nearly) wherever. Study how we’re hiring to design the long run collectively—a future that’s digital by design.