
Couldn’t create unique index: how to resolve duplication errors


Unique indexes enforce the uniqueness (as their name implies) of one or more columns’ values. They can guarantee one-to-one relationships in your model. For instance, let’s have a single shopping cart per user:

=# create unique index uidx_shopping_carts_on_user_id on shopping_carts (user_id);

ERROR:  could not create unique index "uidx_shopping_carts_on_user_id"
DETAIL:  Key (user_id)=(2) is duplicated.

Wow! What was that? It seems the index cannot be created: some existing rows already violate the constraint.



Oopsy

Let’s explore the data a bit. Can we spot the culprits?

=# select user_id from shopping_carts order by user_id;

 user_id 
---------
       1
       2
       2
       3
       4
       5

Oopsy, “2” appears twice. This can be quite a problem. It can go wrong for three reasons:

  • The user can see the wrong cart, leading to items disappearing
  • Your data team has to apply heuristics to reunify the duplicates
  • Worst case: it leads to confusing bugs because everyone assumes shopping carts are unique

Unfortunately, Postgres cannot create a unique index as long as the duplicate rows exist.

First things first: how many duplicates do you have? A self-joining query does the trick:

=# select
-#   count(*)
-# from shopping_carts a
-# join (select
-#   user_id,
-#   count(*)
-# from shopping_carts
-# group by user_id
-# having count(*) > 1) as b
-#   on b.user_id = a.user_id;

 count 
-------
 76533

There are too many duplicates for a manual cleanup… Create a dashboard to track the duplicate injection rate. If your table already includes a created_at datetime, you can use a variation of the previous query:

=# select
-#   last_occurrence,
-#   count(*)
-# from shopping_carts a
-# join (select
-#   user_id,
-#   max(date(created_at)) as last_occurrence,
-#   count(*)
-# from shopping_carts
-# group by user_id
-# having count(*) > 1) as b
-#   on b.user_id = a.user_id
-# group by last_occurrence
-# order by last_occurrence DESC;

 last_occurrence | count 
-----------------+-------
 2022-09-13      |   193
 2022-09-12      |   484

We now have the duplicate rate per day. This will be useful to check whether our future fixes work.



Remove the root cause

Let’s assume your endpoint implements the following pseudo-Ruby code:

def create
  ShoppingCarts.create!(...)
end

A quick fix is to check for the existence of the row before creating it:

def create
  unless ShoppingCarts.where(user_id: current_user.id).exists?
    ShoppingCarts.create!(...)
  end
end

This way, no duplicate creation, right? You deploy and check the numbers the next day:

 last_occurrence | count 
-----------------+-------
 2022-09-14      |     8
 2022-09-13      |   193
 2022-09-12      |   484

Wait, what? Still more duplicates!!!!



Transactions

This is a race condition in action. Two requests check for existence and insert at the same time:

 request 1          | request 2          | select count(*) from shopping_carts where user_id = 2
--------------------+--------------------+-------------------------------------------------------
 exists()… == false |                    | 0
                    | exists()… == false | 0
 insert()…          |                    | 1
                    | insert()…          | 2

Transactions are going to help, but not directly.

def create
  ActiveRecord::Base.transaction do
    unless ShoppingCarts.where(user_id: current_user.id).exists?
      ShoppingCarts.create!(...)
    end
  end
end

You deploy this code and watch its effects:

 last_occurrence | count 
-----------------+-------
 2022-09-15      |    15
 2022-09-14      |     8
 2022-09-13      |   193
 2022-09-12      |   484

The problem is not solved. Transactions use the read committed isolation level by default. If you replay the previous race condition under that isolation level, you will find that nothing is violated: transactions do not prevent race conditions. Fortunately, there are tools to stop the bleeding: locks.
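Before reaching for locks, you can double-check the default isolation level from a Rails console; a quick sketch (not in the original post), assuming a standard ActiveRecord connection to Postgres:

ActiveRecord::Base.transaction do
  # Prints "read committed", the Postgres default, unless you configured otherwise.
  puts ActiveRecord::Base.connection.select_value("SHOW transaction_isolation")
end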

Locks are usually managed by your database for operations like indexing and so on. They are a common way to mutually exclude processes from a shared resource, mutual exclusion being the act of preventing two processes from accessing the same resource at the same time. You can also lock a row manually.

In Postgres you can lock rows or tables using the for update keyword in a select query:

def create
  ActiveRecord::Base.transaction do
    unless ShoppingCarts.lock("FOR UPDATE").where(user_id: current_user.id).exists?
      ShoppingCarts.create!(...)
    end
  end
end

You deploy and check the numbers the next day:

 last_occurrence | count 
-----------------+-------
 2022-09-16      |    11
 2022-09-15      |    15
 2022-09-14      |     8
 2022-09-13      |   193
 2022-09-12      |   484

Oopsy, the lock doesn’t work? Because the select query returns no rows, no lock is held. We need to lock something that exists. Good thing we have a unique user!

def create
  current_user.with_lock do
    unless ShoppingCarts.where(user_id: current_user.id).exists?
      ShoppingCarts.create!(...)
    end
  end
end

Here the transaction is finally useful: the lock is released when the transaction is committed. You can deploy and see the result:

 last_occurrence | count 
-----------------+-------
 2022-09-16      |    11
 2022-09-15      |    15
 2022-09-14      |     8
 2022-09-13      |   193
 2022-09-12      |   484

No line for 2022-09-17: victory! Next, we remove the duplicates and create the unique index.



Removing duplicates

Removing duplicates can be a touchy subject. I find the following heuristic true most of the time: the most recently modified row is the most up to date. Some frameworks automatically generate updated_at columns, which is handy for applying the heuristic. We use the updated_at column to identify the old rows to delete:

select
  a.id
from shopping_carts a
join (select
  user_id,
  max(updated_at) AS last_updated_at
from shopping_carts
group by user_id
having (count(*) > 1)) b
  on a.user_id = b.user_id
  and a.updated_at < b.last_updated_at
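The query above only lists the ids of the stale rows; the cleanup itself is not shown. A minimal sketch of one way to run it, assuming you do it from Rails and that ShoppingCarts is the model backed by shopping_carts:

# Collect the ids of the older duplicate rows using the query above.
stale_ids = ActiveRecord::Base.connection.select_values(<<~SQL)
  select a.id
  from shopping_carts a
  join (select user_id, max(updated_at) as last_updated_at
        from shopping_carts
        group by user_id
        having count(*) > 1) b
    on a.user_id = b.user_id
   and a.updated_at < b.last_updated_at
SQL

# Delete in batches to keep each statement (and its locks) small.
stale_ids.each_slice(10_000) do |ids|
  ShoppingCarts.where(id: ids).delete_all
end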

This way only the most recently updated rows remain. Let’s check that no duplicates are left:

=# select
-#   count(*)
-# from shopping_carts a
-# join (select
-#   user_id,
-#   count(*)
-# from shopping_carts
-# group by user_id
-# having count(*) > 1) as b
-#   on b.user_id = a.user_id;

 count 
-------
     0

Good!



Creating the unique index

With no duplicates left in the table, we can create the index:

=# create unique index uidx_shopping_carts_on_user_id on shopping_carts (user_id);

This time, there is no error. Every user now has at most one shopping cart. When trying to create duplicates, you will encounter the following error:

=# insert into shopping_carts(user_id) values (2), (2);

ERROR:  duplicate key value violates unique constraint "uidx_shopping_carts_on_user_id"
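On the Rails side, this violation surfaces as an ActiveRecord::RecordNotUnique exception. The with_lock guard above still works; as an alternative sketch (not the approach of this post), you could lean on the constraint and rescue the error in the controller:

def create
  ShoppingCarts.create!(user_id: current_user.id)
rescue ActiveRecord::RecordNotUnique
  # The unique index already guarantees one cart per user, so a concurrent
  # duplicate insert can be treated as "the cart already exists".
  head :conflict
end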

No process can violate the business rule, even under race conditions! Great value for the buck, if you ask me.
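For completeness, if you manage your schema with Rails migrations, the same index can be declared there; a minimal sketch, assuming a recent Rails version (algorithm: :concurrently builds the index without blocking writes and requires disabling the DDL transaction):

class AddUniqueIndexToShoppingCartsOnUserId < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def change
    add_index :shopping_carts, :user_id,
              unique: true,
              name: "uidx_shopping_carts_on_user_id",
              algorithm: :concurrently
  end
end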



Conclusion

I hope you picked up some SQL knowledge along the way. Here are the main takeaways:

  1. Learn some SQL syntax to navigate your data easily
  2. The created_at and updated_at columns managed by your ORM are great maintenance tools
  3. Transactions do not prevent race conditions
  4. Introduce unique constraints before you feed in the data (this applies to other constraints too)

I had to solve a similar problem at work, and we’re hiring talented engineers to solve this kind of problem.
Take care.

Photo by Stefan Cosma on Unsplash
