
Couldn’t create unique index: how to resolve duplication errors


Unique indexes enforce the uniqueness (as their name implies) of one or more columns’ values. They can guarantee one-to-one relationships in your model. For instance, let’s have a single shopping cart per user:

=# create unique index uidx_shopping_carts_on_user_id on shopping_carts (user_id);

ERROR:  could not create unique index "uidx_shopping_carts_on_user_id"
DETAIL:  Key (user_id)=(2) is duplicated.

Wow! What was that? It seems the index cannot be created: some existing rows already violate the constraint.



Oopsy

Let’s explore the data a bit. Can we spot the culprits?

=# select user_id from shopping_carts order by user_id;

 user_id 
---------
       1
       2
       2
       3
       4
       5

Oopsy, “2” appears twice. This can be quite a problem. It can go wrong for three reasons:

  • The user can see the wrong cart, leading to items disappearing
  • Your data team has to apply heuristics to reunify the duplicates
  • Worst case: it leads to confusing bugs because everyone assumes shopping carts are unique

Unfortunately, Postgres cannot create a unique index as long as the duplicate rows exist.

First things first: how many duplicates do you have? A self-joining query does the trick:

=# select
-#   count(*)
-# from shopping_carts a
-# join (select
-#   user_id,
-#   count(*)
-# from shopping_carts
-# group by user_id
-# having count(*) > 1) as b
-#   on b.user_id = a.user_id;

 count 
-------
 76533

There are too many duplicates for a manual cleanup… Create a dashboard to track the duplicate injection rate. If your table already includes a created_at datetime, you can use a variation of the previous query:

=# select
-#   last_occurrence,
-#   count(*)
-# from shopping_carts a
-# join (select
-#   user_id,
-#   max(date(created_at)) as last_occurrence,
-#   count(*)
-# from shopping_carts
-# group by user_id
-# having count(*) > 1) as b
-#   on b.user_id = a.user_id
-# group by last_occurrence
-# order by last_occurrence DESC;

 last_occurrence | count 
-----------------+-------
 2022-09-13      |   193
 2022-09-12      |   484

We now have the duplicate rate per day. This will be useful to check whether our future fixes work.



Remove the root cause

Let’s assume your endpoint implements the following pseudo-Ruby code:

def create
  ShoppingCarts.create!(...)
end

A quick fix is to check for the existence of the row before creating it:

def create
  unless ShoppingCarts.where(user_id: current_user.id).exists?
    ShoppingCarts.create!(...)
  end
end

This way, no duplicate creation, right? You deploy and check the numbers the next day:

 last_occurrence | count 
-----------------+-------
 2022-09-14      |     8
 2022-09-13      |   193
 2022-09-12      |   484

Wait, what? Still more duplicates!!!!



Transactions

This is a race condition in action. Two requests check for existence and insert at the same time:

 request 1          | request 2          | select count(*) from shopping_carts where user_id = 2
--------------------+--------------------+-------------------------------------------------------
 exists()… == false |                    | 0
                    | exists()… == false | 0
 insert()…          |                    | 1
                    | insert()…          | 2

Transactions are going to help, but not directly.

def create
  ActiveRecord::Base.transaction do
    unless ShoppingCarts.where(user_id: current_user.id).exists?
      ShoppingCarts.create!(...)
    end
  end
end

You deploy this code and watch its effects:

 last_occurrence | count 
-----------------+-------
 2022-09-15      |    15
 2022-09-14      |     8
 2022-09-13      |   193
 2022-09-12      |   484

The problem is not solved. Transactions use the read committed isolation level by default. If you replay the previous race condition under that isolation level, you will find that nothing is violated: transactions do not prevent race conditions. Fortunately, there are tools to stop the bleeding: locks.
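Before reaching for locks, you can double-check the default isolation level from a Rails console; a quick sketch (not in the original post), assuming a standard ActiveRecord connection to Postgres:

ActiveRecord::Base.transaction do
  # Prints "read committed", the Postgres default, unless you configured otherwise.
  puts ActiveRecord::Base.connection.select_value("SHOW transaction_isolation")
end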

Locks are usually managed by your database for operations like indexing and so on. They are a common way to mutually exclude processes from a shared resource, mutual exclusion being the act of preventing two processes from accessing the same resource at the same time. You can also lock a row manually.

In Postgres you can lock rows or tables using the for update keyword in a select query:

def create
  ActiveRecord::Base.transaction do
    unless ShoppingCarts.lock("FOR UPDATE").where(user_id: current_user.id).exists?
      ShoppingCarts.create!(...)
    end
  end
end

You deploy and check the numbers the next day:

 last_occurrence | count 
-----------------+-------
 2022-09-16      |    11
 2022-09-15      |    15
 2022-09-14      |     8
 2022-09-13      |   193
 2022-09-12      |   484

Oopsy, the lock doesn’t work? Because the select query returns no rows, no lock is held. We need to lock something that exists. Good thing we have a unique user!

def create
  current_user.with_lock do
    unless ShoppingCarts.where(user_id: current_user.id).exists?
      ShoppingCarts.create!(...)
    end
  end
end

Here the transaction is finally useful: the lock is released when the transaction is committed. You can deploy and see the result:

 last_occurrence | count 
-----------------+-------
 2022-09-16      |    11
 2022-09-15      |    15
 2022-09-14      |     8
 2022-09-13      |   193
 2022-09-12      |   484

No line for 2022-09-17: victory! Next, we remove the duplicates and create the unique index.



Removing duplicates

Removing duplicates can be a touchy subject. I find the following heuristic true most of the time: the most recently modified row is the most up to date. Some frameworks automatically generate updated_at columns, which is handy for applying the heuristic. We use the updated_at column to identify the old rows to delete:

select
  a.id
from shopping_carts a
join (select
  user_id,
  max(updated_at) AS last_updated_at
from shopping_carts
group by user_id
having (count(*) > 1)) b
  on a.user_id = b.user_id
  and a.updated_at < b.last_updated_at
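The query above only lists the ids of the stale rows; the cleanup itself is not shown. A minimal sketch of one way to run it, assuming you do it from Rails and that ShoppingCarts is the model backed by shopping_carts:

# Collect the ids of the older duplicate rows using the query above.
stale_ids = ActiveRecord::Base.connection.select_values(<<~SQL)
  select a.id
  from shopping_carts a
  join (select user_id, max(updated_at) as last_updated_at
        from shopping_carts
        group by user_id
        having count(*) > 1) b
    on a.user_id = b.user_id
   and a.updated_at < b.last_updated_at
SQL

# Delete in batches to keep each statement (and its locks) small.
stale_ids.each_slice(10_000) do |ids|
  ShoppingCarts.where(id: ids).delete_all
end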

This way only the most recently updated rows remain. Let’s check that no duplicates are left:

=# select
-#   count(*)
-# from shopping_carts a
-# join (select
-#   user_id,
-#   count(*)
-# from shopping_carts
-# group by user_id
-# having count(*) > 1) as b
-#   on b.user_id = a.user_id;

 count 
-------
     0

Good!



Creating the unique index

With no duplicates left in the table, we can create the index:

=# create unique index uidx_shopping_carts_on_user_id on shopping_carts (user_id);

This time, there is no error. Every user now has at most one shopping cart. When trying to create duplicates, you will encounter the following error:

=# insert into shopping_carts(user_id) values (2), (2);

ERROR:  duplicate key value violates unique constraint "uidx_shopping_carts_on_user_id"
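On the Rails side, this violation surfaces as an ActiveRecord::RecordNotUnique exception. The with_lock guard above still works; as an alternative sketch (not the approach of this post), you could lean on the constraint and rescue the error in the controller:

def create
  ShoppingCarts.create!(user_id: current_user.id)
rescue ActiveRecord::RecordNotUnique
  # The unique index already guarantees one cart per user, so a concurrent
  # duplicate insert can be treated as "the cart already exists".
  head :conflict
end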

No process can violate the business rule, even under race conditions! Great value for the buck, if you ask me.
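For completeness, if you manage your schema with Rails migrations, the same index can be declared there; a minimal sketch, assuming a recent Rails version (algorithm: :concurrently builds the index without blocking writes and requires disabling the DDL transaction):

class AddUniqueIndexToShoppingCartsOnUserId < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def change
    add_index :shopping_carts, :user_id,
              unique: true,
              name: "uidx_shopping_carts_on_user_id",
              algorithm: :concurrently
  end
end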



Conclusion

I hope you picked up some SQL knowledge along the way. Here are the main takeaways:

  1. Learn some SQL syntax to navigate your data easily
  2. The created_at and updated_at columns managed by your ORM are great maintenance tools
  3. Transactions do not prevent race conditions
  4. Introduce unique constraints before you feed in the data (this applies to other constraints too)

I had to solve a similar problem at work, and we’re hiring talented engineers to solve this kind of problem.
Take care.

Photo by Stefan Cosma on Unsplash
