Saturday, May 24, 2025
HomeProgrammingA SQL MERGE assertion performs actions based mostly on a RIGHT JOIN

A SQL MERGE assertion performs actions based mostly on a RIGHT JOIN


RIGHT JOIN is an esoteric characteristic within the SQL language, and infrequently seen in the true world, as a result of nearly each RIGHT JOIN can simply be expressed as an equal LEFT JOIN. The next two statements are equal:

-- In style
SELECT c.first_name, c.last_name, p.quantity
FROM buyer AS c
LEFT JOIN cost AS p ON c.customer_id = p.customer_id

-- Esoteric
SELECT c.first_name, c.last_name, p.quantity
FROM cost AS p
RIGHT JOIN buyer AS c ON c.customer_id = p.customer_id

It’s not unreasonable to anticipate these two statements to provide the identical execution plan on most RDBMS, on condition that they’re logically equal. Since we’ve grown used to studying issues from left to proper and prime to backside, I don’t suppose RIGHT JOIN will develop into extra in style any time quickly.

There’s, nevertheless, one place within the SQL language the place RIGHT JOIN is surprisingly ubiquitous!

The MERGE assertion

Why is it a shock? As a result of that place doesn’t use the identical syntax for becoming a member of two tables. However that’s precisely what occurs within the MERGE assertion. Let’s have a look at the next MERGE assertion that takes:

  • A staging desk (SOURCE desk) to load information from
  • An extraordinary desk (TARGET desk) to retailer information into

With a schema like this:

CREATE TABLE book_to_book_store (
  book_id BIGINT NOT NULL REFERENCES guide,
  title TEXT NOT NULL REFERENCES book_store,
  inventory INT NOT NULL,

  PRIMARY KEY (book_id, title)
);

CREATE TABLE book_to_book_store_staging AS 
SELECT * FROM book_to_book_store
WITH NO DATA;

A question that may very well be typical of an ETL job:

-- The goal desk
MERGE INTO book_to_book_store AS t

-- The supply desk
USING book_to_book_store_staging AS s

-- The RIGHT JOIN predicate
ON t.book_id = s.book_id AND t.title = s.title

-- The actions for every row, based mostly on RIGHT JOIN matching
WHEN MATCHED THEN UPDATE SET inventory = s.inventory
WHEN NOT MATCHED THEN INSERT (book_id, title, inventory) 
VALUES (s.book_id, s.title, s.inventory);

That is merely taking all of the rows from the BOOK_TO_BOOK_STORE_STAGING desk, and merges them into BOOK_TO_BOOK_STORE:

  • If the row already exists (there’s a MATCH), then the STOCK is up to date
  • If the row doesn’t exist already (there’s no MATCH), then the row is inserted

However we don’t use this supply -> goal syntactic order, we first specify the goal desk BOOK_TO_BOOK_STORE, after which we RIGHT JOIN the BOOK_TO_BOOK_STORE_STAGING desk to it. Give it some thought this manner:

SELECT *
FROM book_to_book_store AS t
RIGHT JOIN book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.title = s.title

And, if we consider a RIGHT JOIN not as a Venn diagram, however as a cartesian product as follows, then it may be seen simply what is finished per MATCH or non-MATCH:

|t.title      |t.book_id|t.inventory|s.title      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
| | | |Faraway Land|1 |9 | <-- NOT MATCHED
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED
| | | |Paper Path |1 |1 | <-- NOT MATCHED
|Paper Path |3 |2 |Paper Path |3 |0 | <-- MATCHED

As at all times with a RIGHT JOIN, each row from proper facet of the be a part of is matched with an identical row from the left facet of the be a part of, or an empty row of NULL values, if there’s no such match. After this MERGE, we wish the ensuing information to be up to date as follows:

|t.title      |t.book_id|t.inventory|s.title      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED
|Faraway Land|2 |12 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |5 |Faraway Land|3 |5 | <-- MATCHED
|Faraway Land|1 |1 |Paper Path |1 |1 | <-- NOT MATCHED
|Paper Path |3 |0 |Paper Path |3 |0 | <-- MATCHED

That is how the MERGE assertion works.

Word, I mentioned earlier than that the JOIN is producing a cartesian product. In contrast to with SELECT statements, nevertheless, there’s a limitation to MERGE the place the cartesian product should not produce any duplicate matches per TARGET row, because the order of actions wouldn’t be outlined if there have been a number of SOURCE rows per TARGET row.

Deleting rows

MERGE is extra highly effective than simply performing INSERT and UPDATE. It might probably additionally DELETE rows. Let’s assume that we wish a staging desk’s STOCK = 0 to imply that the row needs to be deleted, as a substitute of the STOCK being set to 0. Then we will write:

MERGE INTO book_to_book_store AS t
USING book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.title = s.title
WHEN MATCHED AND s.inventory = 0 THEN DELETE
WHEN MATCHED THEN UPDATE SET inventory = s.inventory
WHEN NOT MATCHED THEN INSERT (book_id, title, inventory) 
VALUES (s.book_id, s.title, s.inventory);

Now, with the above staging information, we’ll take away the final row as a substitute of updating it:

|t.title      |t.book_id|t.inventory|s.title      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED : INSERT
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED : UPDATE
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED : UPDATE
|Paper Path |1 |1 |Paper Path |1 |1 | <-- NOT MATCHED : INSERT
| | | |Paper Path |3 |0 | <-- MATCHED : DELETE

The RIGHT JOIN semantics continues to be the identical, simply the motion is totally different now, relying on the further AND clause of the WHEN MATCHED clause.

Matching by supply

Some RDBMS help an much more highly effective vendor particular variant of MERGE, which needs to be added to the IEC/ISO 9075 normal, for my part. The BY TARGET / BY SOURCE clause. Let’s take a look on the following assertion:

MERGE INTO book_to_book_store AS t
USING book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.title = s.title
WHEN MATCHED THEN UPDATE SET inventory = s.inventory
WHEN NOT MATCHED BY TARGET THEN INSERT (book_id, title, inventory) 
VALUES (s.book_id, s.title, s.inventory)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

Including a WHEN NOT MATCHED BY SOURCE clause has the easy impact of turning the RIGHT JOIN operation right into a FULL JOIN operation. Consider it this manner:

SELECT *
FROM book_to_book_store AS t
FULL JOIN book_to_book_store_staging AS s
ON t.book_id = s.book_id AND t.title = s.title

Now, the end result may look one thing like this:

|t.title      |t.book_id|t.inventory|s.title      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
| | | |Faraway Land|1 |9 | <-- NOT MATCHED BY TARGET
|Faraway Land|2 |10 |Faraway Land|2 |12 | <-- MATCHED
|Faraway Land|3 |10 |Faraway Land|3 |5 | <-- MATCHED
| | | |Paper Path |1 |1 | <-- NOT MATCHED BY TARGET
|Paper Path |3 |2 | | | | <-- NOT MATCHED BY SOURCE

The phrases NOT MATCHED BY TARGET and NOT MATCHED BY SOURCE are fairly self-explanatory when visualised as above, and doubtless much less complicated to newcomers than LEFT JOIN and RIGHT JOIN. I wouldn’t thoughts SQL syntax to be enhanced in a approach that it might be doable to establish whether or not a NULL worth originating from an OUTER JOIN is because of:

  • The supply information containing the NULL worth
  • The row being NOT MATCHED by the “different facet” of the OUTER JOIN

Think about a hypothetical syntax like this:

SELECT c.first_name, c.last_name, p.quantity
FROM buyer AS c
LEFT JOIN cost AS p ON c.customer_id = p.customer_id
WHERE p IS NOT MATCHED BY JOIN -- Successfully an ANTI JOIN

Anyway…

When deleting rows, this method is rather more handy than having to depend on an interpretation of the semantics of knowledge, akin to STOCK = 0 that means a deletion. We now have absent rows within the SOURCE desk (the staging) desk, which merely imply the row should be deleted, if that’s how we need to mannequin issues. So, after operating the above MERGE assertion, we’ll get this final result once more:

|t.title      |t.book_id|t.inventory|s.title      |s.book_id|s.inventory|
|------------|---------|-------|------------|---------|-------|
|Faraway Land|1 |9 |Faraway Land|1 |9 | <-- NOT MATCHED BY TARGET : INSERT
|Faraway Land|2 |12 |Faraway Land|2 |12 | <-- MATCHED : UPDATE
|Faraway Land|3 |5 |Faraway Land|3 |5 | <-- MATCHED : UPDATE
|Faraway Land|1 |1 |Paper Path |1 |1 | <-- NOT MATCHED BY TARGET : INSERT
| | | | | | | <-- NOT MATCHED BY SOURCE : DELETE

At the very least the next RDBMS help the BY SOURCE and BY TARGET clauses:

  • Databricks
  • Firebird 5
  • PostgreSQL 17
  • SQL Server

Given how helpful that is, I’ll anticipate extra RDBMS to undertake this T-SQL syntax, quickly. jOOQ 3.20 has added help for it, and a future model of jOOQ could emulate it by shifting the FULL JOIN into the USING clause.

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments