It’s been some time since jOOQ 3.15 has been launched with its revolutionary customary SQL MULTISET
emulation characteristic. A factor that has been lengthy overdue and which I promised on twitter a couple of instances is to run a couple of benchmarks evaluating the efficiency of varied approaches to nesting to-many relationships with jOOQ.
This text will present, that in some actual world situations on modest information set sizes, jOOQ’s MULTISET
emulation performs about in addition to
- manually working single be a part of queries and manually deduplicating outcomes
- manually working a number of queries per nest degree and matching leads to the consumer
In distinction, the entire above carry out significantly better than the dreaded N+1 “method” (or moderately, accident), all of the whereas being way more readable and maintainable.
The conclusion is:
- For jOOQ customers to freely use
MULTISET
each time smallish information units are used (i.e. a nested loop be a part of can be OK, too) - For jOOQ customers to make use of
MULTISET
fastidiously the place large-ish information units are used (i.e. a hash be a part of or merge be a part of can be higher, e.g. in stories) - For ORM distributors to choose the a number of queries per nest degree method in case they’re in full management of their SQL to materialise predefined object graphs
Benchmark concept
As at all times, we’re querying the well-known Sakila database. There are two varieties of queries that I’ve examined on this benchmark.
A question that double-nests youngster collections (DN = DoubleNesting)
The outcome shall be of the shape:
file DNCategory (String title) {}
file DNFilm (lengthy id, String title, Checklist<DNCategory> classes) {}
file DNName (String firstName, String lastName) {}
file DNActor (lengthy id, DNName title, Checklist<DNFilm> movies) {}
So, the outcome shall be actors and their movies and their classes per movie. If a single be a part of is being executed, this could trigger plenty of duplication within the information (though regrettably, in our take a look at information set, every movie solely has a single class)
A question that nests two youngster collections in a single father or mother (MCC = A number of Baby Collections)
The outcome shall be of the shape:
file MCCName (String firstName, String lastName) {}
file MCCActor (lengthy id, MCCName title) {}
file MCCCategory (String title) {}
file MCCFilm (
lengthy id,
String title,
Checklist<MCCActor> actors,
Checklist<MCCCategory> classes
) {}
So, the outcome shall be movies and their actors in addition to their classes. That is exhausting to deduplicate with a single be a part of, due to the cartesian product between ACTOR × CATEGORY
. However different approaches with a number of queries nonetheless work, in addition to MULTISET
, after all, which would be the most handy choice
Information set dimension
Along with the above distinction of use-cases, the benchmark will even attempt to pull in both:
- All the information set (we now have 1000
FILM
entries in addition to 200ACTOR
entries, so not an enormous information set), the place hash joins are typically higher - Solely the subset for both
ACTOR_ID = 1
orFILM_ID = 1
, respectively, the place nested loop joins are typically higher
The expectation right here is {that a} JOIN
tends to carry out higher on bigger outcome units, as a result of the RDBMS will choose a hash be a part of algorithm. It’s unlikely the MULTISET
emulation could be remodeled right into a hash be a part of or merge be a part of, on condition that it makes use of JSON_ARRAYAGG
which is perhaps troublesome to remodel into one thing completely different, which remains to be equal.
Benchmark exams
The next issues shall be benchmarked for every mixture of the above matrix:
- A single
MULTISET
question with its 3 obtainable emulations utilizingXML
(the place obtainable),JSON
,JSONB
- A single
JOIN
question that creates a cartesian product between father or mother and youngsters - An method that runs 2 queries fetching all the required information into consumer reminiscence, and performs the nesting within the consumer, thereafter
- A naive N+1 “consumer aspect nested loop be a part of,” which is horrible however not unlikely to occur in actual world consumer code, both with jOOQ (much less seemingly, however nonetheless potential), or with a lazy loading ORM (extra seemingly, as a result of “unintended”)
The complete benchmark logic shall be posted on the finish of this text.
1. Single MULTISET question (DN)
The question appears like this:
return state.ctx.choose(
ACTOR.ACTOR_ID,
// Nested file for the actor title
row(
ACTOR.FIRST_NAME,
ACTOR.LAST_NAME
).mapping(DNName::new),
// First degree nested assortment for movies per actor
multiset(
choose(
FILM_ACTOR.FILM_ID,
FILM_ACTOR.movie().TITLE,
// Second degree nested assortment for classes per movie
multiset(
choose(FILM_CATEGORY.class().NAME)
.from(FILM_CATEGORY)
.the place(FILM_CATEGORY.FILM_ID.eq(FILM_ACTOR.FILM_ID))
).convertFrom(r -> r.map(mapping(DNCategory::new)))
)
.from(FILM_ACTOR)
.the place(FILM_ACTOR.ACTOR_ID.eq(ACTOR.ACTOR_ID))
).convertFrom(r -> r.map(mapping(DNFilm::new))))
.from(ACTOR)
// Both fetch all information or filter ACTOR_ID = 1
.the place(state.filter ? ACTOR.ACTOR_ID.eq(1L) : noCondition())
.fetch(mapping(DNActor::new));
To study extra about the particular MULTISET
syntax and the ad-hoc conversion characteristic, please seek advice from earlier weblog posts explaining the small print. The identical is true for the implicit JOIN
characteristic, which I’ll be utilizing on this submit to maintain SQL a bit extra terse.
2. Single JOIN question (DN)
We will do all the things with a single be a part of as properly. On this instance, I’m utilizing a purposeful type to remodel the flat outcome into the doubly nested assortment in a sort secure means. It’s a bit quirky, maybe there are higher methods to do that with non-JDK APIs. Since I wouldn’t count on this to be efficiency related, I feel it’s ok:
// The question is easy. Simply be a part of all the things from
// ACTOR -> FILM -> CATEGORY through the connection tables
return state.ctx.choose(
FILM_ACTOR.ACTOR_ID,
FILM_ACTOR.actor().FIRST_NAME,
FILM_ACTOR.actor().LAST_NAME,
FILM_ACTOR.FILM_ID,
FILM_ACTOR.movie().TITLE,
FILM_CATEGORY.class().NAME)
.from(FILM_ACTOR)
.be a part of(FILM_CATEGORY).on(FILM_ACTOR.FILM_ID.eq(FILM_CATEGORY.FILM_ID))
.the place(state.filter ? FILM_ACTOR.ACTOR_ID.eq(1L) : noCondition())
// Now comes the difficult half. We first use JDK Collectors to group
// outcomes by ACTOR
.gather(groupingBy(
r -> new DNActor(
r.value1(),
new DNName(r.value2(), r.value3()),
// dummy FILM listing, we won't simply gather them right here, but
null
),
// For every actor, produce an inventory of FILM, once more with a dummy
// CATEGORY listing as we won't gather them right here but
groupingBy(r -> new DNFilm(r.value4(), r.value5(), null))
))
// Set<Entry<DNActor, Map<DNFilm, Checklist<Record6<...>>>>>
.entrySet()
.stream()
// Re-map the DNActor file into itself, however this time, add the
// nested DNFilm listing.
.map(a -> new DNActor(
a.getKey().id(),
a.getKey().title(),
a.getValue()
.entrySet()
.stream()
.map(f -> new DNFilm(
f.getKey().id(),
f.getKey().title(),
f.getValue().stream().map(
c -> new DNCategory(c.value6())
).toList()
))
.toList()
))
.toList();
Presumably, this instance might be improved to keep away from the dummy assortment placeholders within the first gather()
name, though that will most likely require extra file sorts or structural tuple sorts like these from jOOλ. I saved it “easy” for this instance, although I’ll take your recommendations within the feedback.
3. Two queries merged in reminiscence (DN)
A wonderfully wonderful answer is to run a number of queries (however not N+1
queries!), i.e. one question per degree of nesting. This isn’t at all times potential, or optimum, however on this case, there’s an affordable answer.
I’m spelling out the prolonged Record5<...>
sorts to point out the precise sorts on this weblog submit. You should utilize var
to revenue from kind inference, after all. All of those queries use File.value5()
and comparable accessors to revenue from index primarily based entry, simply to be truthful, stopping the sector lookup, which isn’t essential within the benchmark.
// Easy question to get ACTORs and their FILMs
Consequence<Record5<Lengthy, String, String, Lengthy, String>> actorAndFilms =
state.ctx
.choose(
FILM_ACTOR.ACTOR_ID,
FILM_ACTOR.actor().FIRST_NAME,
FILM_ACTOR.actor().LAST_NAME,
FILM_ACTOR.FILM_ID,
FILM_ACTOR.movie().TITLE)
.from(FILM_ACTOR)
.the place(state.filter ? FILM_ACTOR.ACTOR_ID.eq(1L) : noCondition())
.fetch();
// Create a FILM.FILM_ID => CATEGORY.NAME lookup
// That is simply fetching all of the movies and their classes.
// Optionally, filter for the earlier FILM_ID listing
Map<Lengthy, Checklist<DNCategory>> categoriesPerFilm = state.ctx
.choose(
FILM_CATEGORY.FILM_ID,
FILM_CATEGORY.class().NAME)
.from(FILM_CATEGORY)
.the place(state.filter
? FILM_CATEGORY.FILM_ID.in(actorAndFilms.map(r -> r.value4()))
: noCondition())
.gather(intoGroups(
r -> r.value1(),
r -> new DNCategory(r.value2())
));
// Group once more by ACTOR and FILM, utilizing the earlier dummy
// assortment trick
return actorAndFilms
.gather(groupingBy(
r -> new DNActor(
r.value1(),
new DNName(r.value2(), r.value3()),
null
),
groupingBy(r -> new DNFilm(r.value4(), r.value5(), null))
))
.entrySet()
.stream()
// Then change the dummy collections
.map(a -> new DNActor(
a.getKey().id(),
a.getKey().title(),
a.getValue()
.entrySet()
.stream()
.map(f -> new DNFilm(
f.getKey().id(),
f.getKey().title(),
// And use the CATEGORY per FILM lookup
categoriesPerFilm.get(f.getKey().id())
))
.toList()
))
.toList();
Whew. Unwieldy. Definitely, the MULTISET
method is to be most well-liked from a readability perspective? All this mapping to middleman structural information sorts could be heavy, particularly if you happen to make a typo and the compiler journeys.
4. N+1 queries (DN)
This naive answer is hopefully not what you’re doing principally in manufacturing, however we’ve all accomplished it sooner or later (sure, responsible!), so right here it’s. At the very least, the logic is extra readable than the earlier ones, it’s as easy as the unique MULTISET
instance, actually, as a result of it does nearly the identical factor because the MULTISET
instance, however as a substitute of doing all the things in SQL, it correlates the subqueries within the consumer:
// Fetch all ACTORs
return state.ctx
.choose(ACTOR.ACTOR_ID, ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
.from(ACTOR)
.the place(state.filter ? ACTOR.ACTOR_ID.eq(1L) : noCondition())
.fetch(a -> new DNActor(
a.value1(),
new DNName(a.value2(), a.value3()),
// And for every ACTOR, fetch all FILMs
state.ctx
.choose(FILM_ACTOR.FILM_ID, FILM_ACTOR.movie().TITLE)
.from(FILM_ACTOR)
.the place(FILM_ACTOR.ACTOR_ID.eq(a.value1()))
.fetch(f -> new DNFilm(
f.value1(),
f.value2(),
// And for every FILM, fetch all CATEGORY-s
state.ctx
.choose(FILM_CATEGORY.class().NAME)
.from(FILM_CATEGORY)
.the place(FILM_CATEGORY.FILM_ID.eq(f.value1()))
.fetch(r -> new DNCategory(r.value1()))
))
));
1. Single MULTISET question (MCC)
Now, we’ll repeat the train once more to gather information right into a extra tree like information construction, the place the father or mother kind has a number of youngster collections, one thing that’s way more troublesome to do with JOIN
queries. Piece of cake with MULTISET
, which nests the collections instantly in SQL:
return state.ctx
.choose(
FILM.FILM_ID,
FILM.TITLE,
// Get all ACTORs for every FILM
multiset(
choose(
FILM_ACTOR.ACTOR_ID,
row(
FILM_ACTOR.actor().FIRST_NAME,
FILM_ACTOR.actor().LAST_NAME
).mapping(MCCName::new)
)
.from(FILM_ACTOR)
.the place(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
).convertFrom(r -> r.map(mapping(MCCActor::new))),
// Get all CATEGORY-s for every FILM
multiset(
choose(FILM_CATEGORY.class().NAME)
.from(FILM_CATEGORY)
.the place(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
).convertFrom(r -> r.map(mapping(MCCCategory::new))))
.from(FILM)
.the place(state.filter ? FILM.FILM_ID.eq(1L) : noCondition())
.fetch(mapping(MCCFilm::new));
Once more, to study extra about the particular MULTISET
syntax and the ad-hoc conversion characteristic, please seek advice from earlier weblog posts explaining the small print. The identical is true for the implicit JOIN
characteristic, which I’ll be utilizing on this submit to maintain SQL a bit extra terse.
2. Single JOIN Question (MCC)
One of these nesting may be very exhausting to do with a single JOIN
question, as a result of there shall be a cartesian product between ACTOR
and CATEGORY
, which can be exhausting to deduplicate after the actual fact. On this case, it could be potential, as a result of we all know that every ACTOR
is listed solely as soon as per FILM
, and so is every CATEGORY
. However what if this wasn’t the case? It won’t be potential to appropriately take away duplicates, as a result of we wouldn’t be capable of distinguish:
- Duplicates originating from the
JOIN
cartesian product - Duplicates originating from the underlying information set
As it’s exhausting (most likely not inconceivable) to ensure correctness, it’s futile to check efficiency right here.
3. Two queries merged in reminiscence (MCC)
That is once more fairly an affordable implementation of this type of nesting utilizing odd JOIN
queries.
It’s most likely what most ORMs do that don’t but assist MULTISET
like assortment nesting. It’s completely affordable to make use of this method when the ORM is in full management of the generated queries (e.g. when fetching pre-defined object graphs). However when permitting customized queries, this method received’t work properly for complicated queries. For instance, JPQL’s JOIN FETCH
syntax could use this method behind the scenes, however this prevents JPQL from supporting non-trivial queries the place JOIN FETCH
is utilized in derived tables or correlated subqueries, and itself joins derived tables, and so on. Appropriate me if I’m incorrect, however I feel that appears to be extremely exhausting to get proper, to remodel complicated nested queries into a number of particular person queries which can be executed one after the opposite, solely to then reassemble outcomes.
In any case, it’s an method that works properly for ORMs who’re in command of their SQL, however is laborious for finish customers to implement manually.
// Easy question to get ACTORs and their FILMs
Consequence<Record5<Lengthy, String, Lengthy, String, String>> filmsAndActors =
state.ctx
.choose(
FILM_ACTOR.FILM_ID,
FILM_ACTOR.movie().TITLE,
FILM_ACTOR.ACTOR_ID,
FILM_ACTOR.actor().FIRST_NAME,
FILM_ACTOR.actor().LAST_NAME)
.from(FILM_ACTOR)
.the place(state.filter ? FILM_ACTOR.FILM_ID.eq(1L) : noCondition())
.fetch();
// Create a FILM.FILM_ID => CATEGORY.NAME lookup
// That is simply fetching all of the movies and their classes.
Map<Lengthy, Checklist<MCCCategory>> categoriesPerFilm = state.ctx
.choose(
FILM_CATEGORY.FILM_ID,
FILM_CATEGORY.class().NAME)
.from(FILM_CATEGORY)
.the place(FILM_CATEGORY.FILM_ID.in(
filmsAndActors.map(r -> r.value1())
))
.and(state.filter ? FILM_CATEGORY.FILM_ID.eq(1L) : noCondition())
.gather(intoGroups(
r -> r.value1(),
r -> new MCCCategory(r.value2())
));
// Group once more by ACTOR and FILM, utilizing the earlier dummy
// assortment trick
return filmsAndActors
.gather(groupingBy(
r -> new MCCFilm(r.value1(), r.value2(), null, null),
groupingBy(r -> new MCCActor(
r.value3(),
new MCCName(r.value4(), r.value5())
))
))
.entrySet()
.stream()
// This time, the nesting of CATEGORY-s is less complicated as a result of
// we do not have to nest them once more deeply
.map(f -> new MCCFilm(
f.getKey().id(),
f.getKey().title(),
new ArrayList<>(f.getValue().keySet()),
categoriesPerFilm.get(f.getKey().id())
))
.toList();
As you cam see, it nonetheless appears like a chore to do all of this grouping and nesting manually, ensuring all of the intermediate structural sorts are right, however at the least the MCC
case is a bit easier than the earlier DN
case as a result of the nesting is much less deep.
However everyone knows, we’ll finally mix the approaches and nest tree constructions of arbitrary complexity.
4. N+1 queries (MCC)
Once more, don’t do that at house (or in manufacturing), however we’ve all been there and right here’s what plenty of functions do both explicitly (disgrace on the writer!), or implicitly (disgrace on the ORM for permitting the writer to place disgrace on the writer!)
// Fetch all FILMs
return state.ctx
.choose(FILM.FILM_ID, FILM.TITLE)
.from(FILM)
.the place(state.filter ? FILM.FILM_ID.eq(1L) : noCondition())
.fetch(f -> new MCCFilm(
f.value1(),
f.value2(),
// For every FILM, fetch all ACTORs
state.ctx
.choose(
FILM_ACTOR.ACTOR_ID,
FILM_ACTOR.actor().FIRST_NAME,
FILM_ACTOR.actor().LAST_NAME)
.from(FILM_ACTOR)
.the place(FILM_ACTOR.FILM_ID.eq(f.value1()))
.fetch(a -> new MCCActor(
a.value1(),
new MCCName(a.value2(), a.value3())
)),
// For every FILM, fetch additionally all CATEGORY-s
state.ctx
.choose(FILM_CATEGORY.class().NAME)
.from(FILM_CATEGORY)
.the place(FILM_CATEGORY.FILM_ID.eq(f.value1()))
.fetch(c -> new MCCCategory(c.value1()))
));
Algorithmic complexities
Earlier than we transfer on to the benchmark outcomes, please be very cautious along with your interpretation, as at all times.
The purpose of this benchmark wasn’t to discover a clear winner (or put disgrace on a clear loser). The purpose of this benchmark was to verify if the MULTISET
method has any important and apparent profit and/or downside over the opposite, extra guide and unwieldy approaches.
Don’t conclude that if one thing is 1.5x or 3x sooner than one thing else, that it’s higher. It could be on this case, however it is probably not in numerous instances, e.g.
- When the info set is smaller
- When the info set is greater
- When the info set is distributed otherwise (e.g. many extra classes per movie, or a much less common variety of movies per actor (sakila information units had been generated moderately uniformly))
- When switching distributors
- When switching variations
- When you could have extra load on the system
- When your queries are extra various (benchmarks are inclined to run solely single queries, which tremendously revenue from caching within the database server!)
So, once more, as with each benchmark outcome, be very cautious along with your interpretation.
The N+1 case
Even the N+1 case, which might flip into one thing horrible isn’t at all times the incorrect selection.
As we all know from Huge O Notation, issues with dangerous algorithmic complexities seem solely when N
is large, not when it’s small.
- The algorithmic complexity of a single nested assortment is
O(N * log M)
, i.e.N
instances wanting up values in an index forM
values (assuming there’s an index) - The algorithmic complexity of a doubly nested assortment, nonetheless, is way worse, it’s
O(N * log M * ? * log L)
, i.e.N
instances wanting up values in an index forM
values, after which?
instances (relies on the distribution) wanting up values in an index forL
values.
Higher hope all of those values are very small. If they’re, you’re wonderful. In the event that they aren’t, you’ll discover in manufacturing on a weekend.
The MULTISET case
Whereas I preserve advocating MULTISET
because the holy grail as a result of it’s so highly effective, handy, kind secure, and fairly performant, as we’ll see subsequent, it isn’t the holy grail like all the things else we ever hoped for promising holy-graily-ness.
Whereas it is perhaps theoretically potential to implement some hash be a part of type nested assortment algorithm within the MULTISET
case, I think that the emulations, which at the moment use XMLAGG
, JSON_ARRAYAGG
or comparable constructs, received’t be optimised this manner, and as such, we’ll get correlated subqueries, which is basically N+1, however 100% on the server aspect.
As an increasing number of individuals use SQL/JSON options, these is perhaps optimised additional sooner or later, although. I wouldn’t maintain my breath for RDBMS distributors investing time to enhance SQL/XML an excessive amount of (regrettably).
We will confirm the execution plan by working an EXPLAIN
(on PostgreSQL) on the question generated by jOOQ for the doubly nested assortment case:
clarify choose
actor.actor_id,
row (actor.first_name, actor.last_name),
(
choose coalesce(
json_agg(json_build_array(v0, v1, v2)),
json_build_array()
)
from (
choose
film_actor.film_id as v0,
alias_75379701.title as v1,
(
choose coalesce(
json_agg(json_build_array(v0)),
json_build_array()
)
from (
choose alias_130639425.title as v0
from (
film_category
be a part of class as alias_130639425
on film_category.category_id =
alias_130639425.category_id
)
the place film_category.film_id = film_actor.film_id
) as t
) as v2
from (
film_actor
be a part of movie as alias_75379701
on film_actor.film_id = alias_75379701.film_id
)
the place film_actor.actor_id = actor.actor_id
) as t
)
from actor
the place actor.actor_id = 1
The result’s:
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Seq Scan on actor (value=0.00..335.91 rows=1 width=72) Filter: (actor_id = 1) SubPlan 2 -> Mixture (value=331.40..331.41 rows=1 width=32) -> Hash Be part of (value=5.09..73.73 rows=27 width=23) Hash Cond: (alias_75379701.film_id = film_actor.film_id) -> Seq Scan on movie alias_75379701 (value=0.00..66.00 rows=1000 width=23) -> Hash (value=4.75..4.75 rows=27 width=8) -> Index Solely Scan utilizing film_actor_pkey on film_actor (value=0.28..4.75 rows=27 width=8) Index Cond: (actor_id = actor.actor_id) SubPlan 1 -> Mixture (value=9.53..9.54 rows=1 width=32) -> Hash Be part of (value=8.30..9.52 rows=1 width=7) Hash Cond: (alias_130639425.category_id = film_category.category_id) -> Seq Scan on class alias_130639425 (value=0.00..1.16 rows=16 width=15) -> Hash (value=8.29..8.29 rows=1 width=8) -> Index Solely Scan utilizing film_category_pkey on film_category (value=0.28..8.29 rows=1 width=8) Index Cond: (film_id = film_actor.film_id)
As anticipated, two nested scalar subqueries. Don’t get side-tracked by the hash joins within the subqueries. These are anticipated, as a result of we’re becoming a member of between e.g. FILM
and FILM_ACTOR
, or between CATEGORY
and FILM_CATEGORY
within the subquery. However this doesn’t have an effect on how the 2 subqueries are correlated to the outer-most question, the place we can’t use any hash joins.
So, we now have an N+1 scenario, simply with out the latency of working a server roundtrip each time! The algorithmic complexity is similar, however the fixed overhead per merchandise has been eliminated, permitting for larger N
earlier than it hurts – nonetheless the method will fail finally, identical to having too many JOIN
on giant information units is inefficient in RDBMS that don’t assist hash be a part of or merge be a part of, however solely nested loop be a part of (e.g. older MySQL variations).
Future variations of jOOQ could assist MULTISET
extra natively on Oracle and PostgreSQL. It’s already supported natively in Informix, which has customary SQL MULTISET
assist. In PostgreSQL, issues might be accomplished utilizing ARRAY(<subquery>)
and ARRAY_AGG()
, which is perhaps extra clear to the optimiser than JSON_AGG
. Whether it is, I’ll undoubtedly observe up with one other weblog submit.
The one JOIN question case
I’d count on this method to work OK-ish, if the nested collections aren’t too large (i.e. there isn’t an excessive amount of duplicate information). As soon as the nested collections develop larger, the deduplication will bear fairly some prices as:
- Extra redundant information must be produced on the server aspect (requiring extra reminiscence and CPU)
- Extra redundant information must be transferred over the wire
- Extra deduplication must be accomplished on the consumer aspect (requiring extra reminiscence and CPU)
All in all, this method appears foolish for complicated nesting, however doable for a single nested assortment. This benchmark doesn’t take a look at enormous deduplications.
The 1-query-per-nest-level case
I’d count on the very unwieldy 1-query-per-nest-level case to be probably the most performant as N
scales. It’s additionally comparatively easy for an ORM to implement, in case the ORM is in full management of the generated SQL and doesn’t should respect any consumer question necessities. It received’t work properly if blended into consumer question syntax, and it’s exhausting to do for customers manually each time.
Nonetheless, it’s an “after-the-fact” assortment nesting method, that means that it solely works properly if some assumptions in regards to the authentic question could be maintained. E.g. JOIN FETCH
in JPQL solely takes you this far. It could have been an OK workaround to nesting collections and making the idea obtainable for easy instances, however I’m optimistic JPA / JPQL will evolve and likewise undertake MULTISET
primarily based approaches. In spite of everything, MULTISET
has been a SQL customary for ORDBMS for ages now.
The long-term answer for nesting collections can solely be to nest them instantly in SQL, and make all of the logic obtainable to the optimiser for its varied selections.
Benchmark outcomes
Lastly, some outcomes! I’ve run the benchmark on these 4 RDBMS:
- MySQL
- Oracle
- PostgreSQL
- SQL Server
I didn’t run it on Db2, which might’t correlate derived tables but, an important characteristic for correlated MULTISET
subqueries in jOOQ in 3.15 – 3.17’s MULTISET
emulation (see https://github.com/jOOQ/jOOQ/points/12045 for particulars).
As at all times, since benchmark outcomes can’t be printed for industrial RDBMS, I’m not publishing precise instances, solely relative instances, the place the slowest execution is 1, and sooner executions are multiples of 1. Think about some precise unit of measurement, like operations/second, solely it’s not per second however per undisclosed unit of time. That means, the RDBMS can solely be in contrast with themselves, not with one another.
MySQL:
Benchmark (filter) Mode Cnt Rating Error Items MultisetVsJoinBenchmark.doubleNestingJoin true thrpt 7 4413.48 ± 448.63 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSON true thrpt 7 2524.96 ± 402.38 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSONB true thrpt 7 2738.62 ± 332.37 ops/time-unit MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries true thrpt 7 265.37 ± 42.98 ops/time-unit MultisetVsJoinBenchmark.doubleNestingTwoQueries true thrpt 7 2256.38 ± 363.18 ops/time-unit MultisetVsJoinBenchmark.doubleNestingJoin false thrpt 7 266.27 ± 13.31 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSON false thrpt 7 54.98 ± 2.25 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSONB false thrpt 7 54.05 ± 1.58 ops/time-unit MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries false thrpt 7 1.00 ± 0.11 ops/time-unit MultisetVsJoinBenchmark.doubleNestingTwoQueries false thrpt 7 306.23 ± 11.64 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON true thrpt 7 3058.68 ± 722.24 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB true thrpt 7 3179.18 ± 333.77 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries true thrpt 7 1845.75 ± 167.26 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries true thrpt 7 2425.76 ± 579.73 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON false thrpt 7 91.78 ± 2.65 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB false thrpt 7 92.48 ± 2.25 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries false thrpt 7 2.84 ± 0.48 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries false thrpt 7 171.66 ± 19.8 ops/time-unit
Oracle:
Benchmark (filter) Mode Cnt Rating Error Items MultisetVsJoinBenchmark.doubleNestingJoin true thrpt 7 669.54 ± 28.35 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSON true thrpt 7 419.13 ± 23.60 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSONB true thrpt 7 432.40 ± 17.76 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetXML true thrpt 7 351.42 ± 18.70 ops/time-unit MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries true thrpt 7 251.73 ± 30.19 ops/time-unit MultisetVsJoinBenchmark.doubleNestingTwoQueries true thrpt 7 548.80 ± 117.40 ops/time-unit MultisetVsJoinBenchmark.doubleNestingJoin false thrpt 7 15.59 ± 1.86 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSON false thrpt 7 2.41 ± 0.07 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSONB false thrpt 7 2.40 ± 0.07 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetXML false thrpt 7 1.91 ± 0.06 ops/time-unit MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries false thrpt 7 1.00 ± 0.12 ops/time-unit MultisetVsJoinBenchmark.doubleNestingTwoQueries false thrpt 7 13.63 ± 1.57 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON true thrpt 7 1217.79 ± 89.87 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB true thrpt 7 1214.07 ± 76.10 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetXML true thrpt 7 702.11 ± 87.37 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries true thrpt 7 919.47 ± 340.63 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries true thrpt 7 1194.05 ± 179.92 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON false thrpt 7 2.89 ± 0.08 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB false thrpt 7 3.00 ± 0.05 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetXML false thrpt 7 1.04 ± 0.17 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries false thrpt 7 1.52 ± 0.08 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries false thrpt 7 13.00 ± 1.96 ops/time-unit
PostgreSQL:
Benchmark (filter) Mode Cnt Rating Error Items MultisetVsJoinBenchmark.doubleNestingJoin true thrpt 7 4128.21 ± 398.82 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSON true thrpt 7 3187.88 ± 409.30 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSONB true thrpt 7 3064.69 ± 154.75 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetXML true thrpt 7 1973.44 ± 166.22 ops/time-unit MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries true thrpt 7 267.15 ± 34.01 ops/time-unit MultisetVsJoinBenchmark.doubleNestingTwoQueries true thrpt 7 2081.03 ± 317.95 ops/time-unit MultisetVsJoinBenchmark.doubleNestingJoin false thrpt 7 275.95 ± 6.80 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSON false thrpt 7 53.94 ± 1.06 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSONB false thrpt 7 45.00 ± 0.52 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetXML false thrpt 7 25.11 ± 1.01 ops/time-unit MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries false thrpt 7 1.00 ± 0.07 ops/time-unit MultisetVsJoinBenchmark.doubleNestingTwoQueries false thrpt 7 306.11 ± 35.40 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON true thrpt 7 4710.47 ± 194.37 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB true thrpt 7 4391.78 ± 223.62 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetXML true thrpt 7 2740.73 ± 186.70 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries true thrpt 7 1792.94 ± 134.50 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries true thrpt 7 2821.82 ± 252.34 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON false thrpt 7 68.45 ± 2.58 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB false thrpt 7 58.59 ± 0.58 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetXML false thrpt 7 15.58 ± 0.35 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries false thrpt 7 2.71 ± 0.16 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries false thrpt 7 163.03 ± 7.54 ops/time-unit
SQL Server:
Benchmark (filter) Mode Cnt Rating Error Items MultisetVsJoinBenchmark.doubleNestingJoin true thrpt 7 4081.85 ± 1029.84 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSON true thrpt 7 1243.17 ± 84.24 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSONB true thrpt 7 1254.13 ± 56.94 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetXML true thrpt 7 1077.23 ± 61.50 ops/time-unit MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries true thrpt 7 264.45 ± 16.12 ops/time-unit MultisetVsJoinBenchmark.doubleNestingTwoQueries true thrpt 7 1608.92 ± 145.75 ops/time-unit MultisetVsJoinBenchmark.doubleNestingJoin false thrpt 7 359.08 ± 20.88 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSON false thrpt 7 8.41 ± 0.06 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetJSONB false thrpt 7 8.32 ± 0.15 ops/time-unit MultisetVsJoinBenchmark.doubleNestingMultisetXML false thrpt 7 7.24 ± 0.08 ops/time-unit MultisetVsJoinBenchmark.doubleNestingNPlusOneQueries false thrpt 7 1.00 ± 0.09 ops/time-unit MultisetVsJoinBenchmark.doubleNestingTwoQueries false thrpt 7 376.02 ± 7.60 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON true thrpt 7 1735.23 ± 178.30 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB true thrpt 7 1736.01 ± 92.26 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetXML true thrpt 7 1339.68 ± 137.47 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries true thrpt 7 1264.50 ± 343.56 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries true thrpt 7 1057.54 ± 130.13 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSON false thrpt 7 7.90 ± 0.05 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetJSONB false thrpt 7 7.85 ± 0.15 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsMultisetXML false thrpt 7 5.06 ± 0.18 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsNPlusOneQueries false thrpt 7 1.77 ± 0.28 ops/time-unit MultisetVsJoinBenchmark.multipleChildCollectionsTwoQueries false thrpt 7 255.08 ± 19.22 ops/time-unit
Benchmark conclusion
As could be seen in most RDBMS:
- All RDBMS produced comparable outcomes.
- The added latency of N+1 nearly at all times contributed a big efficiency penalty. An exception is the place we now have a
filter = true
and a number of youngster collections, in case of which the father or motherN
is 1 (duh) and solely a single nest degree is carried out. MULTISET
carried out even higher than the one questionJOIN
primarily based method or the 1-query-per-nest-level method whenfilter = true
and with a number of youngster collections, most likely due to the extra compact information format.- The
XML
primarily basedMULTISET
emulation is at all times the slowest among the many emulations, seemingly as a result of it requires extra formatting. (In a single Oracle case, theXML
primarily basedMULTISET
emulation was even slower than the odd N+1 method). JSONB
is a bit slower in PostgreSQL thanJSON
, seemingly as a result ofJSON
is a purely textual content primarily based format, with none submit processing / cleanup.JSONB
‘s benefit is just not with projection-only queries, however with storage, comparability, and different operations. For many usages,JSONB
might be higher. For projection solely,JSON
is best (jOOQ 3.17 will make this the default for theMULTISET
emulation)- It’s value noting that jOOQ serialises information as JSON arrays, not JSON objects, to be able to keep away from transferring repetitive column names, and supply positional index when deserialising the array.
- For big-ish information units (the place
filter = false
), the N+1 issue of aMULTISET
correlated subquery can develop into an issue (as a result of nature of algorithmic complexity), because it prevents utilizing extra environment friendly hash joins. In these instances, the one questionJOIN
primarily based method or 1-query-per-nest-level method are higher
In brief:
MULTISET
can be utilized each time a nested loop be a part of is perfect.- If a hash be a part of or merge be a part of can be extra optimum, then the one question
JOIN
method or the 1-query-per-nest-level method are inclined to carry out higher (although they’ve their very own caveats as complexity grows)
The profit in comfort and correctness is unquestionably value it for small information units. For bigger information units, proceed utilizing JOIN
. As at all times, there’s no silver bullet.
Issues this weblog submit didn’t examine
A couple of issues weren’t investigated by this weblog submit, together with:
- The serialisation overhead within the server. Bizarre JDBC
ResultSet
are inclined to revenue from a binary community protocol between server and consumer. WithJSON
orXML
, that advantage of protocol compactness goes away, and a scientific overhead is produced. To what extent this performs a task has not been investigated. - The identical is true on the consumer aspect, the place nested
JSON
orXML
paperwork have to be deserialised. Whereas the next VisualVM screenshot exhibits that there’s some overhead, it’s not important in comparison with the execution time. Additionally, it isn’t a big quantity extra overhead than what jOOQ already produces when mapping betweenResultSet
and jOOQ information constructions. I imply, clearly, utilizing JDBC instantly could be sooner if you happen to’re doing it proper, however then you definitely take away all of the comfort jOOQ creates.
Benchmark code
Lastly, do you have to want to reproduce this benchmark, or adapt it to your individual wants, right here’s the code.
I’ve used JMH for the benchmark. Whereas that is clearly not a “micro benchmark,” I like JMH’s method to benchmarking, together with:
- Simple configuration
- Warmup penalty is eliminated by doing warmup iterations
- Statistics are collected to deal with outlier results
Clearly, all of the model use jOOQ for question constructing, execution, mapping, to attain truthful and significant outcomes. It might be potential to make use of JDBC instantly within the non-MULTISET
approaches, however that wouldn’t be a good comparability of ideas.
The benchmark assumes availability of a SAKILA
database occasion, in addition to generated code, just like this jOOQ demo.
package deal org.jooq.take a look at.benchmarks.native;
import static java.util.stream.Collectors.groupingBy;
import static org.jooq.Data.intoGroups;
import static org.jooq.Data.mapping;
import static org.jooq.instance.db.postgres.Tables.*;
import static org.jooq.impl.DSL.multiset;
import static org.jooq.impl.DSL.noCondition;
import static org.jooq.impl.DSL.row;
import static org.jooq.impl.DSL.choose;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Checklist;
import java.util.Map;
import java.util.Properties;
import java.util.operate.Client;
import org.jooq.DSLContext;
import org.jooq.Record5;
import org.jooq.Consequence;
import org.jooq.conf.NestedCollectionEmulation;
import org.jooq.conf.Settings;
import org.jooq.impl.DSL;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Stage;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;
import org.openjdk.jmh.annotations.Warmup;
@Fork(worth = 1)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 7, time = 3)
public class MultisetVsJoinBenchmark {
@State(Scope.Benchmark)
public static class BenchmarkState {
Connection connection;
DSLContext ctx;
@Param({ "true", "false" })
boolean filter;
@Setup(Stage.Trial)
public void setup() throws Exception {
attempt (InputStream is = BenchmarkState.class.getResourceAsStream("/config.mysql.properties")) {
Properties p = new Properties();
p.load(is);
Class.forName(p.getProperty("db.mysql.driver"));
connection = DriverManager.getConnection(
p.getProperty("db.mysql.url"),
p.getProperty("db.mysql.username"),
p.getProperty("db.mysql.password")
);
}
ctx = DSL.utilizing(connection, new Settings()
.withExecuteLogging(false)
.withRenderSchema(false));
}
@TearDown(Stage.Trial)
public void teardown() throws Exception {
connection.shut();
}
}
file DNName(String firstName, String lastName) {}
file DNCategory(String title) {}
file DNFilm(lengthy id, String title, Checklist<DNCategory> classes) {}
file DNActor(lengthy id, DNName title, Checklist<DNFilm> movies) {}
@Benchmark
public Checklist<DNActor> doubleNestingMultisetXML(BenchmarkState state) {
state.ctx.settings().setEmulateMultiset(NestedCollectionEmulation.XML);
return doubleNestingMultiset0(state);
}
@Benchmark
public Checklist<DNActor> doubleNestingMultisetJSON(BenchmarkState state) {
state.ctx.settings().setEmulateMultiset(NestedCollectionEmulation.JSON);
return doubleNestingMultiset0(state);
}
@Benchmark
public Checklist<DNActor> doubleNestingMultisetJSONB(BenchmarkState state) {
state.ctx.settings().setEmulateMultiset(NestedCollectionEmulation.JSONB);
return doubleNestingMultiset0(state);
}
non-public Checklist<DNActor> doubleNestingMultiset0(BenchmarkState state) {
return state.ctx
.choose(
ACTOR.ACTOR_ID,
row(
ACTOR.FIRST_NAME,
ACTOR.LAST_NAME
).mapping(DNName::new),
multiset(
choose(
FILM_ACTOR.FILM_ID,
FILM_ACTOR.movie().TITLE,
multiset(
choose(FILM_CATEGORY.class().NAME)
.from(FILM_CATEGORY)
.the place(FILM_CATEGORY.FILM_ID.eq(FILM_ACTOR.FILM_ID))
).convertFrom(r -> r.map(mapping(DNCategory::new)))
)
.from(FILM_ACTOR)
.the place(FILM_ACTOR.ACTOR_ID.eq(ACTOR.ACTOR_ID))
).convertFrom(r -> r.map(mapping(DNFilm::new))))
.from(ACTOR)
.the place(state.filter ? ACTOR.ACTOR_ID.eq(1L) : noCondition())
.fetch(mapping(DNActor::new));
}
@Benchmark
public Checklist<DNActor> doubleNestingJoin(BenchmarkState state) {
return state.ctx
.choose(
FILM_ACTOR.ACTOR_ID,
FILM_ACTOR.actor().FIRST_NAME,
FILM_ACTOR.actor().LAST_NAME,
FILM_ACTOR.FILM_ID,
FILM_ACTOR.movie().TITLE,
FILM_CATEGORY.class().NAME)
.from(FILM_ACTOR)
.be a part of(FILM_CATEGORY).on(FILM_ACTOR.FILM_ID.eq(FILM_CATEGORY.FILM_ID))
.the place(state.filter ? FILM_ACTOR.ACTOR_ID.eq(1L) : noCondition())
.gather(groupingBy(
r -> new DNActor(r.value1(), new DNName(r.value2(), r.value3()), null),
groupingBy(r -> new DNFilm(r.value4(), r.value5(), null))
))
.entrySet()
.stream()
.map(a -> new DNActor(
a.getKey().id(),
a.getKey().title(),
a.getValue()
.entrySet()
.stream()
.map(f -> new DNFilm(
f.getKey().id(),
f.getKey().title(),
f.getValue().stream().map(c -> new DNCategory(c.value6())).toList()
))
.toList()
))
.toList();
}
@Benchmark
public Checklist<DNActor> doubleNestingTwoQueries(BenchmarkState state) {
Consequence<Record5<Lengthy, String, String, Lengthy, String>> actorAndFilms = state.ctx
.choose(
FILM_ACTOR.ACTOR_ID,
FILM_ACTOR.actor().FIRST_NAME,
FILM_ACTOR.actor().LAST_NAME,
FILM_ACTOR.FILM_ID,
FILM_ACTOR.movie().TITLE)
.from(FILM_ACTOR)
.the place(state.filter ? FILM_ACTOR.ACTOR_ID.eq(1L) : noCondition())
.fetch();
Map<Lengthy, Checklist<DNCategory>> categoriesPerFilm = state.ctx
.choose(
FILM_CATEGORY.FILM_ID,
FILM_CATEGORY.class().NAME)
.from(FILM_CATEGORY)
.the place(state.filter
? FILM_CATEGORY.FILM_ID.in(actorAndFilms.map(r -> r.value4()))
: noCondition())
.gather(intoGroups(
r -> r.value1(),
r -> new DNCategory(r.value2())
));
return actorAndFilms
.gather(groupingBy(
r -> new DNActor(r.value1(), new DNName(r.value2(), r.value3()), null),
groupingBy(r -> new DNFilm(r.value4(), r.value5(), null))
))
.entrySet()
.stream()
.map(a -> new DNActor(
a.getKey().id(),
a.getKey().title(),
a.getValue()
.entrySet()
.stream()
.map(f -> new DNFilm(
f.getKey().id(),
f.getKey().title(),
categoriesPerFilm.get(f.getKey().id())
))
.toList()
))
.toList();
}
@Benchmark
public Checklist<DNActor> doubleNestingNPlusOneQueries(BenchmarkState state) {
return state.ctx
.choose(ACTOR.ACTOR_ID, ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
.from(ACTOR)
.the place(state.filter ? ACTOR.ACTOR_ID.eq(1L) : noCondition())
.fetch(a -> new DNActor(
a.value1(),
new DNName(a.value2(), a.value3()),
state.ctx
.choose(FILM_ACTOR.FILM_ID, FILM_ACTOR.movie().TITLE)
.from(FILM_ACTOR)
.the place(FILM_ACTOR.ACTOR_ID.eq(a.value1()))
.fetch(f -> new DNFilm(
f.value1(),
f.value2(),
state.ctx
.choose(FILM_CATEGORY.class().NAME)
.from(FILM_CATEGORY)
.the place(FILM_CATEGORY.FILM_ID.eq(f.value1()))
.fetch(r -> new DNCategory(r.value1()))
))
));
}
file MCCName(String firstName, String lastName) {}
file MCCCategory(String title) {}
file MCCActor(lengthy id, MCCName title) {}
file MCCFilm(lengthy id, String title, Checklist<MCCActor> actors, Checklist<MCCCategory> classes) {}
@Benchmark
public Checklist<MCCFilm> multipleChildCollectionsMultisetXML(BenchmarkState state) {
state.ctx.settings().setEmulateMultiset(NestedCollectionEmulation.XML);
return multipleChildCollectionsMultiset0(state);
}
@Benchmark
public Checklist<MCCFilm> multipleChildCollectionsMultisetJSON(BenchmarkState state) {
state.ctx.settings().setEmulateMultiset(NestedCollectionEmulation.JSON);
return multipleChildCollectionsMultiset0(state);
}
@Benchmark
public Checklist<MCCFilm> multipleChildCollectionsMultisetJSONB(BenchmarkState state) {
state.ctx.settings().setEmulateMultiset(NestedCollectionEmulation.JSONB);
return multipleChildCollectionsMultiset0(state);
}
non-public Checklist<MCCFilm> multipleChildCollectionsMultiset0(BenchmarkState state) {
return state.ctx
.choose(
FILM.FILM_ID,
FILM.TITLE,
multiset(
choose(
FILM_ACTOR.ACTOR_ID,
row(
FILM_ACTOR.actor().FIRST_NAME,
FILM_ACTOR.actor().LAST_NAME
).mapping(MCCName::new)
)
.from(FILM_ACTOR)
.the place(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
).convertFrom(r -> r.map(mapping(MCCActor::new))),
multiset(
choose(
FILM_CATEGORY.class().NAME
)
.from(FILM_CATEGORY)
.the place(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
).convertFrom(r -> r.map(mapping(MCCCategory::new))))
.from(FILM)
.the place(state.filter ? FILM.FILM_ID.eq(1L) : noCondition())
.fetch(mapping(MCCFilm::new));
}
@Benchmark
public Checklist<MCCFilm> multipleChildCollectionsTwoQueries(BenchmarkState state) {
Consequence<Record5<Lengthy, String, Lengthy, String, String>> filmsAndActors = state.ctx
.choose(
FILM_ACTOR.FILM_ID,
FILM_ACTOR.movie().TITLE,
FILM_ACTOR.ACTOR_ID,
FILM_ACTOR.actor().FIRST_NAME,
FILM_ACTOR.actor().LAST_NAME)
.from(FILM_ACTOR)
.the place(state.filter ? FILM_ACTOR.FILM_ID.eq(1L) : noCondition())
.fetch();
Map<Lengthy, Checklist<MCCCategory>> categoriesPerFilm = state.ctx
.choose(
FILM_CATEGORY.FILM_ID,
FILM_CATEGORY.class().NAME)
.from(FILM_CATEGORY)
.the place(FILM_CATEGORY.FILM_ID.in(
filmsAndActors.map(r -> r.value1())
))
.and(state.filter ? FILM_CATEGORY.FILM_ID.eq(1L) : noCondition())
.gather(intoGroups(
r -> r.value1(),
r -> new MCCCategory(r.value2())
));
return filmsAndActors
.gather(groupingBy(
r -> new MCCFilm(r.value1(), r.value2(), null, null),
groupingBy(r -> new MCCActor(r.value3(), new MCCName(r.value4(), r.value5())))
))
.entrySet()
.stream()
.map(f -> new MCCFilm(
f.getKey().id(),
f.getKey().title(),
new ArrayList<>(f.getValue().keySet()),
categoriesPerFilm.get(f.getKey().id())
))
.toList();
}
@Benchmark
public Checklist<MCCFilm> multipleChildCollectionsNPlusOneQueries(BenchmarkState state) {
return state.ctx
.choose(FILM.FILM_ID, FILM.TITLE)
.from(FILM)
.the place(state.filter ? FILM.FILM_ID.eq(1L) : noCondition())
.fetch(f -> new MCCFilm(
f.value1(),
f.value2(),
state.ctx
.choose(
FILM_ACTOR.ACTOR_ID,
FILM_ACTOR.actor().FIRST_NAME,
FILM_ACTOR.actor().LAST_NAME)
.from(FILM_ACTOR)
.the place(FILM_ACTOR.FILM_ID.eq(f.value1()))
.fetch(a -> new MCCActor(
a.value1(),
new MCCName(a.value2(), a.value3())
)),
state.ctx
.choose(FILM_CATEGORY.class().NAME)
.from(FILM_CATEGORY)
.the place(FILM_CATEGORY.FILM_ID.eq(f.value1()))
.fetch(c -> new MCCCategory(c.value1()))
));
}
}