Am 12.08.2022 um 21:15 schrieb Rick
Otten:
On Fri, Aug 12, 2022 at 3:07 PM Nico Heller <nico.heller@xxxxxxxxx> wrote:
Am 12.08.2022 um 21:02 schrieb Rick Otten:
Yes there are indices on all referenced columns of the subselect (they are all primary keys anyway)
On Fri, Aug 12, 2022 at 2:50 PM Nico Heller <nico.heller@xxxxxxxxx> wrote:
Good day,
consider the following query:
WITH aggregation(
SELECT
a.*,
(SELECT array_agg(b.*) FROM b WHERE b.a_id = a.id) as "bs",
(SELECT array_agg(c.*) FROM c WHERE c.a_id = a.id) as "cs",
(SELECT array_agg(d.*) FROM d WHERE d.a_id = a.id) as "ds",
(SELECT array_agg(e.*) FROM d WHERE e.a_id = a.id) as "es"
FROM a WHERE a.id IN (<some big list, ranging from 20-180 entries)
)
SELECT to_jsonb(aggregation.*) as "value" FROM aggregation;
- You do have an index on `b.a_id` and `c.a_id`, etc... ? You didn't say...
Yes, EXPLAIN ANALYZE shows a doubling of execution time - I don't have numbers on the memory usage difference though- Are you sure it is the `to_jsonb` that is making this query slow?
- Since you are serializing this for easy machine readable consumption outside of the database, does it make a difference if you use `to_json` instead?
Using to_json vs. to_jsonb makes no difference in regards to runtime, I will check if the memory consumption is different on monday - thank you for the idea!
One other thought. Does it help if you convert the arrays to json first before you convert the whole row? ie, add some to_json()'s around the bs, cs, ds, es columns in the CTE. I'm wondering if breaking the json conversions up into smaller pieces will let the outer to_json() have less work to do and overall run faster. You could even separately serialize the elements inside the array too. I wouldn't think it would make a huge difference, you'd be making a bunch of extra to_json calls, but maybe it avoids some large memory structure that would otherwise have to be constructed to serialize all of those objects in all of the arrays all at the same time.
Using jsonb_array_agg and another to_jsonb at the (its still
needed to create one value at the end and to include the columns
"a.*") worsens the query performance by 100%, I can't speak for
the memory usage because I would have to push these changes to
preproduction - will try this on monday, thanks.