Re: to_jsonb performance on array aggregated correlated subqueries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I knew I forgot something: We are currently on 13.6. When was this issue fixed?

Am 12.08.2022 um 20:56 schrieb Justin Pryzby:
What version of postgres ?

I wonder if you're hitting the known memory leak involving jit.
Try with jit=off or jit_inline_above_cost=-1.
Good day,



consider the following query:



WITH aggregation(

    SELECT

           a.*,

          (SELECT array_agg(b.*) FROM b WHERE b.a_id = a.id) as "bs",

          (SELECT array_agg(c.*) FROM c WHERE c.a_id = a.id) as "cs",

          (SELECT array_agg(d.*) FROM d WHERE d.a_id = a.id) as "ds",

          (SELECT array_agg(e.*) FROM d WHERE e.a_id = a.id) as "es"

    FROM a WHERE a.id IN (<some big list, ranging from 20-180 entries)

)

SELECT to_jsonb(aggregation.*) as "value" FROM aggregation;



Imagine that for each "a" there exists between 5-100 "b", "c", "d" and 
"e" which makes the result of this pretty big (worst case: around 300kb 
when saved to a text file).

I noticed that adding the "to_jsonb" increases the query time by 100%, 
from 9-10ms to 17-23ms on average.

This may not seem slow at all but this query has another issue: on an 
AWS Aurora Serverless V2 instance we are running into a RAM usage of 
around 30-50 GB compared to < 10 GB when using a simple LEFT JOINed 
query when under high load (> 1000 queries / sec). Furthermore the CPU 
usage is quite high.



Is there anything I could improve? I am open for other solutions but I 
am wondering if I ran into an edge case of "to_jsonb" for "anonymous 
records" (these are just rows without a defined UDT) - this is just a 
wild guess though.

I am mostly looking to decrease the load (CPU and memory) on Postgres 
itself. Furthermore I would like to know why the memory usage is so 
significant. Any tips on how to analyze this issue are appreciated as 
well -  my knowledge is limited to being average at interpreting EXPLAIN 
ANALYZE results.



Here's a succinct list of the why's, what I have found out so far and 
solution I already tried/ don't want to consider:



- LEFT JOINing potentially creates a huge resultset because of the 
cartesian product, thats a nono

- not using "to_jsonb" is sadly also not possible as Postgres' array + 
record syntax is very unfriendly and hard to parse (it's barely 
documented if at all and the quoting rules are cumbersome, furthermore I 
lack column names in the array which would make the parsing sensitive to 
future table changes and thus cumbersome to maintain) in my application

- I know I could solve this with a separate query for a,b,c,d and e 
while "joinining" the result in my application, but I am looking for 
another way to do this (bear with me, treat this as an academic question :))

- I am using "to_jsonb" to simply map the result to my data model via a 
json mapper

- EXPLAIN ANALYZE is not showing anything special when using "to_jsonb" 
vs. not using it, the outermost (hash) join just takes more time - is 
there a more granular EXPLAIN that shows me the runtime of functions 
like "to_jsonb"?

- I tried an approach where b,c,d,e where array columns of UDTs: UDTs 
are not well supported by my application stack (JDBC) and are generally 
undesireable for me (because of a lack of migration possibilities)

- I don't want to duplicate my data into another table (e.g. that has 
jsonb columns)

- MATERIALIZED VIEWS are also undesirable as the manual update, its 
update is non-incremental which would make a refresh on a big data set 
take a long time

- split the query into chunks to reduce the IN()-statement list size 
makes no measurable difference

- I don't want to use JSONB columns for b,c,d and e because future 
changes of b,c,d or e's structure (e.g. new fields, changing a datatype) 
are harder to achieve with JSONB and it lacks constraint checks on 
insert (e.g. not null on column b.xy)



Kind regards and thank you for your time,

Nico Heller



P.S: Sorry for the long list of "I don't want to do this", some of them 
are not possible because of other requirements










[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux