Re: to_jsonb performance on array aggregated correlated subqueries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Here are the query plans (I hope my anonymization didn't break them). I ran every query a couple times before copying the plan to avoid timing issues because of disk access.
Ignore the sequential scan on one of the tables, it's very small (will change in the future) so Postgres opts for a faster sequential scan - the other sequential scan is on the IN()-statement which uses a VALUE list in the actual query (using a non-VALUE list makes no difference).
Overall the plan is quite optimal for me and performs really well considering the amount of rows it extracts and converts to json.

Notice how removing to_jsonb improves the query performance significantly (see last query plan) and how the cost is attributed to the hash join.
Using to_jsonb instead of to_jsonb or json_agg instead of jsonb_agg makes no difference in query plan or execution time.

I used random id's so I don't know how how big the result got but it shouldn't matter for the query plan:


array_agg, then to_jsonb (my initially posted query)
 
Hash Semi Join  (cost=5.00..15947.39 rows=200 width=32) (actual time=0.266..18.128 rows=200 loops=1)
"  Hash Cond: (a.id = ""*VALUES*"".column1)"
  ->  Seq Scan on a  (cost=0.00..41.02 rows=502 width=422) (actual time=0.013..0.268 rows=502 loops=1)
  ->  Hash  (cost=2.50..2.50 rows=200 width=32) (actual time=0.091..0.092 rows=200 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 21kB
"        ->  Values Scan on ""*VALUES*""  (cost=0.00..2.50 rows=200 width=32) (actual time=0.001..0.040 rows=200 loops=1)"
  SubPlan 1
    ->  Aggregate  (cost=42.20..42.21 rows=1 width=32) (actual time=0.020..0.020 rows=1 loops=200)
          ->  Bitmap Heap Scan on b  (cost=4.38..42.17 rows=12 width=156) (actual time=0.012..0.017 rows=12 loops=200)
                Recheck Cond: (a_id = a.id)
                Heap Blocks: exact=382
                ->  Bitmap Index Scan on fk_b_idx  (cost=0.00..4.37 rows=12 width=0) (actual time=0.008..0.008 rows=14 loops=200)
                      Index Cond: (a_id = a.id)
  SubPlan 2
    ->  Aggregate  (cost=27.68..27.69 rows=1 width=32) (actual time=0.012..0.012 rows=1 loops=200)
          ->  Bitmap Heap Scan on c  (cost=4.35..27.66 rows=9 width=98) (actual time=0.009..0.010 rows=5 loops=200)
                Recheck Cond: (a_id = a.id)
                Heap Blocks: exact=169
                ->  Bitmap Index Scan on fk_c_idx  (cost=0.00..4.35 rows=9 width=0) (actual time=0.007..0.007 rows=5 loops=200)
                      Index Cond: (a_id = a.id)
  SubPlan 3
    ->  Aggregate  (cost=8.30..8.31 rows=1 width=32) (actual time=0.009..0.010 rows=1 loops=200)
          ->  Index Scan using fk_d_idx on d  (cost=0.28..8.29 rows=1 width=81) (actual time=0.008..0.008 rows=1 loops=200)
                Index Cond: (a_id = a.id)
  SubPlan 4
    ->  Aggregate  (cost=1.27..1.28 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=200)
          ->  Seq Scan on e  (cost=0.00..1.26 rows=1 width=76) (actual time=0.004..0.004 rows=0 loops=200)
                Filter: (a_id = a.id)
                Rows Removed by Filter: 21
Planning Time: 0.520 ms
Execution Time: 18.650 ms
 
jsonb_agg instead of array_agg, then to_jsonb
 
Hash Semi Join  (cost=5.00..15947.39 rows=200 width=32) (actual time=0.338..23.921 rows=200 loops=1)
"  Hash Cond: (a.id = ""*VALUES*"".column1)"
  ->  Seq Scan on a  (cost=0.00..41.02 rows=502 width=422) (actual time=0.012..0.244 rows=502 loops=1)
  ->  Hash  (cost=2.50..2.50 rows=200 width=32) (actual time=0.090..0.091 rows=200 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 21kB
"        ->  Values Scan on ""*VALUES*""  (cost=0.00..2.50 rows=200 width=32) (actual time=0.001..0.040 rows=200 loops=1)"
  SubPlan 1
    ->  Aggregate  (cost=42.20..42.21 rows=1 width=32) (actual time=0.050..0.050 rows=1 loops=200)
          ->  Bitmap Heap Scan on b  (cost=4.38..42.17 rows=12 width=156) (actual time=0.012..0.018 rows=12 loops=200)
               Recheck Cond: (a_id = a.id)
                Heap Blocks: exact=382
                ->  Bitmap Index Scan on fk_b_idx  (cost=0.00..4.37 rows=12 width=0) (actual time=0.008..0.008 rows=14 loops=200)
                      Index Cond: (a_id = a.id)
  SubPlan 2
    ->  Aggregate  (cost=27.68..27.69 rows=1 width=32) (actual time=0.028..0.028 rows=1 loops=200)
          ->  Bitmap Heap Scan on c  (cost=4.35..27.66 rows=9 width=98) (actual time=0.009..0.011 rows=5 loops=200)
                Recheck Cond: (a_id = a.id)
                Heap Blocks: exact=169
                ->  Bitmap Index Scan on fk_c_idx  (cost=0.00..4.35 rows=9 width=0) (actual time=0.007..0.007 rows=5 loops=200)
                      Index Cond: (a_id = a.id)
  SubPlan 3
    ->  Aggregate  (cost=8.30..8.31 rows=1 width=32) (actual time=0.014..0.014 rows=1 loops=200)
          ->  Index Scan using fk_d_idx on d  (cost=0.28..8.29 rows=1 width=81) (actual time=0.008..0.008 rows=1 loops=200)
                Index Cond: (a_id = a.id)
  SubPlan 4
    ->  Aggregate  (cost=1.27..1.28 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=200)
          ->  Seq Scan on e  (cost=0.00..1.26 rows=1 width=76) (actual time=0.004..0.004 rows=0 loops=200)
                Filter: (a_id = a.id)
                Rows Removed by Filter: 21
Planning Time: 0.513 ms
Execution Time: 24.020 ms
 
array_agg without to_jsonb at the end
 
Hash Semi Join  (cost=5.00..15946.89 rows=200 width=550) (actual time=0.209..9.784 rows=200 loops=1)
"  Hash Cond: (a.id = ""*VALUES*"".column1)"
  ->  Seq Scan on a  (cost=0.00..41.02 rows=502 width=422) (actual time=0.013..0.190 rows=502 loops=1)
  ->  Hash  (cost=2.50..2.50 rows=200 width=32) (actual time=0.079..0.080 rows=200 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 21kB
"        ->  Values Scan on ""*VALUES*""  (cost=0.00..2.50 rows=200 width=32) (actual time=0.001..0.040 rows=200 loops=1)"
  SubPlan 1
    ->  Aggregate  (cost=42.20..42.21 rows=1 width=32) (actual time=0.019..0.019 rows=1 loops=200)
          ->  Bitmap Heap Scan on b  (cost=4.38..42.17 rows=12 width=156) (actual time=0.012..0.017 rows=12 loops=200)
                Recheck Cond: (a_id = a.id)
                Heap Blocks: exact=382
                ->  Bitmap Index Scan on fk_b_idx  (cost=0.00..4.37 rows=12 width=0) (actual time=0.008..0.008 rows=14 loops=200)
                      Index Cond: (a_id = a.id)
  SubPlan 2
    ->  Aggregate  (cost=27.68..27.69 rows=1 width=32) (actual time=0.012..0.012 rows=1 loops=200)
          ->  Bitmap Heap Scan on c  (cost=4.35..27.66 rows=9 width=98) (actual time=0.008..0.010 rows=5 loops=200)
                Recheck Cond: (a_id = a.id)
                Heap Blocks: exact=169
                ->  Bitmap Index Scan on fk_c_idx  (cost=0.00..4.35 rows=9 width=0) (actual time=0.007..0.007 rows=5 loops=200)
                      Index Cond: (a_id = a.id)
  SubPlan 3
    ->  Aggregate  (cost=8.30..8.31 rows=1 width=32) (actual time=0.009..0.009 rows=1 loops=200)
          ->  Index Scan using fk_d_idx on d  (cost=0.28..8.29 rows=1 width=81) (actual time=0.008..0.008 rows=1 loops=200)
                Index Cond: (a_id = a.id)
  SubPlan 4
    ->  Aggregate  (cost=1.27..1.28 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=200)
          ->  Seq Scan on e  (cost=0.00..1.26 rows=1 width=76) (actual time=0.004..0.004 rows=0 loops=200)
                Filter: (a_id = a.id)
                Rows Removed by Filter: 21
Planning Time: 0.496 ms
Execution Time: 9.892 ms
 


Am 12.08.2022 um 21:15 schrieb Andres Freund:
Hi,

On 2022-08-12 18:49:58 +0000, Nico Heller wrote:
WITH aggregation(
    SELECT
           a.*,
          (SELECT array_agg(b.*) FROM b WHERE b.a_id = a.id) as "bs",
          (SELECT array_agg(c.*) FROM c WHERE c.a_id = a.id) as "cs",
          (SELECT array_agg(d.*) FROM d WHERE d.a_id = a.id) as "ds",
          (SELECT array_agg(e.*) FROM d WHERE e.a_id = a.id) as "es"
    FROM a WHERE a.id IN (<some big list, ranging from 20-180 entries)
)
SELECT to_jsonb(aggregation.*) as "value" FROM aggregation;

      
Imagine that for each "a" there exists between 5-100 "b", "c", "d" and "e"
which makes the result of this pretty big (worst case: around 300kb when
saved to a text file).
I noticed that adding the "to_jsonb" increases the query time by 100%, from
9-10ms to 17-23ms on average.
Could we see the explain?

Have you tried using json[b]_agg()?


This may not seem slow at all but this query has another issue: on an AWS
Aurora Serverless V2 instance we are running into a RAM usage of around
30-50 GB compared to < 10 GB when using a simple LEFT JOINed query when
under high load (> 1000 queries / sec). Furthermore the CPU usage is quite
high.
We can't say much about aurora. It's a heavily modified fork of postgres.  Did
you reproduce this with vanilla postgres? And if so, do you have it in a form
that somebody could try out?

Greetings,

Andres Freund

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux