Thanks David!
Setting jit to 0 (execution time drops from ~429 ms to ~69 ms):

GroupAggregate  (cost=401037.82..503755.82 rows=1467400 width=124) (actual time=56.603..68.908 rows=4670 loops=1)
  Group Key: t.device_id, t.date, t.start_time, t.end_time, (timerange(((t_1.value ->> 0))::time without time zone, ((t_1.value ->> 1))::time without time zone))
  Buffers: shared hit=3067
  ->  Sort  (cost=401037.82..404706.32 rows=1467400 width=96) (actual time=56.583..57.847 rows=22908 loops=1)
        Sort Key: t.device_id, t.date, t.start_time, t.end_time, (timerange(((t_1.value ->> 0))::time without time zone, ((t_1.value ->> 1))::time without time zone)), t.rank
        Sort Method: quicksort  Memory: 3229kB
        Buffers: shared hit=3067
        ->  Nested Loop  (cost=2.99..100268.62 rows=1467400 width=96) (actual time=0.076..31.219 rows=22908 loops=1)
              Buffers: shared hit=3067
              ->  Nested Loop  (cost=2.98..55962.20 rows=14674 width=64) (actual time=0.055..6.589 rows=11454 loops=1)
                    Buffers: shared hit=3067
                    ->  Nested Loop  (cost=2.54..39.31 rows=500 width=40) (actual time=0.041..0.047 rows=1 loops=1)
                          Buffers: shared hit=3
                          ->  Index Scan using campaigns_pkey on campaigns c  (cost=0.28..8.30 rows=1 width=355) (actual time=0.011..0.012 rows=1 loops=1)
                                Index Cond: (id = 11870)
                                Buffers: shared hit=3
                          ->  Hash Join  (cost=2.26..26.01 rows=500 width=40) (actual time=0.028..0.031 rows=1 loops=1)
                                Hash Cond: (EXTRACT(dow FROM d.date) = ((j.dow)::integer)::numeric)
                                ->  Function Scan on generate_series d  (cost=0.01..10.01 rows=1000 width=8) (actual time=0.007..0.008 rows=1 loops=1)
                                ->  Hash  (cost=1.00..1.00 rows=100 width=64) (actual time=0.013..0.014 rows=7 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                      ->  Function Scan on jsonb_each j  (cost=0.00..1.00 rows=100 width=64) (actual time=0.007..0.008 rows=7 loops=1)
                    ->  Index Scan using device_timeslots_date_index on device_timeslots t  (cost=0.43..111.56 rows=29 width=32) (actual time=0.013..4.282 rows=11454 loops=1)
                          Index Cond: ((date = d.date) AND (date >= '2024-04-26'::date) AND (date <= '2024-04-26'::date))
                          Buffers: shared hit=3064
              ->  Memoize  (cost=0.01..1.01 rows=100 width=32) (actual time=0.000..0.000 rows=2 loops=11454)
                    Cache Key: j.times
                    Cache Mode: binary
                    Hits: 11453  Misses: 1  Evictions: 0  Overflows: 0  Memory Usage: 1kB
                    ->  Function Scan on jsonb_array_elements t_1  (cost=0.00..1.00 rows=100 width=32) (actual time=0.002..0.002 rows=2 loops=1)
Planning:
  Buffers: shared hit=4
Planning Time: 0.512 ms
Execution Time: 69.476 ms
I found the docs on how the JIT decision is made: https://www.postgresql.org/docs/current/jit-decision.html#JIT-DECISION
The row misestimates get amplified by the outer Nested Loop (1467400 estimated rows vs. 22908 actual):
->  Nested Loop  (cost=2.99..100268.62 rows=1467400 width=96) (actual time=0.067..27.102 rows=22908 loops=1)
      Buffers: shared hit=3067
      ->  Nested Loop  (cost=2.98..55962.20 rows=14674 width=64) (actual time=0.055..5.750 rows=11454 loops=1)
            Buffers: shared hit=3067
            ->  Nested Loop  (cost=2.54..39.31 rows=500 width=40) (actual time=0.039..0.044 rows=1 loops=1)
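For reference, that top-level estimate (~503755) is what gets compared to the JIT cost thresholds. With the defaults below (my reading of the docs, worth double-checking on your own version), it clears all three, which matches the "Inlining true, Optimization true" in the JIT details:

    SHOW jit;                      -- on by default since v12
    SHOW jit_above_cost;           -- default 100000
    SHOW jit_inline_above_cost;    -- default 500000
    SHOW jit_optimize_above_cost;  -- default 500000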
The scan of the temp table is overestimated too (1130 rows vs. 2 actual), but the impact is mitigated by the Merge Join:
->  Merge Join  (cost=67127.99..67631.11 rows=33165 width=64) (actual time=7.642..13.455 rows=22908 loops=1)
      Merge Cond: (z.date = t.date)
      Buffers: shared hit=3064, local hit=1
      ->  Sort  (cost=78.60..81.43 rows=1130 width=40) (actual time=0.011..0.012 rows=2 loops=1)
            Sort Key: z.date
            Sort Method: quicksort  Memory: 25kB
            Buffers: local hit=1
            ->  Seq Scan on z11870 z  (cost=0.00..21.30 rows=1130 width=40) (actual time=0.006..0.007 rows=2 loops=1)
                  Buffers: local hit=1
The cost estimation is better in (B) than in (A), even though the estimated rows of z11870 are off.
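The z11870 estimate looks like the planner default for a never-analyzed table; if I go with the temp table, an explicit ANALYZE after filling it should tighten that up, since autovacuum never analyzes temp tables (a guess on my side, not tested here):

    ANALYZE z11870;  -- run after (re)filling the temp table; temp tables are not auto-analyzed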
So I have options: O1) set jit=0, O2) use a temp table, O3) rework the schema design (a variation on O2).
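A minimal sketch of O1, scoped with SET LOCAL so only this query loses JIT while everything else keeps it (assuming the app can wrap the report in a transaction):

    BEGIN;
    SET LOCAL jit = 0;  -- JIT disabled for this transaction only
    -- ... run the report query here ...
    COMMIT;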
Thanks!
On Fri, Apr 12, 2024 at 1:40 PM David Rowley <dgrowleyml@xxxxxxxxx> wrote:
On Fri, 12 Apr 2024 at 23:27, Thierry Henrio <thierry.henrio@xxxxxxxxx> wrote:
> JIT:
> Functions: 36
> Options: Inlining true, Optimization true, Expressions true, Deforming true
> Timing: Generation 1.949 ms, Inlining 28.891 ms, Optimization 207.481 ms, Emission 134.907 ms, Total 373.228 ms
> Execution Time: 429.037 ms
It looks very much like the majority of the extra time is being spent
doing JIT compilation. This triggers for plan A but not plan B. You
can see from:
> GroupAggregate (cost=401037.82..503755.82 rows=1467400 width=124) (actual time=416.851..426.534 rows=4670 loops=1)
that the top-level row estimates are off. This makes the estimated
cost higher than it actually is. The planner opts to have tuple
deforming and expression evaluation JIT compiled to try to speed up
the plan thinking it's worthwhile. It's not in this case.
You can switch JIT off to try without it with:
SET jit=0;
You might want to consider editing postgresql.conf and raising the
jit_above_cost, jit_inline_above_cost and jit_optimize_above_cost
values to some higher value, or disabling JIT completely.
SELECT pg_reload_conf(); -- to reload the config file afterwards.
David