Thanks David!
Setting jit to 0 (execution time drops from ~429 ms to ~69 ms):

GroupAggregate  (cost=401037.82..503755.82 rows=1467400 width=124) (actual time=56.603..68.908 rows=4670 loops=1)
  Group Key: t.device_id, t.date, t.start_time, t.end_time, (timerange(((t_1.value ->> 0))::time without time zone, ((t_1.value ->> 1))::time without time zone))
  Buffers: shared hit=3067
  ->  Sort  (cost=401037.82..404706.32 rows=1467400 width=96) (actual time=56.583..57.847 rows=22908 loops=1)
        Sort Key: t.device_id, t.date, t.start_time, t.end_time, (timerange(((t_1.value ->> 0))::time without time zone, ((t_1.value ->> 1))::time without time zone)), t.rank
        Sort Method: quicksort  Memory: 3229kB
        Buffers: shared hit=3067
        ->  Nested Loop  (cost=2.99..100268.62 rows=1467400 width=96) (actual time=0.076..31.219 rows=22908 loops=1)
              Buffers: shared hit=3067
              ->  Nested Loop  (cost=2.98..55962.20 rows=14674 width=64) (actual time=0.055..6.589 rows=11454 loops=1)
                    Buffers: shared hit=3067
                    ->  Nested Loop  (cost=2.54..39.31 rows=500 width=40) (actual time=0.041..0.047 rows=1 loops=1)
                          Buffers: shared hit=3
                          ->  Index Scan using campaigns_pkey on campaigns c  (cost=0.28..8.30 rows=1 width=355) (actual time=0.011..0.012 rows=1 loops=1)
                                Index Cond: (id = 11870)
                                Buffers: shared hit=3
                          ->  Hash Join  (cost=2.26..26.01 rows=500 width=40) (actual time=0.028..0.031 rows=1 loops=1)
                                Hash Cond: (EXTRACT(dow FROM d.date) = ((j.dow)::integer)::numeric)
                                ->  Function Scan on generate_series d  (cost=0.01..10.01 rows=1000 width=8) (actual time=0.007..0.008 rows=1 loops=1)
                                ->  Hash  (cost=1.00..1.00 rows=100 width=64) (actual time=0.013..0.014 rows=7 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                      ->  Function Scan on jsonb_each j  (cost=0.00..1.00 rows=100 width=64) (actual time=0.007..0.008 rows=7 loops=1)
                    ->  Index Scan using device_timeslots_date_index on device_timeslots t  (cost=0.43..111.56 rows=29 width=32) (actual time=0.013..4.282 rows=11454 loops=1)
                          Index Cond: ((date = d.date) AND (date >= '2024-04-26'::date) AND (date <= '2024-04-26'::date))
                          Buffers: shared hit=3064
              ->  Memoize  (cost=0.01..1.01 rows=100 width=32) (actual time=0.000..0.000 rows=2 loops=11454)
                    Cache Key: j.times
                    Cache Mode: binary
                    Hits: 11453  Misses: 1  Evictions: 0  Overflows: 0  Memory Usage: 1kB
                    ->  Function Scan on jsonb_array_elements t_1  (cost=0.00..1.00 rows=100 width=32) (actual time=0.002..0.002 rows=2 loops=1)
Planning:
  Buffers: shared hit=4
Planning Time: 0.512 ms
Execution Time: 69.476 ms
I found the docs on how the JIT decision is made: https://www.postgresql.org/docs/current/jit-decision.html#JIT-DECISION
The row misestimates get amplified by the outer Nested Loop (1467400 estimated rows vs. 22908 actual):
->  Nested Loop  (cost=2.99..100268.62 rows=1467400 width=96) (actual time=0.067..27.102 rows=22908 loops=1)
      Buffers: shared hit=3067
      ->  Nested Loop  (cost=2.98..55962.20 rows=14674 width=64) (actual time=0.055..5.750 rows=11454 loops=1)
            Buffers: shared hit=3067
            ->  Nested Loop  (cost=2.54..39.31 rows=500 width=40) (actual time=0.039..0.044 rows=1 loops=1)
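For reference, that top-level estimate (~503755) is what gets compared to the JIT cost thresholds. With the defaults below (my reading of the docs, worth double-checking on your own version), it clears all three, which matches the "Inlining true, Optimization true" in the JIT details:

    SHOW jit;                      -- on by default since v12
    SHOW jit_above_cost;           -- default 100000
    SHOW jit_inline_above_cost;    -- default 500000
    SHOW jit_optimize_above_cost;  -- default 500000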
The scan of the temp table is overestimated too (1130 rows vs. 2 actual), but the impact is mitigated by the Merge Join:
->  Merge Join  (cost=67127.99..67631.11 rows=33165 width=64) (actual time=7.642..13.455 rows=22908 loops=1)
      Merge Cond: (z.date = t.date)
      Buffers: shared hit=3064, local hit=1
      ->  Sort  (cost=78.60..81.43 rows=1130 width=40) (actual time=0.011..0.012 rows=2 loops=1)
            Sort Key: z.date
            Sort Method: quicksort  Memory: 25kB
            Buffers: local hit=1
            ->  Seq Scan on z11870 z  (cost=0.00..21.30 rows=1130 width=40) (actual time=0.006..0.007 rows=2 loops=1)
                  Buffers: local hit=1
The cost estimation is better in (B) than in (A), even though the estimated rows of z11870 are off.
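The z11870 estimate looks like the planner default for a never-analyzed table; if I go with the temp table, an explicit ANALYZE after filling it should tighten that up, since autovacuum never analyzes temp tables (a guess on my side, not tested here):

    ANALYZE z11870;  -- run after (re)filling the temp table; temp tables are not auto-analyzed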
So I have options: O1) set jit=0, O2) use a temp table, O3) rework the schema design (a variation on O2).
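A minimal sketch of O1, scoped with SET LOCAL so only this query loses JIT while everything else keeps it (assuming the app can wrap the report in a transaction):

    BEGIN;
    SET LOCAL jit = 0;  -- JIT disabled for this transaction only
    -- ... run the report query here ...
    COMMIT;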
Thanks!
On Fri, Apr 12, 2024 at 1:40 PM David Rowley <dgrowleyml@xxxxxxxxx> wrote:
On Fri, 12 Apr 2024 at 23:27, Thierry Henrio <thierry.henrio@xxxxxxxxx> wrote:
> JIT:
> Functions: 36
> Options: Inlining true, Optimization true, Expressions true, Deforming true
> Timing: Generation 1.949 ms, Inlining 28.891 ms, Optimization 207.481 ms, Emission 134.907 ms, Total 373.228 ms
> Execution Time: 429.037 ms
It looks very much like the majority of the extra time is being spent
doing JIT compilation. This triggers for plan A but not plan B. You
can see from:
> GroupAggregate (cost=401037.82..503755.82 rows=1467400 width=124) (actual time=416.851..426.534 rows=4670 loops=1)
that the top-level row estimates are off. This makes the estimated
cost higher than it actually is. The planner opts to have tuple
deforming and expression evaluation JIT compiled to try to speed up
the plan thinking it's worthwhile. It's not in this case.
You can switch JIT off to try without it with:
SET jit=0;
You might want to consider editing postgresql.conf and raising the
jit_above_cost, jit_inline_above_cost and jit_optimize_above_cost
values to some higher value, or disabling JIT completely.
SELECT pg_reload_conf(); -- to reload the config file afterwards.
David