Hi there,
I work on a booking system.
Below is a query showing reservations of devices that overlaps a campaign.
A campaign has duration and time intervals by day of week.
Here is a query (A):
select
device_id,
t.date,
timerange(t.start_time, t.end_time) * g.times as times
from device_timeslots t
join (
select date, dow, timerange(start, stop) as times
from campaigns c
cross join generate_series(c.start_date, c.end_date, '1 day') d(date)
join unnest_timeslots(c.timeslots) t(dow, start, stop) on t.dow=extract(dow from date)
where id=11870
) g on g.date=t.date
where t.date between '2024-04-26' and '2024-04-26'
group by device_id, t.date, t.start_time, t.end_time, g.times;
The time intervals of the campaign are stored as a jsonb (c.timeslots) and expanded by an IMMUTABLE sql function unnest_timeslots returning a table.
The query (A) runs in 440.497 ms.
When I replace subquery with a temp table (B):
create temp table z11870 as (
select date, dow, timerange(start, stop) as times
from campaigns c
cross join generate_series(c.start_date, c.end_date, '1 day') d(date)
join unnest_timeslots(c.timeslots) t(dow, start, stop) on t.dow=extract(dow from date)
where id=11870
);
select
device_id,
t.date,
timerange(t.start_time, t.end_time) * z.times as times
from device_timeslots t
join z11870 z on z.date=t.date
where t.date between '2024-04-26' and '2024-04-26'
group by device_id, t.date, t.start_time, t.end_time, g.times;
The query (B) runs in 48.160 ms.
Here is (B) execution plan:
GroupAggregate (cost=70121.37..71282.14 rows=33165 width=124)
Group Key: t.device_id, t.date, t.start_time, t.end_time, z.times
-> Sort (cost=70121.37..70204.28 rows=33165 width=64)
Sort Key: t.device_id, t.date, t.start_time, t.end_time, z.times, t.rank
-> Merge Join (cost=67127.99..67631.11 rows=33165 width=64)
Merge Cond: (z.date = t.date)
-> Sort (cost=78.60..81.43 rows=1130 width=40)
Sort Key: z.date
-> Seq Scan on z11870 z (cost=0.00..21.30 rows=1130 width=40)
-> Sort (cost=67049.39..67109.04 rows=23861 width=32)
Sort Key: t.date
-> Bitmap Heap Scan on device_timeslots t (cost=329.01..65314.41 rows=23861 width=32)
Recheck Cond: ((date >= '2024-04-26'::date) AND (date <= '2024-04-26'::date))
-> Bitmap Index Scan on device_timeslots_date_index (cost=0.00..323.05 rows=23861 width=0)
Index Cond: ((date >= '2024-04-26'::date) AND (date <= '2024-04-26'::date))
, whereas the plan of (A) is:
GroupAggregate (cost=401037.82..503755.82 rows=1467400 width=124)
Group Key: t.device_id, t.date, t.start_time, t.end_time, (timerange(((t_1.value ->> 0))::time without time zone, ((t_1.value ->> 1))::time without time zone))
-> Sort (cost=401037.82..404706.32 rows=1467400 width=96)
Sort Key: t.device_id, t.date, t.start_time, t.end_time, (timerange(((t_1.value ->> 0))::time without time zone, ((t_1.value ->> 1))::time without time zone)), t.rank
-> Nested Loop (cost=2.99..100268.62 rows=1467400 width=96)
-> Nested Loop (cost=2.98..55962.20 rows=14674 width=64)
-> Nested Loop (cost=2.54..39.31 rows=500 width=40)
-> Index Scan using campaigns_pkey on campaigns c (cost=0.28..8.30 rows=1 width=355)
Index Cond: (id = 11870)
-> Hash Join (cost=2.26..26.01 rows=500 width=40)
Hash Cond: (EXTRACT(dow FROM d.date) = ((j.dow)::integer)::numeric)
-> Function Scan on generate_series d (cost=0.01..10.01 rows=1000 width=8)
-> Hash (cost=1.00..1.00 rows=100 width=64)
-> Function Scan on jsonb_each j (cost=0.00..1.00 rows=100 width=64)
-> Index Scan using device_timeslots_date_index on device_timeslots t (cost=0.43..111.56 rows=29 width=32)
Index Cond: ((date = d.date) AND (date >= '2024-04-26'::date) AND (date <= '2024-04-26'::date))
-> Memoize (cost=0.01..1.01 rows=100 width=32)
Cache Key: j.times
Cache Mode: binary
-> Function Scan on jsonb_array_elements t_1 (cost=0.00..1.00 rows=100 width=32)
The Merge Join of (B) provides better timing than the Nested Loop of (A)...
On the options I think:
O1) change the design, add a table much like the z11870
O2) Is there a way to hint planner to materialize a subquery?
O3) other?
Cheers!
, Thierry