effect on planner of turning a subquery to a table, sql function returning table

Thierry Henrio <thierry.henrio@xxxxxxxxx> · Fri, 12 Apr 2024 12:33:06 +0200

Hi there,

I work on a booking system.
Below is a query showing reservations of devices that overlaps a campaign.
A campaign has duration and time intervals by day of week.

Here is a query (A):

select
  device_id,
  t.date,
  timerange(t.start_time, t.end_time) * g.times as times
from device_timeslots t
join (
  select date, dow, timerange(start, stop) as times
  from campaigns c
  cross join generate_series(c.start_date, c.end_date, '1 day') d(date)
  join unnest_timeslots(c.timeslots) t(dow, start, stop) on t.dow=extract(dow from date)
  where id=11870
) g on g.date=t.date
where t.date between '2024-04-26' and '2024-04-26'
group by device_id, t.date, t.start_time, t.end_time, g.times;

The time intervals of the campaign are stored as a jsonb (c.timeslots) and expanded by an IMMUTABLE sql function unnest_timeslots returning a table.

The query (A) runs in 440.497 ms.

When I replace subquery with a temp table (B):

create temp table z11870 as (
  select date, dow, timerange(start, stop) as times
  from campaigns c
  cross join generate_series(c.start_date, c.end_date, '1 day') d(date)
  join unnest_timeslots(c.timeslots) t(dow, start, stop) on t.dow=extract(dow from date)
  where id=11870
);

select
  device_id,
  t.date,
  timerange(t.start_time, t.end_time) * z.times as times
from device_timeslots t
join z11870 z on z.date=t.date
where t.date between '2024-04-26' and '2024-04-26'
group by device_id, t.date, t.start_time, t.end_time, g.times;

The query (B) runs in 48.160 ms.

Here is (B) execution plan:

 GroupAggregate  (cost=70121.37..71282.14 rows=33165 width=124)
   Group Key: t.device_id, t.date, t.start_time, t.end_time, z.times
   ->  Sort  (cost=70121.37..70204.28 rows=33165 width=64)
         Sort Key: t.device_id, t.date, t.start_time, t.end_time, z.times, t.rank
         ->  Merge Join  (cost=67127.99..67631.11 rows=33165 width=64)
               Merge Cond: (z.date = t.date)
               ->  Sort  (cost=78.60..81.43 rows=1130 width=40)
                     Sort Key: z.date
                     ->  Seq Scan on z11870 z  (cost=0.00..21.30 rows=1130 width=40)
               ->  Sort  (cost=67049.39..67109.04 rows=23861 width=32)
                     Sort Key: t.date
                     ->  Bitmap Heap Scan on device_timeslots t  (cost=329.01..65314.41 rows=23861 width=32)
                           Recheck Cond: ((date >= '2024-04-26'::date) AND (date <= '2024-04-26'::date))
                           ->  Bitmap Index Scan on device_timeslots_date_index  (cost=0.00..323.05 rows=23861 width=0)
                                 Index Cond: ((date >= '2024-04-26'::date) AND (date <= '2024-04-26'::date))

, whereas the plan of (A) is:

 GroupAggregate  (cost=401037.82..503755.82 rows=1467400 width=124)
   Group Key: t.device_id, t.date, t.start_time, t.end_time, (timerange(((t_1.value ->> 0))::time without time zone, ((t_1.value ->> 1))::time without time zone))
   ->  Sort  (cost=401037.82..404706.32 rows=1467400 width=96)
         Sort Key: t.device_id, t.date, t.start_time, t.end_time, (timerange(((t_1.value ->> 0))::time without time zone, ((t_1.value ->> 1))::time without time zone)), t.rank
         ->  Nested Loop  (cost=2.99..100268.62 rows=1467400 width=96)
               ->  Nested Loop  (cost=2.98..55962.20 rows=14674 width=64)
                     ->  Nested Loop  (cost=2.54..39.31 rows=500 width=40)
                           ->  Index Scan using campaigns_pkey on campaigns c  (cost=0.28..8.30 rows=1 width=355)
                                 Index Cond: (id = 11870)
                           ->  Hash Join  (cost=2.26..26.01 rows=500 width=40)
                                 Hash Cond: (EXTRACT(dow FROM d.date) = ((j.dow)::integer)::numeric)
                                 ->  Function Scan on generate_series d  (cost=0.01..10.01 rows=1000 width=8)
                                 ->  Hash  (cost=1.00..1.00 rows=100 width=64)
                                       ->  Function Scan on jsonb_each j  (cost=0.00..1.00 rows=100 width=64)
                     ->  Index Scan using device_timeslots_date_index on device_timeslots t  (cost=0.43..111.56 rows=29 width=32)
                           Index Cond: ((date = d.date) AND (date >= '2024-04-26'::date) AND (date <= '2024-04-26'::date))
               ->  Memoize  (cost=0.01..1.01 rows=100 width=32)
                     Cache Key: j.times
                     Cache Mode: binary
                     ->  Function Scan on jsonb_array_elements t_1  (cost=0.00..1.00 rows=100 width=32)

The Merge Join of (B) provides better timing than the Nested Loop of (A)...

On the options I think:

O1) change the design, add a table much like the z11870

O2) Is there a way to hint planner to materialize a subquery?

O3) other?

Cheers!

, Thierry