Awkward Join between generate_series and long table

Hi all--

I'm having a performance problem on PostgreSQL 12.16 that I'm hoping someone can help with.

I have a table shaped roughly like this:

                                   Table "public.data"

     Column      |            Type             | Collation | Nullable |                    Default
-----------------+-----------------------------+-----------+----------+------------------------------------------------
 id              | integer                     |           | not null | nextval('data_seq'::regclass)
 timestamp       | timestamp without time zone |           | not null |
 sn              | character varying(36)       |           | not null |

Indexes:
    "data_pkey" PRIMARY KEY, btree (id)
    "data_multicol_sn_and_timestamp_desc_idx" btree (sn, "timestamp" DESC)


with somewhere in the range of 10M to 100M rows.


I have a query more or less like:


WITH periods as (
    SELECT
        *
    FROM
        (
            SELECT
                s at time zone 'utc'  AS period_start,
                LEAD(s) OVER (
                    ORDER BY
                        s
                )   at time zone 'utc' AS period_end
            FROM
                generate_series(
                    ('2023-01-01T00:00:00-08:00') :: timestamptz,
                    ('2023-11-01T00:00:00-07:00') :: timestamptz,
                    ('1 day') :: interval
                ) s
        ) with_junk_period
    WHERE
        period_end IS NOT NULL  
)
SELECT
    p.period_start,
    p.period_end,
    COUNT(DISTINCT d.id)
FROM
    periods p
    LEFT JOIN data d
    ON
        d.timestamp >= (p.period_start)
        AND d."timestamp" < (p.period_end)
        AND d.sn = 'BLAH'
GROUP BY
    p.period_start,
    p.period_end
ORDER BY
    p.period_start ASC
;



This worked fine on a smaller but same-shaped set of data in a staging database, where the query plan was:

GroupAggregate  (cost=14843021.48..15549022.41 rows=200 width=24) (actual time=1311.052..1515.344 rows=303 loops=1)
  Group Key: with_junk_period.period_start, with_junk_period.period_end
  ->  Sort  (cost=14843021.48..15019521.21 rows=70599893 width=20) (actual time=1305.969..1384.676 rows=463375 loops=1)
        Sort Key: with_junk_period.period_start, with_junk_period.period_end
        Sort Method: external merge  Disk: 13632kB
        ->  Nested Loop Left Join  (cost=60.26..2329833.01 rows=70599893 width=20) (actual time=355.379..917.049 rows=463375 loops=1)
              ->  Subquery Scan on with_junk_period  (cost=59.83..92.33 rows=995 width=16) (actual time=355.307..358.978 rows=303 loops=1)
                    Filter: (with_junk_period.period_end IS NOT NULL)
                    Rows Removed by Filter: 1
                    ->  WindowAgg  (cost=59.83..82.33 rows=1000 width=24) (actual time=355.302..358.723 rows=304 loops=1)
                          ->  Sort  (cost=59.83..62.33 rows=1000 width=8) (actual time=355.265..355.510 rows=304 loops=1)
                                Sort Key: s.s
                                Sort Method: quicksort  Memory: 39kB
                                ->  Function Scan on generate_series s  (cost=0.00..10.00 rows=1000 width=8) (actual time=355.175..355.215 rows=304 loops=1)
              ->  Index Scan using data_multicol_sn_and_timestamp_desc_idx on data d  (cost=0.43..1631.90 rows=70955 width=12) (actual time=0.042..1.516 rows=1529 loops=303)
"                    Index Cond: (((sn)::text = 'BLAH'::text) AND (""timestamp"" >= with_junk_period.period_start) AND (""timestamp"" < with_junk_period.period_end))"
Planning Time: 0.283 ms
JIT:
  Functions: 20
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.570 ms, Inlining 29.051 ms, Optimization 192.084 ms, Emission 134.022 ms, Total 356.727 ms
Execution Time: 1523.983 ms


But on the production database, the query is no longer runnable over the same time frame, and when we restrict it, we can see that the index is no longer being fully utilized:

GroupAggregate  (cost=56901238.84..58446602.98 rows=200 width=24) (actual time=3652.892..3653.669 rows=4 loops=1)
  Group Key: with_junk_period.period_start, with_junk_period.period_end
  ->  Sort  (cost=56901238.84..57287579.38 rows=154536214 width=20) (actual time=3652.544..3652.765 rows=5740 loops=1)
        Sort Key: with_junk_period.period_start, with_junk_period.period_end
        Sort Method: quicksort  Memory: 641kB
        ->  Nested Loop Left Join  (cost=60.40..32259766.02 rows=154536214 width=20) (actual time=172.908..3651.658 rows=5740 loops=1)
"              Join Filter: ((d.""timestamp"" >= with_junk_period.period_start) AND (d.""timestamp"" < with_junk_period.period_end))"
              Rows Removed by Join Filter: 5310656
              ->  Subquery Scan on with_junk_period  (cost=59.83..92.33 rows=995 width=16) (actual time=152.963..153.014 rows=4 loops=1)
                    Filter: (with_junk_period.period_end IS NOT NULL)
                    Rows Removed by Filter: 1
                    ->  WindowAgg  (cost=59.83..82.33 rows=1000 width=24) (actual time=152.958..153.004 rows=5 loops=1)
                          ->  Sort  (cost=59.83..62.33 rows=1000 width=8) (actual time=152.937..152.945 rows=5 loops=1)
                                Sort Key: s.s
                                Sort Method: quicksort  Memory: 25kB
                                ->  Function Scan on generate_series s  (cost=0.00..10.00 rows=1000 width=8) (actual time=152.928..152.930 rows=5 loops=1)
              ->  Materialize  (cost=0.57..1138670.54 rows=1397815 width=12) (actual time=0.017..811.790 rows=1329099 loops=4)
                    ->  Index Scan using data_multicol_sn_and_timestamp_desc_idx on data d  (cost=0.57..1124855.46 rows=1397815 width=12) (actual time=0.051..2577.248 rows=1329099 loops=1)
"                          Index Cond: ((sn)::text = 'BLAH'::text)"
Planning Time: 0.184 ms
JIT:
  Functions: 22
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.429 ms, Inlining 11.254 ms, Optimization 75.746 ms, Emission 65.959 ms, Total 154.388 ms
Execution Time: 3662.526 ms


Instead, the time filtering has been moved out of the index condition and into the join filter, which means that the database needs to look at many, many more rows.

One optimization I was able to make was to manually add the start and end dates of the generate_series as an overall filter for the right side:

    periods p
    LEFT JOIN data d
    ON
        d.timestamp >= (p.period_start)
        AND d."timestamp" < (p.period_end)
        AND d.sn = 'BLAH'
        AND d.timestamp >= ('2023-01-01T00:00:00-08:00'::timestamptz) at time zone 'utc'
        AND d.timestamp <  ('2023-11-01T00:00:00-07:00'::timestamptz) at time zone 'utc'


This improves the query's performance somewhat, because the last two conditions make it into the index condition, but the time-related join conditions remain in the join filter, so the upside is limited. It also seems like this is a kind of "hint" that the planner should really be able to infer from the structure of "periods".

Stepping back a bit, I think there should be a way to write this query so that the planner recognizes that each row of data fits into at most one periods bucket, and to utilize the fact that both tables are easily sortable to perform a more efficient join (merge?) than the nested loop. But I'd definitely settle for getting it to use the index like it does on the staging data.
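For example, one rewrite I've been considering (just a sketch, not yet verified against production data) is to push the per-period bounds into a LATERAL subquery, in the hope of forcing one parameterized index scan per period:

WITH periods AS (
    SELECT
        s at time zone 'utc' AS period_start,
        LEAD(s) OVER (ORDER BY s) at time zone 'utc' AS period_end
    FROM
        generate_series(
            ('2023-01-01T00:00:00-08:00') :: timestamptz,
            ('2023-11-01T00:00:00-07:00') :: timestamptz,
            ('1 day') :: interval
        ) s
)
SELECT
    p.period_start,
    p.period_end,
    c.ct
FROM
    periods p
    LEFT JOIN LATERAL (
        SELECT
            COUNT(DISTINCT d.id) AS ct
        FROM
            data d
        WHERE
            d.sn = 'BLAH'
            AND d."timestamp" >= p.period_start
            AND d."timestamp" < p.period_end
    ) c ON true
WHERE
    p.period_end IS NOT NULL
ORDER BY
    p.period_start ASC
;

(The aggregate subquery always returns exactly one row, with ct = 0 for empty periods, so the outer GROUP BY disappears; the IS NOT NULL filter also moves to the outer WHERE, which lets the CTE shed its wrapper subquery.)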

A few things I've tried to no avail:
- moving the non-join parts of the join condition into the WHERE clause, or into a CTE over data
- ordering each of the components descending in various ways to match data's index
- starting with data as the left table and periods on the right (I'd have to do some post-processing here, but if it were fast it would be worth it; see the sketch just below)
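To illustrate that last idea, here's roughly what I have in mind (again only a sketch; the post-processing that re-adds empty periods is the final LEFT JOIN back to periods, so it could at least stay in SQL):

WITH periods AS (
    SELECT
        s at time zone 'utc' AS period_start,
        LEAD(s) OVER (ORDER BY s) at time zone 'utc' AS period_end
    FROM
        generate_series(
            ('2023-01-01T00:00:00-08:00') :: timestamptz,
            ('2023-11-01T00:00:00-07:00') :: timestamptz,
            ('1 day') :: interval
        ) s
),
counted AS (
    SELECT
        p.period_start,
        p.period_end,
        COUNT(DISTINCT d.id) AS ct
    FROM
        data d
        JOIN periods p
        ON
            d."timestamp" >= p.period_start
            AND d."timestamp" < p.period_end
    WHERE
        d.sn = 'BLAH'
    GROUP BY
        p.period_start,
        p.period_end
)
SELECT
    p.period_start,
    p.period_end,
    COALESCE(c.ct, 0) AS ct
FROM
    periods p
    LEFT JOIN counted c USING (period_start, period_end)
WHERE
    p.period_end IS NOT NULL
ORDER BY
    p.period_start ASC
;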

The only thing I can think of left to do is to run 12 separate queries--one for each month, a time frame that shrinks the nested loop just enough for the query to finish--and then concatenate them after the fact. But it seems like that's just using knowledge about the structure of the tables that I should be able to communicate to the planner, and I'd really love to keep it all in the database!
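If it comes to that, the concatenation could at least stay in the database as a UNION ALL with one branch per month, along these lines (sketch with only the first two months spelled out; note that with a fixed 1-day step, s + interval '1 day' gives the same value LEAD(s) would, so each branch can skip the window function and the NULL filtering):

(
    WITH periods AS (
        SELECT
            s at time zone 'utc' AS period_start,
            (s + interval '1 day') at time zone 'utc' AS period_end
        FROM
            generate_series(
                ('2023-01-01T00:00:00-08:00') :: timestamptz,
                ('2023-01-31T00:00:00-08:00') :: timestamptz,
                ('1 day') :: interval
            ) s
    )
    SELECT
        p.period_start,
        p.period_end,
        COUNT(DISTINCT d.id) AS ct
    FROM
        periods p
        LEFT JOIN data d
        ON
            d."timestamp" >= p.period_start
            AND d."timestamp" < p.period_end
            AND d.sn = 'BLAH'
            AND d."timestamp" >= ('2023-01-01T00:00:00-08:00'::timestamptz) at time zone 'utc'
            AND d."timestamp" <  ('2023-02-01T00:00:00-08:00'::timestamptz) at time zone 'utc'
    GROUP BY
        p.period_start,
        p.period_end
)
UNION ALL
(
    WITH periods AS (
        SELECT
            s at time zone 'utc' AS period_start,
            (s + interval '1 day') at time zone 'utc' AS period_end
        FROM
            generate_series(
                ('2023-02-01T00:00:00-08:00') :: timestamptz,
                ('2023-02-28T00:00:00-08:00') :: timestamptz,
                ('1 day') :: interval
            ) s
    )
    SELECT
        p.period_start,
        p.period_end,
        COUNT(DISTINCT d.id) AS ct
    FROM
        periods p
        LEFT JOIN data d
        ON
            d."timestamp" >= p.period_start
            AND d."timestamp" < p.period_end
            AND d.sn = 'BLAH'
            AND d."timestamp" >= ('2023-02-01T00:00:00-08:00'::timestamptz) at time zone 'utc'
            AND d."timestamp" <  ('2023-03-01T00:00:00-08:00'::timestamptz) at time zone 'utc'
    GROUP BY
        p.period_start,
        p.period_end
)
-- ...and so on for the remaining ten months...
ORDER BY
    period_start ASC
;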

Thanks for any and all help and suggestions.



--
Lincoln Swaine-Moore
