Re: Window partial fetch optimization

David Rowley <dgrowleyml@xxxxxxxxx> · Wed, 4 May 2022 10:12:51 +1200

On Wed, 4 May 2022 at 06:11, Levi Aul <levi@xxxxxxxxxxx> wrote:
> It is our expectation that this query “should” be able to be cheap-to-compute and effectively instantaneous. (It’s clear to us how we would make it so, given a simple LMDB-like sorted key-value store: prefix-match on holder_address; take the first row you find for the contract-address you’re on; build a comparator key of (holder_address, contract_address, highest-possible-version) and traverse to find the lowest row that sorts greater than it; repeat.)
>
> Which might, in SQL, be expressed as something like this:
>
> WITH ranked_holder_balances AS (
>     SELECT
>         *,
>         row_number() OVER w AS balance_rank
>     FROM contract_balance_updates
>     WHERE holder_address = '\x0000000000000000000000000000000000000000'::bytea
>     WINDOW w AS (
>         PARTITION BY holder_address, contract_address
>         ORDER BY start_block_height DESC
>     )
>     ORDER BY holder_address ASC, contract_address ASC, start_block_height DESC
> )
> SELECT *
> FROM ranked_holder_balances
> WHERE balance_rank = 1

Yes, PostgreSQL 14 is not very smart about realising that WHERE
balance_rank = 1 is only going to match the first row of each window
partition. PostgreSQL 15 (coming later this year) should be better in
this regard as some work was done to teach the query planner about
monotonic window functions [1].  However, that change likely does not
do all you'd like here as the WindowAgg node still must consume and
throw away all tuples until it finds the first tuple belonging to the
next window partition.  It sounds like you really want "Skip Scans" or
"Loose Index Scans" which are implemented by some other RDBMS'.  I
imagine that even with the change to PostgreSQL 15 that it still
wouldn't be as fast as your DISTINCT ON example.

> WITH bup1 AS (
>     SELECT DISTINCT bup.holder_address, bup.contract_address
>     FROM contract_balance_updates bup
>     WHERE bup.holder_address = '\xe03c23519e18d64f144d2800e30e81b0065c48b5'::bytea
>     ORDER BY bup.contract_address ASC
> )
> SELECT
>   bup1.holder_address,
>   bup1.contract_address,
>   (
>     SELECT balance
>     FROM contract_balance_updates bup2
>     WHERE bup2.holder_address = bup1.holder_address
>     AND bup2.contract_address = bup1.contract_address
>     ORDER BY bup2.holder_address ASC, bup2.contract_address ASC, bup2.start_block_height DESC
>     LIMIT 1
>   ) AS balance
> FROM bup1

> I really don’t like this last approach; it scans twice, it’s surprising / confusing for people maintaining the query, etc. I believe that, due to the correlated subquery, the planning overhead is also O(N) with the number of matched entities increases (though I don’t have a good test-case for this.)

No, the subquery is not replanned each time it is rescanned. It's
planned once and that same plan will be executed each time. So no O(n)
planning overhead.

> Is there any way to get PG to do what this last query is doing, purely using window-functions / distinct on / etc.? Because, if there is, I can’t find it.
>
> It seems that PG can in fact do index-range-seeking (since that’s what it’s doing when gathering the distinct contract_addresses in the last query.) It seems intuitive to me that it should be using such an approach to filter for rows in window/group-partitions, when a criteria+index that can be combined to limit the size of the window/group are available to the planner. And that, even when not able to be automatically inferred, it would make sense for there to be control over such behaviour in SQL, using hypothetical syntax like:

Unfortunately, DISTINCT can only be implemented with Hash Aggregate or
Sort / Index Scan + Unique.  We don't have anything currently which
will jump to the next highest key in an index.  There has been some
work on what we're starting to call "Skip Scans", but it's all still
work in progress.

You might find something useful in [2] which might help speed up your
DISTINCT query.

> -- for windows
> row_number() OVER (PARTITION BY x ORDER BY x LIMIT 10 OFFSET 3)
>
> -- for groups
> GROUP BY x, y, z (APPLYING LIMIT 20 OFFSET 5 PER GROUP)
>
>
> Does this make sense? Or is this something PG is already doing, and I just haven’t found the right magic words / built my index correctly to unlock it? (I notice that the last example is an index-only scan; would I get this behaviour from the previous two queries if I made the index a covering index such that those could be index-only scans as well?)

Unfortunately, there is no magic words here. PostgreSQL 14 simply has
no ability to know that row_number() is monotonically increasing,
therefore has no ability to skip any processing for rows that are
never needed.

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9d9c02ccd1aea8e9131d8f4edb21bf1687e40782
[2] https://wiki.postgresql.org/wiki/Loose_indexscan