Thanks for the insights!
On Wed, Feb 26, 2025, at 16:05, Peter Geoghegan wrote:
On Wed, Feb 26, 2025 at 9:29 AM <large.goose2829@xxxxxxxxxxxxx> wrote:> Without being familiar the internals of the query planner, I *think* there *should* be a way to come up with WHERE conditions that results in the "perfect" plan.There is a fundamental trade-off involved here. The simple, fast"WHERE (col_1, col_2, col_3) > (10, 20, 29)" query returns whatevertuples are stored immediately after "(10, 20, 29)" in the index.Naturally, they're returned in index order, which is usually the mostuseful order (simple ASC order or simple DESC order for all columns).The B-Tree code can physically traverse your mixed-ASC-and-DESC orderindex in almost the same way. But it is much less useful, since thematching index tuples won't be physically located together as exactlyone contiguous group of tuples.
I am not sure I understand this.
My understanding is that given this "mixed order" index:
CREATE INDEX data_index_desc ON data (col_1, col_2 DESC, col_3);
The index tuples are physically organized exactly in this way:
ORDER BY col_1, col_2 DESC, col_3
So that I should be able to write a query that reads a continuous range from this index without filtering.
And so (with your "handwritten" rowcomparison) you get a filter qual that filters out non-matching tuplesusing lower-order index columns. The index scan actually just returns"Index Cond: (col_1 >= 10)" (which still returns a contiguous group ofindex tuples), while a filter condition excludes those tuples returnedby the index scan node that don't satisfy the later/lower-order columncondition.
Does this mean that it is not possible to come up with a plan that has the same performance as "WHERE (col_1, col_2, col_3) > (10, 20, 29)" using "handwritten" filters, or only for "mixed order"? Or not a theoretical limitation but a limitation of the current implementation of the query planner?
I don't know whether it's polite to bring up competitors on this mailing list, but MySQL 8 (which quite ironically has poor performance for the "row values" syntax, doing a full index scan) seems to avoid index filtering using the "OR AND" variant (at least when no mixed ordering is involved):
SELECT content
FROM data
WHERE
col_1 > 10
OR (
col_1 = 10
AND (
col_2 > 20
OR (
col_2 = 20
AND col_3 > 29
)
)
)
ORDER BY col_1, col_2, col_3
LIMIT 100;
-> Limit: 100 row(s) (cost=100710 rows=100) (actual time=0.0322..1.15 rows=100 loops=1)
-> Index range scan on data using data_index
over (col_1 = 10 AND col_2 = 20 AND 29 < col_3) OR (col_1 = 10 AND 20 < col_2) OR (10 < col_1),
with index condition: ((`data`.col_1 > 10) or ((`data`.col_1 = 10) and ((`data`.col_2 > 20) or ((`data`.col_2 = 20) and (`data`.col_3 > 29)))))
(cost=100710 rows=511937) (actual time=0.0316..1.14 rows=100 loops=1)
Still slightly slower actual total time on the same machine as PostgreSQL though (based on a single EXPLAIN ANALYZE sample only).
The book "Relational Database Index Design and the Optimizers"proposes a vocabulary for the trade-offs in this area -- the 3 starmodel. When creating the best possible index for certain queries it issometimes inherently necessary to choose between what it calls thefirst star (which means avoiding a sort) and the second star (whichmeans having the thinnest possible "row slice"). Sometimes thosethings are in tension, which makes sense when you imagine how theindex must be physically traversed.
Aka. "Good, Fast, Cheap — Pick Any Two" ;)
Cheers,
Márton