Hi everyone, first time here. Please kindly let me know if this is not the right place to ask.
I notice a simple query can read a lot of buffer blocks in a meaningless way, when
1. there is an index scan on a multicolumn index
2. there is row constructor comparison in the Index Cond
3. there is also an equality constraint on the leftmost column of the multicolumn index
## How to reproduce
I initially noticed it on AWS Aurora RDS, but it can be reproduced in docker container as well.
```bash
docker run --name test-postgres -e POSTGRES_PASSWORD=mysecretpassword -d -p 5432:5432 postgres:16.3
```
Create a table with a multicolumn index. Populate 12 million rows with random integers.
```sql
CREATE TABLE t(a int, b int);
CREATE INDEX my_idx ON t USING BTREE (a, b);
INSERT INTO t(a, b)
SELECT
(random() * 123456)::int AS a,
(random() * 123456)::int AS b
FROM
generate_series(1, 12345678);
ANALYZE t;
```
Simple query that uses the multicolumn index.
```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 0 order by a, b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..8.46 rows=1 width=8) (actual time=284.312..284.314 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 0))
Heap Fetches: 0
Buffers: shared hit=3777 read=37304 written=11713
Planning:
Buffers: shared hit=22 read=4
Planning Time: 0.270 ms
Execution Time: 284.341 ms
(8 rows)
```
## Expected output
The number of buffer blocks used is high. I expect it to be no more than when there’s only one constraint.
```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) order by a, b;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..23.67 rows=642 width=8) (actual time=0.030..0.158 rows=542 loops=1)
Index Cond: (ROW(a, b) > ROW(123450, 123450))
Heap Fetches: 0
Buffers: shared hit=254 read=3
Planning:
Buffers: shared read=4
Planning Time: 0.232 ms
Execution Time: 0.206 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where a = 0 order by a, b;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..6.20 rows=101 width=8) (actual time=0.099..0.113 rows=57 loops=1)
Index Cond: (a = 0)
Heap Fetches: 0
Buffers: shared hit=27 read=2
Planning Time: 0.081 ms
Execution Time: 0.131 ms
(6 rows)
```
## Postgres version
16.3
## Platform information
I can reproduce it on the latest postgres docker image, which is based on Debian Linux. Originally found the issue on AWS Aurora.
The following are my own observation and thoughts. Please disregard if it’s distraction.
For a general form of
```sql
select * from t where (a, b) > (x, y) and a = z order by a, b;
```
1. The number of buffer blocks is proportional to the gap between x and z. Strictly, it’s max(0, min(x, max(a)) – max(z, min(a))).
```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = -30000 order by a, b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=243.173..243.175 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = '-30000'::integer))
Heap Fetches: 0
Buffers: shared hit=1 read=41080
Planning:
Buffers: shared hit=2 read=2
Planning Time: 0.174 ms
Execution Time: 243.199 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 0 order by a, b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=230.425..230.426 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 0))
Heap Fetches: 0
Buffers: shared hit=1 read=41080
Planning:
Buffers: shared read=4
Planning Time: 0.296 ms
Execution Time: 230.460 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 30000 order by a, b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=171.787..171.788 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 30000))
Heap Fetches: 0
Buffers: shared hit=1 read=31126
Planning:
Buffers: shared read=4
Planning Time: 0.191 ms
Execution Time: 171.812 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 60000 order by a, b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=137.516..137.518 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 60000))
Heap Fetches: 0
Buffers: shared hit=1 read=21139
Planning:
Buffers: shared read=4
Planning Time: 0.212 ms
Execution Time: 137.543 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 90000 order by a, b;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=57.868..57.870 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 90000))
Heap Fetches: 0
Buffers: shared hit=11187 read=1
Planning:
Buffers: shared hit=1 read=3
Planning Time: 0.240 ms
Execution Time: 57.896 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 120000 order by a, b;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=6.018..6.019 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 120000))
Heap Fetches: 0
Buffers: shared hit=1173 read=1
Planning:
Buffers: shared hit=4
Planning Time: 0.122 ms
Execution Time: 6.052 ms
(8 rows)
```
2. It’s not an issue when `a=x` becomes `a<x` or `a>x`.
```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a < 100 order by a, b;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a < 100))
Heap Fetches: 0
Buffers: shared hit=3
Planning:
Buffers: shared hit=8
Planning Time: 0.119 ms
Execution Time: 0.020 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a > 100 order by a, b;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..25.25 rows=641 width=8) (actual time=0.040..0.339 rows=542 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a > 100))
Heap Fetches: 0
Buffers: shared hit=257
Planning:
Buffers: shared hit=8
Planning Time: 0.233 ms
Execution Time: 0.443 ms
(8 rows)
```
3. It’s not an issue when `a=x` becomes `b=x`.
4. The example query is trivial and for demo purpose only. Obviously there’s no need to supply `a = 0` when there’s `(a, b) > (123450, 123450)`. However, in practice it can be a problem when the table is joined to other tables, resulting in a nested loop for a list of `a` values that we have no control of, while `(a, b) > (x, y)` is used for paging.
5. My current workaround is add `AND a >= x` to `(a, b) > (x, y)`. However, this makes the planner underestimate the number of rows due to the multiplied selectivities.
Best regards,
Yan
I notice a simple query can read a lot of buffer blocks in a meaningless way, when
1. there is an index scan on a multicolumn index
2. there is row constructor comparison in the Index Cond
3. there is also an equality constraint on the leftmost column of the multicolumn index
## How to reproduce
I initially noticed it on AWS Aurora RDS, but it can be reproduced in docker container as well.
```bash
docker run --name test-postgres -e POSTGRES_PASSWORD=mysecretpassword -d -p 5432:5432 postgres:16.3
```
Create a table with a multicolumn index. Populate 12 million rows with random integers.
```sql
CREATE TABLE t(a int, b int);
CREATE INDEX my_idx ON t USING BTREE (a, b);
INSERT INTO t(a, b)
SELECT
(random() * 123456)::int AS a,
(random() * 123456)::int AS b
FROM
generate_series(1, 12345678);
ANALYZE t;
```
Simple query that uses the multicolumn index.
```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 0 order by a, b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..8.46 rows=1 width=8) (actual time=284.312..284.314 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 0))
Heap Fetches: 0
Buffers: shared hit=3777 read=37304 written=11713
Planning:
Buffers: shared hit=22 read=4
Planning Time: 0.270 ms
Execution Time: 284.341 ms
(8 rows)
```
## Expected output
The number of buffer blocks used is high. I expect it to be no more than when there’s only one constraint.
```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) order by a, b;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..23.67 rows=642 width=8) (actual time=0.030..0.158 rows=542 loops=1)
Index Cond: (ROW(a, b) > ROW(123450, 123450))
Heap Fetches: 0
Buffers: shared hit=254 read=3
Planning:
Buffers: shared read=4
Planning Time: 0.232 ms
Execution Time: 0.206 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where a = 0 order by a, b;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..6.20 rows=101 width=8) (actual time=0.099..0.113 rows=57 loops=1)
Index Cond: (a = 0)
Heap Fetches: 0
Buffers: shared hit=27 read=2
Planning Time: 0.081 ms
Execution Time: 0.131 ms
(6 rows)
```
## Postgres version
16.3
## Platform information
I can reproduce it on the latest postgres docker image, which is based on Debian Linux. Originally found the issue on AWS Aurora.
The following are my own observation and thoughts. Please disregard if it’s distraction.
For a general form of
```sql
select * from t where (a, b) > (x, y) and a = z order by a, b;
```
1. The number of buffer blocks is proportional to the gap between x and z. Strictly, it’s max(0, min(x, max(a)) – max(z, min(a))).
```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = -30000 order by a, b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=243.173..243.175 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = '-30000'::integer))
Heap Fetches: 0
Buffers: shared hit=1 read=41080
Planning:
Buffers: shared hit=2 read=2
Planning Time: 0.174 ms
Execution Time: 243.199 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 0 order by a, b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=230.425..230.426 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 0))
Heap Fetches: 0
Buffers: shared hit=1 read=41080
Planning:
Buffers: shared read=4
Planning Time: 0.296 ms
Execution Time: 230.460 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 30000 order by a, b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=171.787..171.788 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 30000))
Heap Fetches: 0
Buffers: shared hit=1 read=31126
Planning:
Buffers: shared read=4
Planning Time: 0.191 ms
Execution Time: 171.812 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 60000 order by a, b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=137.516..137.518 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 60000))
Heap Fetches: 0
Buffers: shared hit=1 read=21139
Planning:
Buffers: shared read=4
Planning Time: 0.212 ms
Execution Time: 137.543 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 90000 order by a, b;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=57.868..57.870 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 90000))
Heap Fetches: 0
Buffers: shared hit=11187 read=1
Planning:
Buffers: shared hit=1 read=3
Planning Time: 0.240 ms
Execution Time: 57.896 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a = 120000 order by a, b;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=6.018..6.019 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a = 120000))
Heap Fetches: 0
Buffers: shared hit=1173 read=1
Planning:
Buffers: shared hit=4
Planning Time: 0.122 ms
Execution Time: 6.052 ms
(8 rows)
```
2. It’s not an issue when `a=x` becomes `a<x` or `a>x`.
```
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a < 100 order by a, b;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..4.45 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a < 100))
Heap Fetches: 0
Buffers: shared hit=3
Planning:
Buffers: shared hit=8
Planning Time: 0.119 ms
Execution Time: 0.020 ms
(8 rows)
postgres=# explain (analyze, buffers) select * from t where row(a, b) > row(123450, 123450) and a > 100 order by a, b;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Index Only Scan using my_idx on t (cost=0.43..25.25 rows=641 width=8) (actual time=0.040..0.339 rows=542 loops=1)
Index Cond: ((ROW(a, b) > ROW(123450, 123450)) AND (a > 100))
Heap Fetches: 0
Buffers: shared hit=257
Planning:
Buffers: shared hit=8
Planning Time: 0.233 ms
Execution Time: 0.443 ms
(8 rows)
```
3. It’s not an issue when `a=x` becomes `b=x`.
4. The example query is trivial and for demo purpose only. Obviously there’s no need to supply `a = 0` when there’s `(a, b) > (123450, 123450)`. However, in practice it can be a problem when the table is joined to other tables, resulting in a nested loop for a list of `a` values that we have no control of, while `(a, b) > (x, y)` is used for paging.
5. My current workaround is add `AND a >= x` to `(a, b) > (x, y)`. However, this makes the planner underestimate the number of rows due to the multiplied selectivities.
Best regards,
Yan