This is triggered by Hans-Jürgen Schönig's blog post on B-Tree vs. BRIN
indexes here:
https://www.cybertec-postgresql.com/en/btree-vs-brin-2-options-for-indexing-in-postgresql-data-warehouses/

He creates a rather large table with a sorted and an unsorted column
plus two indexes:

CREATE TABLE t_demo (id_sorted int8, id_random int8);
INSERT INTO t_demo
    SELECT id, hashtext(id::text)
    FROM generate_series(1, 2000000000) AS id;
create index on t_demo (id_sorted);
create index on t_demo (id_random);

(Actually, his table is even larger, but I didn't have enough free
space on my test server and didn't want to extend the file system just
for this test. Anyway, that's large enough that the setup takes longer
than a cup of coffee, so maybe you want to go even lower.)

The table is now about 82 GB, and the indexes 42 GB and 40 GB,
respectively.

Then select about ten million values from each index:

wds=> select count(*) from t_demo
      where id_sorted between 10000000 and 20000000;
╔══════════╗
║  count   ║
╟──────────╢
║ 10000001 ║
╚══════════╝
(1 row)

Time: 349.186 ms

wds=> select count(*) from t_demo
      where id_random between 10000000 and 31475500;
╔══════════╗
║  count   ║
╟──────────╢
║ 10000028 ║
╚══════════╝
(1 row)

Time: 1296.489 ms (00:01.296)

The second is a bit slower, but (on this server with lots of RAM) not
that much.
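(If you want to check the size figures above on your own copy of the
table, `pg_relation_size()` / `pg_size_pretty()` do it; the index names
are the ones PostgreSQL generates by default, as seen later in the
EXPLAIN output:)

```sql
-- heap size of the table itself
SELECT pg_size_pretty(pg_relation_size('t_demo'));

-- sizes of the two B-tree indexes (default auto-generated names)
SELECT pg_size_pretty(pg_relation_size('t_demo_id_sorted_idx')),
       pg_size_pretty(pg_relation_size('t_demo_id_random_idx'));
```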
However, if you look at the EXPLAIN output, the difference is striking
(wide output, sorry):

wds=> explain (analyze, buffers) select count(*) from t_demo
      where id_sorted between 10000000 and 20000000;
                                                                    QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Finalize Aggregate  (cost=203279.97..203279.98 rows=1 width=8) (actual time=545.388..559.523 rows=1 loops=1)
   Buffers: shared hit=27336
   ->  Gather  (cost=203279.76..203279.97 rows=2 width=8) (actual time=545.201..559.504 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=27336
         ->  Partial Aggregate  (cost=202279.76..202279.77 rows=1 width=8) (actual time=508.272..508.274 rows=1 loops=3)
               Buffers: shared hit=27336
               ->  Parallel Index Only Scan using t_demo_id_sorted_idx on t_demo  (cost=0.58..191374.26 rows=4362197 width=0) (actual time=0.075..352.469 rows=3333334 loops=3)
                     Index Cond: ((id_sorted >= 10000000) AND (id_sorted <= 20000000))
                     Heap Fetches: 0
                     Buffers: shared hit=27336
 Planning:
   Buffers: shared hit=5
 Planning Time: 0.255 ms
 JIT:
   Functions: 11
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 1.217 ms, Inlining 0.000 ms, Optimization 0.762 ms, Emission 12.777 ms, Total 14.756 ms
 Execution Time: 560.422 ms

wds=> explain (analyze, buffers) select count(*) from t_demo
      where id_random between 10000000 and 31475500;
                                                                    QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Finalize Aggregate  (cost=173323.64..173323.65 rows=1 width=8) (actual time=1513.163..1520.166 rows=1 loops=1)
   Buffers: shared hit=9995216
   ->  Gather  (cost=173323.42..173323.63 rows=2 width=8) (actual time=1513.017..1520.154 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=9995216
         ->  Partial Aggregate  (cost=172323.42..172323.43 rows=1 width=8) (actual time=1492.693..1492.694 rows=1 loops=3)
               Buffers: shared hit=9995216
               ->  Parallel Index Only Scan using t_demo_id_random_idx on t_demo  (cost=0.58..162927.68 rows=3758297 width=0) (actual time=0.058..1338.110 rows=3333343 loops=3)
                     Index Cond: ((id_random >= 10000000) AND (id_random <= 31475500))
                     Heap Fetches: 0
                     Buffers: shared hit=9995216
 Planning Time: 0.186 ms
 JIT:
   Functions: 11
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 1.204 ms, Inlining 0.000 ms, Optimization 0.634 ms, Emission 11.510 ms, Total 13.348 ms
 Execution Time: 1521.068 ms

Both do a parallel index only scan. Both perform 0 heap fetches. But
one reads 27336 buffers (or about 22 bytes per index entry, which
sounds reasonable) while the other reads 9995216 buffers (or almost one
full buffer per row). Why?
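(For reference, the per-entry arithmetic, assuming the default 8 kB
block size; the counts come from the query results above:)

```sql
-- sorted index: bytes read per index entry
select 27336 * 8192.0 / 10000001;   -- roughly 22 bytes

-- random index: buffers read per row
select 9995216.0 / 10000028;        -- almost exactly 1 buffer
```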
The entries should be dense in the index in both cases, and since it's
an index only scan (and EXPLAIN says there were 0 heap fetches), I
would not expect extra accesses. Where do these buffer reads come from?

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"