This is triggered by Hans-Jürgen Schönig's blog post on B-Tree vs. BRIN
indexes here:
https://www.cybertec-postgresql.com/en/btree-vs-brin-2-options-for-indexing-in-postgresql-data-warehouses/

He creates a rather large table with a sorted and an unsorted column
plus two indexes:

CREATE TABLE t_demo (id_sorted int8, id_random int8);
INSERT INTO t_demo
    SELECT id, hashtext(id::text)
    FROM generate_series(1, 2000000000) AS id;
create index on t_demo (id_sorted);
create index on t_demo (id_random);

(Actually, his table is even larger, but I didn't have enough free
space on my test server and didn't want to extend the file system just
for this test. Anyway, that's large enough that the setup takes longer
than a cup of coffee, so maybe you want to go even lower.)

The table is now about 82 GB, and the indexes 42 GB and 40 GB,
respectively.

Then select about ten million values from each index:

wds=> select count(*) from t_demo
      where id_sorted between 10000000 and 20000000;
╔══════════╗
║  count   ║
╟──────────╢
║ 10000001 ║
╚══════════╝
(1 row)

Time: 349.186 ms

wds=> select count(*) from t_demo
      where id_random between 10000000 and 31475500;
╔══════════╗
║  count   ║
╟──────────╢
║ 10000028 ║
╚══════════╝
(1 row)

Time: 1296.489 ms (00:01.296)

The second is a bit slower, but (on this server with lots of RAM) not
that much.
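(If you want to check the size figures above on your own copy of the
table, `pg_relation_size()` / `pg_size_pretty()` do it; the index names
are the ones PostgreSQL generates by default, as seen later in the
EXPLAIN output:)

```sql
-- heap size of the table itself
SELECT pg_size_pretty(pg_relation_size('t_demo'));

-- sizes of the two B-tree indexes (default auto-generated names)
SELECT pg_size_pretty(pg_relation_size('t_demo_id_sorted_idx')),
       pg_size_pretty(pg_relation_size('t_demo_id_random_idx'));
```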
However, if you look at the EXPLAIN output, the difference is striking
(wide output, sorry):

wds=> explain (analyze, buffers) select count(*) from t_demo
      where id_sorted between 10000000 and 20000000;
                                                                    QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Finalize Aggregate  (cost=203279.97..203279.98 rows=1 width=8) (actual time=545.388..559.523 rows=1 loops=1)
   Buffers: shared hit=27336
   ->  Gather  (cost=203279.76..203279.97 rows=2 width=8) (actual time=545.201..559.504 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=27336
         ->  Partial Aggregate  (cost=202279.76..202279.77 rows=1 width=8) (actual time=508.272..508.274 rows=1 loops=3)
               Buffers: shared hit=27336
               ->  Parallel Index Only Scan using t_demo_id_sorted_idx on t_demo  (cost=0.58..191374.26 rows=4362197 width=0) (actual time=0.075..352.469 rows=3333334 loops=3)
                     Index Cond: ((id_sorted >= 10000000) AND (id_sorted <= 20000000))
                     Heap Fetches: 0
                     Buffers: shared hit=27336
 Planning:
   Buffers: shared hit=5
 Planning Time: 0.255 ms
 JIT:
   Functions: 11
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 1.217 ms, Inlining 0.000 ms, Optimization 0.762 ms, Emission 12.777 ms, Total 14.756 ms
 Execution Time: 560.422 ms

wds=> explain (analyze, buffers) select count(*) from t_demo
      where id_random between 10000000 and 31475500;
                                                                    QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Finalize Aggregate  (cost=173323.64..173323.65 rows=1 width=8) (actual time=1513.163..1520.166 rows=1 loops=1)
   Buffers: shared hit=9995216
   ->  Gather  (cost=173323.42..173323.63 rows=2 width=8) (actual time=1513.017..1520.154 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=9995216
         ->  Partial Aggregate  (cost=172323.42..172323.43 rows=1 width=8) (actual time=1492.693..1492.694 rows=1 loops=3)
               Buffers: shared hit=9995216
               ->  Parallel Index Only Scan using t_demo_id_random_idx on t_demo  (cost=0.58..162927.68 rows=3758297 width=0) (actual time=0.058..1338.110 rows=3333343 loops=3)
                     Index Cond: ((id_random >= 10000000) AND (id_random <= 31475500))
                     Heap Fetches: 0
                     Buffers: shared hit=9995216
 Planning Time: 0.186 ms
 JIT:
   Functions: 11
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 1.204 ms, Inlining 0.000 ms, Optimization 0.634 ms, Emission 11.510 ms, Total 13.348 ms
 Execution Time: 1521.068 ms

Both do a parallel index only scan. Both perform 0 heap fetches. But
one reads 27336 buffers (or about 22 bytes per index entry, which
sounds reasonable) while the other reads 9995216 buffers (or almost one
full buffer per row). Why?
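(For reference, the per-entry arithmetic, assuming the default 8 kB
block size; the counts come from the query results above:)

```sql
-- sorted index: bytes read per index entry
select 27336 * 8192.0 / 10000001;   -- roughly 22 bytes

-- random index: buffers read per row
select 9995216.0 / 10000028;        -- almost exactly 1 buffer
```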
The entries should be dense in the index in both cases, and since it's
an index only scan (and EXPLAIN says there were 0 heap fetches), I
would not expect extra accesses. Where do these buffer reads come from?

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"