Re: Improve Seq scan performance

"Vladimir Sitnikov" <sitnikov.vladimir@xxxxxxxxx> · Sun, 9 Nov 2008 23:37:00 -0800

Lutischán Ferenc wrote:

It is possible to make an index on the table, and make a seq index scan on this values?

My understanding is that this isn't possible in PostgreSQL, because indexes do not contain information about tuple visibility. Data read from the index might refer to tuples that've been deleted as far as your transaction is concerned, or to tuples that were created after your snapshot was taken.

My understanding is even though indices do not contain information on tuple visibility, index could be used to filter out records that is known to make no match. Since btree index stores exact values, PostgreSQL could scan through the index and skip those entries that do not contain '%aaa%'. That will dramatically improve cases where the criteria has good selectivity, since index has much more compact structure than table.

As far as I understand, it is discouraged to implement/suggest patches during Commitfest, however, I would love to treat the following case as a "performance bug" and add it to the "TODO" list:

create table seq_test
 as select cast(i as text) i, repeat('*', 500) padding from generate_series(1,10000) as s(i);

create index i_ix on seq_test(i);

vacuum analyze verbose seq_test;
-- index "i_ix" now contains 10000 row versions in 30 pages

-- "seq_test": found 0 removable, 10000 nonremovable row versions in 667 pages

explain analyze select * from seq_test where i like '%123%';

-- Seq Scan reads 667 pages (as expected)
Seq Scan on seq_test  (cost=0.00..792.00 rows=356 width=508) (actual time=0.129..9.071 rows=20 loops=1 read_shared=667(667) read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)

  Filter: (i ~~ '%123%'::text)
Total runtime: 9.188 ms

set enable_seqscan=off
-- Index Scan reads 2529 pages for some reason. I would expect 30 (index size) + 20 (number of matching entries) = 50 pages maximum, that is 10 times better than with seq scan.

Index Scan using i_ix on seq_test  (cost=0.00..1643.74 rows=356 width=508) (actual time=0.334..16.746 rows=20 loops=1 read_shared=2529(2529) read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)

  Filter: (i ~~ '%123%'::text)
Total runtime: 16.863 ms

Hopefully, there will be a clear distinction between filtering via index and filtering via table access.

Regards,
Vladimir Sitnikov