Re: Improve Seq scan performance

Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> · Mon, 10 Nov 2008 16:59:45 +0900

Vladimir Sitnikov wrote:
Lutischán Ferenc wrote:

 It is possible to make an index on the table, and make a seq index scan on
this values?

My understanding is that this isn't possible in PostgreSQL, because indexes
do not contain information about tuple visibility. Data read from the index
might refer to tuples that've been deleted as far as your transaction is
concerned, or to tuples that were created after your snapshot was taken.

My understanding is even though indices do not contain information on tuple
visibility, index could be used to filter out records that is known to make
no match.

Yes, that's what an index is for. As far as I know, if it's worth doing 
that, it's worth doing an index scan instead of a seq scan.

Maybe there's some hybrid type possible where you can scan the index to 
find large table regions that are known /not/ to contain tuples of 
interest and seek over them in your scan. I wouldn't know, really, but 
it sounds like it'd probably be more I/O than a pure seq scan (given the 
reading of the index too) unless the table had the values of interest 
rather neatly clustered. It'd also surely use more memory and CPU time 
processing the whole index to find table regions without values of interest.

Is that what you meant, though?

create table seq_test
 as select cast(i as text) i, repeat('*', 500) padding from
generate_series(1,10000) as s(i);

create index i_ix on seq_test(i);

vacuum analyze verbose seq_test;
-- index "i_ix" now contains 10000 row versions in *30 *pages
-- "seq_test": found 0 removable, 10000 nonremovable row versions in *667 *
pages

explain analyze select * from seq_test where i like '%123%';

A b-tree index cannot be used on a LIKE query with a leading wildcard. 
See the FAQ.

In addition, if your database is not in the C locale you can't use an 
ordinary index for LIKE queries. See the FAQ. You need to create a 
text_pattern_ops index instead:

create index i_ix_txt on seq_test(i text_pattern_ops);

set enable_seqscan=off
-- Index Scan reads 2529 pages for some reason. I would expect *30 *(index
size) + *20 *(number of matching entries) = 50 pages maximum, that is 10
times better than with seq scan.
Index Scan using i_ix on seq_test  (cost=0.00..1643.74 rows=356 width=508)
(actual time=0.334..16.746 rows=*20 *loops=1 read_shared=2529(2529)
read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
  Filter: (i ~~ '%123%'::text)
Total runtime: 16.863 ms

I think it's reading the whole index, because it can't do a prefix 
search if there's a leading wildcard. I'm a bit confused, though, since 
I thought in this case it couldn't actually execute the query w/o a 
sequential scan, and would just use one irrespective of the 
enable_seqscan param. That's what happens here.

Here's what I get:

test=# create table seq_test as select cast(i as text) AS i, repeat('*', 
500) AS padding from generate_series(1,10000) as s(i);
SELECT
test=# create index i_ix on seq_test(i);
CREATE INDEX
test=# create index i_ix_txt on seq_test(i text_pattern_ops);
CREATE INDEX
test=# vacuum analyze verbose seq_test;
-- blah blah
INFO:  "seq_test": scanned 667 of 667 pages, containing 10000 live rows 
and 0 dead rows; 3000 rows in sample, 10000 estimated total rows

test=# explain analyze select * from seq_test where i like '%123%';
                                               QUERY PLAN 

---------------------------------------------------------------------------------------------------------
 Seq Scan on seq_test  (cost=0.00..792.00 rows=400 width=508) (actual 
time=0.081..5.239 rows=20 loops=1)
   Filter: (i ~~ '%123%'::text)
 Total runtime: 5.281 ms
(3 rows)

-- Now, note lack of leading wildcard:
test=# explain analyze select * from seq_test where i like '123%';
                                                    QUERY PLAN 

-------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on seq_test  (cost=4.35..40.81 rows=10 width=508) 
(actual time=0.062..0.081 rows=11 loops=1)
   Filter: (i ~~ '123%'::text)
   ->  Bitmap Index Scan on i_ix_txt  (cost=0.00..4.35 rows=10 width=0) 
(actual time=0.048..0.048 rows=11 loops=1)
         Index Cond: ((i ~>=~ '123'::text) AND (i ~<~ '124'::text))
 Total runtime: 0.121 ms
(5 rows)

test=# set enable_seqscan=off;
SET
test=# explain analyze select * from seq_test where i like '%123%';
                                                      QUERY PLAN 

-----------------------------------------------------------------------------------------------------------------------
 Seq Scan on seq_test  (cost=100000000.00..100000792.00 rows=400 
width=508) (actual time=0.088..5.666 rows=20 loops=1)
   Filter: (i ~~ '%123%'::text)
 Total runtime: 5.702 ms
(3 rows)

--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance