bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi.

I'm currently trying to figure out why the tsearch performance seems to
vary a lot between different queryplans. I have created a sample dataset
that sort of resembles the data I have to work on.

The script that builds the dataset is at:
http://krogh.cc/~jesper/build-test.pl
and http://krogh.cc/~jesper/words.txt is needed for it to run.

Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.

The dataset consists of words randomized, but .. all records contains
"commonterm", around 80% contains commonterm80 and so on..

	my $rand = rand();
	push @doc,"commonterm" if $commonpos == $j;
	push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;

Results are run multiple times after each other so they should be
reproducible:

ftstest=# explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
                                                   QUERY PLAN

----------------------------------------------------------------------------------------------------------------
 Seq Scan on ftstest  (cost=0.00..10750.00 rows=40188 width=4) (actual
time=0.102..1792.215 rows=40082 loops=1)
   Filter: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 1809.437 ms
(3 rows)

ftstest=# set enable_seqscan=off;
SET
ftstest=# explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=115389.14..125991.96 rows=40188
width=4) (actual time=17.445..197.356 rows=40082 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..115379.09
rows=40188 width=0) (actual time=13.370..13.370 rows=40082 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 204.201 ms
(5 rows)

Given that the seq-scan have to visit 50K row to create the result and
the bitmap heap scan only have to visit 40K (but search the index) we
would expect the seq-scan to be at most 25% more expensive than the
bitmap-heap scan.. e.g. less than 300ms.

Jesper
-- 
Jesper

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux