Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> · Tue, 27 Oct 2009 14:14:37 +0800

On Tue, 2009-10-27 at 06:44 +0100, Jesper Krogh wrote:

> Here you should search for "commonterm" not "commonterm80", commonterm
> will go into a seq-scan. You're not testing the same thing as I did.

Point taken. I ran the same commands as you, but as the planner picked
different plans it wasn't much use. The fact that I didn't notice that
is a bit worrying, as it suggests and even worse than normal degree of
brain-fade. Sorry for the waste of time.

Anyway, testing more usefully:

On 8.4 on a different system Pg uses the seq scan by preference, with a
runtime of 1148ms. It doesn't seem to want to do a bitmap heap scan when
searching for `commonterm' even when enable_seqscan is set to `off'. A
search for `commonterm80' also uses a seq scan (1067ms),  but if
enable_seqscan is set to off it'll use a bitmap heap scan at 237ms.

On my 8.3 Pg isn't using a seqscan even for `commonterm', which is ...
odd. If I force it not to use a bitmap heap scan it'll use an index
scan.  Preventing that too results in a seq scan with a runtime of
1500ms vs the 161ms of the bitmap heap scan. I agree that it seems like
a pretty strange result on face value.

So, on both 8.3 and 8.4 the sequential scan is indeed taking a LOT
longer than the bitmap heap scan, though similar numbers of tuples are
being read by both. 

I see the same results when actually reading the results rather than
just doing an `explain analyze'. With psql set to send output
to /dev/null and with \timing enabled:

test=> \o /dev/null
test=> set enable_seqscan = on;
Time: 0.282 ms
test=> select id from ftstest where body_fts @@
to_tsquery('commonterm80');
Time: 988.880 ms
test=> set enable_seqscan = off;
Time: 0.286 ms
test=> select id from ftstest where body_fts @@
to_tsquery('commonterm80');
Time: 159.167 ms

so - nearly 1s vs 0.15s is a big difference between what I previously
confirmed to be bitmap heap scan and seq scan respectively for the same
query. The same number of records are being returned in both cases.

If I "select *" rather than just reading the `id' field, the runtimes
are much more similar - 4130ms seq scan, and 3285 bitmap heap scan (both
reading data to /dev/null), a difference of ~800. `EXPLAIN ANALYZE'
results are still quite different, though, at 1020ms seq scan vs 233ms
bitmap heap, suggesting that the similarity is created only by the time
taken to actually transfer the data to the client. The time difference
between the two is much the same.

So - for some reason the seq scan takes 800ms or so longer than the
bitmap heap scan. I can see why you're puzzled. I can reproduce it on
two different machines with two different Pg versions, and using two
slightly different methods for loading the data as well.  So, I can
confirm your test results now that I'm actually testing properly.

test=> explain analyze select * from ftstest where body_fts @@
to_tsquery('commonterm80');
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=25836.66..36432.95 rows=39753
width=54) (actual time=27.452..175.481 rows=39852 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..25826.72
rows=39753 width=0) (actual time=25.186..25.186 rows=39852 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 233.473 ms
(5 rows)

test=> set enable_seqscan = on;
SET
test=> explain analyze select * from ftstest where body_fts @@
to_tsquery('commonterm80');
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on ftstest  (cost=0.00..10750.00 rows=39753 width=54) (actual
time=0.141..956.496 rows=39852 loops=1)
   Filter: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 1020.936 ms
(3 rows)

By the way, for the 8.4 test I modifed the loader script so it wouldn't
take quite so painfully long to run second time 'round. I turned
autocommit off, wrapped all the inserts up in a single transaction, and
moved the fts index creation to after all the data has been inserted.
It's a *LOT* faster, and the test results match yours.

> they were run repeatedly.

Yeah, just saw that in your original mail. Sorry.

--
Craig Ringer

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance