Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

Jesper Krogh <jesper@xxxxxxxx> · Tue, 27 Oct 2009 07:42:00 +0100

Craig Ringer wrote:
> On 8.4 on a different system Pg uses the seq scan by preference, with a
> runtime of 1148ms. It doesn't seem to want to do a bitmap heap scan when
> searching for `commonterm' even when enable_seqscan is set to `off'. A
> search for `commonterm80' also uses a seq scan (1067ms),  but if
> enable_seqscan is set to off it'll use a bitmap heap scan at 237ms.

Ok, thats excactly as my number.

> On my 8.3 Pg isn't using a seqscan even for `commonterm', which is ...
> odd. If I force it not to use a bitmap heap scan it'll use an index
> scan.  Preventing that too results in a seq scan with a runtime of
> 1500ms vs the 161ms of the bitmap heap scan. I agree that it seems like
> a pretty strange result on face value.

PG 8.3 doesnt have statistics data available for gin-indexes so that may
be why the query-planner can do otherwise on 8.3. It also means that it
is a regression since in these cases 8.4 will perform worse than 8.3
did. (allthough the statistics makes a lot other cases way better).

> So, on both 8.3 and 8.4 the sequential scan is indeed taking a LOT
> longer than the bitmap heap scan, though similar numbers of tuples are
> being read by both. 
>
> I see the same results when actually reading the results rather than
> just doing an `explain analyze'. With psql set to send output
> to /dev/null and with \timing enabled:
> 
> test=> \o /dev/null
> test=> set enable_seqscan = on;
> Time: 0.282 ms
> test=> select id from ftstest where body_fts @@
> to_tsquery('commonterm80');
> Time: 988.880 ms
> test=> set enable_seqscan = off;
> Time: 0.286 ms
> test=> select id from ftstest where body_fts @@
> to_tsquery('commonterm80');
> Time: 159.167 ms
> 
> so - nearly 1s vs 0.15s is a big difference between what I previously
> confirmed to be bitmap heap scan and seq scan respectively for the same
> query. The same number of records are being returned in both cases.
> 
> If I "select *" rather than just reading the `id' field, the runtimes
> are much more similar - 4130ms seq scan, and 3285 bitmap heap scan (both
> reading data to /dev/null), a difference of ~800. `EXPLAIN ANALYZE'
> results are still quite different, though, at 1020ms seq scan vs 233ms
> bitmap heap, suggesting that the similarity is created only by the time
> taken to actually transfer the data to the client. The time difference
> between the two is much the same.
> 
> So - for some reason the seq scan takes 800ms or so longer than the
> bitmap heap scan. I can see why you're puzzled. I can reproduce it on
> two different machines with two different Pg versions, and using two
> slightly different methods for loading the data as well.  So, I can
> confirm your test results now that I'm actually testing properly.

Thanks a lot.

> test=> explain analyze select * from ftstest where body_fts @@
> to_tsquery('commonterm80');
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on ftstest  (cost=25836.66..36432.95 rows=39753
> width=54) (actual time=27.452..175.481 rows=39852 loops=1)
>    Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
>    ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..25826.72
> rows=39753 width=0) (actual time=25.186..25.186 rows=39852 loops=1)
>          Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
>  Total runtime: 233.473 ms
> (5 rows)
> 
> test=> set enable_seqscan = on;
> SET
> test=> explain analyze select * from ftstest where body_fts @@
> to_tsquery('commonterm80');
>                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Seq Scan on ftstest  (cost=0.00..10750.00 rows=39753 width=54) (actual
> time=0.141..956.496 rows=39852 loops=1)
>    Filter: (body_fts @@ to_tsquery('commonterm80'::text))
>  Total runtime: 1020.936 ms
> (3 rows)

My systems seems more to prefer bitmap-scans a bit more, but given the
actual number it seems to be preferrablem. Thats about query-planning,
my main reason for posting was the actual run time.

> By the way, for the 8.4 test I modifed the loader script so it wouldn't
> take quite so painfully long to run second time 'round. I turned
> autocommit off, wrapped all the inserts up in a single transaction, and
> moved the fts index creation to after all the data has been inserted.
> It's a *LOT* faster, and the test results match yours.

I'll make that change if I have to work a bit more with it.

Thanks for speding time confirming my findings. (the I know its not just
 me getting blind at some problem).

Jesper
-- 
Jesper

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance