Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

> On Mon, Oct 26, 2009 at 4:02 PM, Jesper Krogh <jesper@xxxxxxxx> wrote:
>> Given that the seq-scan has to visit 50K rows to create the result and
>> the bitmap heap scan only has to visit 40K (but also search the index),
>> we would expect the seq-scan to be at most 25% more expensive than the
>> bitmap heap scan, i.e. less than 300ms.
>
> I've seen behavior similar to this in the past with a plain old B-tree
> index.  As in your case, a bitmap index scan was significantly faster
> than a sequential scan even though essentially all the heap pages had
> to be scanned, but the planner expected the opposite to be true.  The
> planner's expectation is that the dominant cost will be fetching the
> pages, and it furthermore thinks that fetching things in sequential
> order is much better than skipping around randomly.  However, if all
> the pages are memory-resident - possibly even in L2 or L3 CPU cache -
> fetching the pages is nearly free, so the dominant cost becomes the
> CPU time to process the tuples.

Well, no. This topic is not at all about the query planner; it is about
the actual run-time of two almost identical queries. It may be that
we're seeing these results because one fits better into the L2 or L3
cache, but the complete dataset is memory-resident, and the queries were
run multiple times in a row to eliminate disk access.
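
(For anyone who wants to reproduce this kind of warm-cache comparison,
a minimal psql sketch; "docs", "fts" and 'commonterm' below are
placeholders, not the actual schema:)

\timing on

-- Run a couple of times first so the heap and the index are cached:
SELECT count(*) FROM docs WHERE fts @@ to_tsquery('commonterm');
SELECT count(*) FROM docs WHERE fts @@ to_tsquery('commonterm');

-- Time the bitmap heap scan:
SET enable_seqscan = off;
EXPLAIN ANALYZE
  SELECT count(*) FROM docs WHERE fts @@ to_tsquery('commonterm');
RESET enable_seqscan;

-- Time the seq-scan over the same data (GIN only supports bitmap
-- scans, so disabling them forces the sequential plan):
SET enable_bitmapscan = off;
EXPLAIN ANALYZE
  SELECT count(*) FROM docs WHERE fts @@ to_tsquery('commonterm');
RESET enable_bitmapscan;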

> My best guess is that in cases like this the index cond is cheaper to
> evaluate than the recheck cond/filter, so the index scan wins not by
> reading fewer pages but by avoiding the need to examine some of the
> tuples on each page.  I might be all wet, though.

In my example the seq-scan evaluates 50K tuples and the bitmap heap scan
40K. The question is why the "per-tuple" evaluation becomes that much
more expensive (x7.5)[1] in the seq-scan than in the bitmap scan, when
the complete dataset is indeed in memory.
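
(Spelled out, the per-tuple arithmetic behind that factor, using the
timings from [1], as a trivial SQL sketch:)

SELECT 1800.0 / 50000  AS seqscan_ms_per_tuple,    -- ~0.036 ms/tuple
        200.0 / 40000  AS bitmap_ms_per_tuple,     -- ~0.005 ms/tuple
       (1800.0 / 50000) / (200.0 / 40000) AS ratio; -- ~7.2x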

> If your whole database fits in RAM, you could try changing your
> seq_page_cost and random_page_cost variables from the default values
> of 1 and 4 to something around 0.05, or maybe even 0.01, and see
> whether that helps.

Again, that only changes how the query is planned. We're talking about
actual runtimes here.
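
(For reference, the tweak being suggested amounts to the following; note
that it only affects which plan the planner picks, not how fast a given
plan runs once chosen:)

-- Planner tweak only: tell the planner page fetches are nearly free
-- when everything is cached.
SET seq_page_cost = 0.05;
SET random_page_cost = 0.05;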

[1] 50K tuples in 1,800 ms vs. 40K tuples in 200 ms

-- 
Jesper

