Re: Bitmap scan is undercosted?

pryzby@xxxxxxxxxxxxx (Justin Pryzby) · Fri, 1 Dec 2017 20:06:03 -0600

On Fri, Dec 01, 2017 at 05:11:04PM -0600, Justin Pryzby wrote:
> I tried to reproduce this issue and couldn't, under PG95 and 10.1:

I'm embarassed to say that I mis-read your message, despite you're amply clear
subject.  You're getting a bitmap scan but you'd prefer to get an index scan.
I anticipated the opposite problem (which is what I've had issues with myself).

> On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
> > On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
> > > We recently had an issue in production, where a bitmap scan was chosen
> > > instead of an index scan. Despite being 30x slower, the bitmap scan had
> > > about the same cost as the index scan.
> 
> Note:
> postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
> correlation | 0.00710112
> 
> ..so this is different from the issue corrected by the patch I created while
> testing.

Actually, that the table is "not correlated" on "num" column is maybe the
primary reason why PG avoids using an index scan.  It (more or less correctly)
deduces that it's going to have to "read" a large fraction of the pages (even
if only to process a small fraction of the rows), which is costly, except it's
all cached..  In your case, that overly-penalizes the index scan.

This is cost_index() and cost_bitmap_heap_scan() in costsize.c.  Since the
index is uncorrelated, it's returning something close to max_IO_cost.  It looks
like effective_cache_size only affects index_pages_fetched().

I'm going to try to dig some more into it.  Maybe there's evidence to
re-evaluate one of these:

cost_index()
| run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
or
cost_bitmap_heap_scan()
| cost_per_page = spc_random_page_cost - 
|               (spc_random_page_cost - spc_seq_page_cost)
|               * sqrt(pages_fetched / T);

Justin