On Tue, Dec 4, 2012 at 10:03 AM,  <postgresql@xxxxxxxxx> wrote:
>
> Though that doesn't account for the 70x difference between the speed of
> the two queries in actuality, given a pretty similar expected speed (does
> it?).

It kind of does. The expected speed is predicated on the number of rows
being 200-fold higher. If the number of rows actually were that much
higher, the two speeds might be closer together. That is why it would be
interesting to see a more typical case, where the actual number of rows is
closer to the 2000 estimate.

But I am curious about how the cost estimate for the primary-key lookup is
arrived at:

Index Scan using cons_pe_primary_key on position_effect
    (cost=0.00..42.96 rows=1 width=16)

There should be a random page fetch for the index leaf page and another
for the heap page. Since you set random_page_cost to 2, that comes to 4.
Then there would be some almost negligible CPU costs. Where the heck is
the extra 38 cost coming from?

> It does go some way to explaining why a bad choice of plan was made.
>
> Is there some nice bit of literature somewhere that explains what sort of
> costs are associated with the different types of lookup?

I've heard good things about Greg Smith's book, but I don't know if it
covers this particular thing. Otherwise, I don't know of a good single
place that is a tutorial rather than a reference (or the code itself).

>>> First, make sure caching isn't interfering with your results. Run each
>>> query several times.
>>
>> If that is not how the production system works (running the same query
>> over and over), then you want to model the cold cache, not the hot one.
>>
>> But in any case, the posted explains indicate that all buffers were
>> cached.
>
> We are in the rather pleasant situation here in that we are willing to
> spend money on the box (up to a point, but quite a large point) to get it
> up to the spec so that it should hardly ever need to touch the disk; the
> trick is figuring out how to let our favourite database server know that.

Well, that part is fairly easy: make random_page_cost and seq_page_cost
much smaller than their defaults. For example, 0.04 and 0.03.

I think the *_page_cost settings should strictly be an estimate of the
cost of actually doing the IO, with a separate parameter to reflect the
likelihood of needing to do the IO, like *_page_cachedness. But that isn't
the way it is done currently.

Cheers,

Jeff
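
To put numbers on the back-of-envelope arithmetic above (a sketch only:
the 0.005 and 0.01 figures are the stock cpu_index_tuple_cost and
cpu_tuple_cost defaults, and the per-qual cpu_operator_cost terms are left
out as negligible):

    -- Expected cost of the one-row index scan under the reasoning above:
    -- two random page fetches at random_page_cost = 2, plus tiny CPU terms.
    SELECT 2 * 2.0       -- index leaf page + heap page, random_page_cost = 2
         + 0.005         -- cpu_index_tuple_cost (default)
         + 0.01          -- cpu_tuple_cost (default)
      AS expected_cost;  -- about 4.02, versus the planner's 42.96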
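
One way to verify the "all buffers were cached" observation quoted above,
on 9.0 or later, is EXPLAIN with the BUFFERS option (a sketch; the WHERE
clause is a hypothetical stand-in for the real primary-key lookup):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM position_effect WHERE position_effect_id = 12345;
    -- A "Buffers: shared hit=N" line with no "read=" component means every
    -- page came from shared_buffers, i.e. the cache was fully hot.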
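
And a minimal sketch of trying the lower page costs suggested above,
session-locally before persisting them in postgresql.conf (the query is
again a hypothetical stand-in):

    SET random_page_cost = 0.04;  -- default is 4.0
    SET seq_page_cost = 0.03;     -- default is 1.0
    -- Re-plan the problem query and see whether the plan choice changes:
    EXPLAIN ANALYZE
    SELECT * FROM position_effect WHERE position_effect_id = 12345;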