Re: Forcing using index instead of sequential scan?

Tom Lane <tgl@xxxxxxxxxxxxx> · Sat, 22 Jul 2006 12:22:21 -0400

<robin.c.smith@xxxxxx> writes:
> I have been testing the performance of PostgreSQL using the simple tool
> found at http://benchw.sourceforge.net however I have found that all the
> queries it run execute with sequential scans. The website where the code
> runs has examples of the execution plan using indexes.

The reason the website gets indexscans is that he's fooled with the
planner cost parameters.  In particular I see that benchw's
documentation suggests
	effective_cache_size	= 48000
	random_page_cost	= 0.8
The latter is physically silly but it's a pretty effective thumb on the
scales if you want to force indexscan usage.

The real issue here is caching across successive queries, an effect that
Postgres doesn't deal with very well at the moment.  If you run these
queries from a standing start (freshly booted machine) you'll likely
find that the indexscan plan is indeed slower than the seqscan/hash
plan, just like the planner thinks.  I get about 52 sec for query0
with an indexscan vs about 35 sec for the seqscan.  However, successive
executions of the seqscan plan stay at about 35 sec, whereas the
indexscan plan drops to 2 sec(!).  This is because the fraction of the
table touched by the indexscan plan is small enough to fit in my
machine's RAM --- I can see by das blinkenlights (and also vmstat) that
there's no I/O going on at all during re-executions of the indexscan.
If I run the seqscan and then the indexscan, the indexscan takes about
28 sec, so there's still some useful cached data even though the seqscan
read more stuff than fits in RAM.  (Note: this is with Fedora Core 5,
YMMV depending on your kernel's cache algorithms.)

In a real-world situation it's unlikely you'd just re-execute the same
query over and over, so this benchmark is really too simplistic to trust
very far as an indicator of what to do in practice.

I find that CVS tip will choose the indexscan for query0 if I set
effective_cache_size to 62500 (ie, half a gigabyte, or half of this
machine's RAM) and set random_page_cost to 1.5 or less.

If you want the planner to work on the assumption that everything's
cached, set effective_cache_size to a large value and set
random_page_cost to 1.0 --- you might also want to increase the CPU
cost settings, reflecting the fact that I/O is cheaper relative to
CPU effort than the default settings assume.  However, if your database
is too large to fit in RAM then these are likely to be pretty bad
settings.  Many people compromise with a random_page_cost around 2
or so.

			regards, tom lane