Re: Seqscan/Indexscan still a known issue?

Tomas Vondra <tv@xxxxxxxx> · Sat, 27 Jan 2007 10:54:22 +0100

> 
> Hi,
> 
> I find various references in the list to this issue of queries
> being too slow because the planner miscalculates things and
> decides to go for a sequenctial scan when an index is available
> and would lead to better performance.
> 
> Is this still an issue with the latest version?   I'm doing some
> tests right now, but I have version 7.4  (and not sure when I will
> be able to spend the effort to move our system to 8.2).
> 
> When I force it via  "set enable_seqscan to off", the index scan
> takes about 0.1 msec  (as reported by explain analyze), whereas
> with the default, it chooses a seq. scan, for a total execution
> time around 10 msec!!  (yes: 100 times slower!).  The table has
> 20 thousand records, and the WHERE part of the query uses one
> field that is part of the primary key  (as in, the primary key
> is the combination of field1,field2, and the query involves a
> where field1=1 and some_other_field=2).  I don't think I'm doing
> something "wrong", and I find no reason not to expect the query
> planner to choose an index scan.

1) I'm missing a very important part - information about the settings
   in postgresql.conf, especially effective cache size, random page
   cost, etc. What hw are you using (RAM size, disk speed etc.)?

2) Another thing I'm missing is enough information about the table
   and the query itself. What is the execution plan used? Was the table
   modified / vacuumed / analyzed recently?

Without these information it's completely possible the postgresql is
using invalid values and thus generating suboptimal execution plan.
There are many cases when the sequential scan is better (faster, does
less I/O etc.) than the index scan.

For example if the table has grown and was not analyzed recently, the
postgresql may still believe it's small and thus uses the sequential
scan. Or maybe the effective case size is set improperly (too low in
this case) thus the postgresql thinks just a small fraction of data is
cached, which means a lot of scattered reads in case of the index -
that's slower than sequential reads.

There are many such cases - the common belief that index scan is always
better than the sequential scan is incorrect. But most of these cases
can be identified using explain analyze output (which is missing in your
post).

The data supplied by you are not a 'proof' the index scan is better than
sequential scan in this case, as the data might be cached due to
previous queries.

The port to 8.x might help, as some of the settings in postgresql.conf
use different default values and the stats used by the planner might be
'freshier' than those in the current database.

My recommendation:

1) send us the execution plan, that is use the EXPLAIN ANALYZE and send
   us the output

2) try to use ANALYZE on the table and run the queries again

3) review the settings in postgresql - a nice starting point is here

   http://www.powerpostgresql.com/PerfList

   (Yes, it's for Pg 8.0 but the basics are the same).

Tomas