Re: select distinct and index usage

Alban Hertroys <dalroi@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> · Mon, 7 Apr 2008 08:05:05 +0200

On Apr 7, 2008, at 1:32 AM, David Wilson wrote:
I have a reasonably large table (~75m rows,~18gb) called "vals". It
includes an integer datestamp column with approximately 4000 unique
entries across the rows; there is a normal btree index on the
datestamp column. When I attempt something like "select distinct
datestamp from vals", however, explain tells me it's doing a
sequential scan:

explain select distinct datestamp from vals;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Unique  (cost=15003047.47..15380004.83 rows=4263 width=4)
   ->  Sort  (cost=15003047.47..15191526.15 rows=75391472 width=4)
         Sort Key: datestamp
         ->  Seq Scan on vals v  (cost=0.00..1531261.72 rows=75391472 width=4)

The database's estimates seem consistent with yours, so why is it
doing this? Could you provide an EXPLAIN ANALYSE? It shows the actual
numbers next to the estimates, although I figure that query might
take a while...

Pg estimates the costs quite high too. It's almost as if there isn't
an index on that column and it has no other way than to do a
sequential scan... Could you show us the table definition and its
indexes? What version of Pg is this?
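
For reference, a rough sketch of how that could be gathered from psql,
assuming the table is still called vals as in your post (ANALYZE and
ANALYSE are interchangeable spellings):

```sql
-- Actual row counts and timings next to the planner's estimates.
-- Note: this really executes the query, so expect it to take a while.
EXPLAIN ANALYZE SELECT DISTINCT datestamp FROM vals;

-- Server version string.
SELECT version();

-- Index definitions on the table. In psql, "\d vals" shows these
-- together with the column definitions.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'vals';
```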

It may be that your index on vals.datestamp doesn't fit into memory;  
what are the relevant configuration parameters for your database?
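
Something like the following would show the settings I'd look at first,
plus the on-disk size of each index on the table for comparison. Which
parameters count as "relevant" here is my assumption, not a definitive
list:

```sql
-- Memory and cost settings that commonly influence plan choice
-- (an assumed shortlist; others may matter too).
SHOW shared_buffers;
SHOW work_mem;
SHOW effective_cache_size;
SHOW random_page_cost;

-- Size of every index defined on vals, to compare against the
-- memory settings above.
SELECT c.relname AS index_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE i.indrelid = 'vals'::regclass;
```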

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
