The GROUP BY was the fastest method. Thanks for the suggestions,

Keaton

On 2/15/08 3:12 PM, "Gregory Stark" <stark@xxxxxxxxxxxxxxxx> wrote:

> "Keaton Adams" <kadams@xxxxxxxxxxx> writes:
>
>> Version: Postgres 8.1.4
>> Platform: RHEL
>>
>> Given this scenario with the indexes in place, when I ask for the distinct
>> field1_id values, why does the optimizer choose a sequential scan instead of
>> just reading from the kda_log_fid_cre_20080123_idx index? The time it takes
>> to perform the sequential scan against 20+ million records is way too slow.
>
> Try (temporarily) doing:
>
>     SET enable_seqscan = off;
>
>> keaton=# explain select distinct field1_id into temp kda_temp from
>> kda_log_20080213;
>
> If the planner is right, that will be even slower. A full index scan
> requires a lot of random-access seeks; generally, the larger the table, the
> *more* likely it is that a sequential scan and sort beats using an index.
>
> If the planner is wrong and the index scan is faster, consider whether it's
> only faster because you've already read the table into cache. Will it be in
> cache in production? If so, then you might try raising effective_cache_size
> or lowering random_page_cost.
>
> Another thing to try is using GROUP BY instead of DISTINCT. This is one case
> where the Postgres optimizer doesn't handle the two equivalent queries in
> exactly the same way, and there are some plans available with one method
> that aren't available with the other. That's only likely to help if you have
> relatively few values of field1_id, but it's worth trying.
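
For reference, the rewrite the thread converged on can be sketched as follows. Table and column names are taken from the quoted query; the second temp-table name (kda_temp2) is made up here to avoid a collision, and actual plans will depend on your data and settings:

    -- DISTINCT form from the original post (planned as a sort + unique
    -- on 8.1, which forces sorting 20+ million rows):
    SELECT DISTINCT field1_id
      INTO TEMP kda_temp
      FROM kda_log_20080213;

    -- Equivalent GROUP BY form; the 8.1 planner can choose a
    -- HashAggregate for this, avoiding the sort entirely when
    -- field1_id has relatively few distinct values:
    SELECT field1_id
      INTO TEMP kda_temp2
      FROM kda_log_20080213
     GROUP BY field1_id;

Comparing `EXPLAIN ANALYZE` output for both forms should show whether the planner picked a HashAggregate for the GROUP BY version.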