Search Postgresql Archives

Re: SELECT DISTINCT very slow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for all the feedback.

Using GROUP BY is indeed much faster (about 1 second).

Unfortunately I can't use GROUP BY, because what I'm really doing is
SELECT DISTINCT ON(unique_field) id FROM table;

I'm not familiar with the Postgres internals, but in my own DB system
that I have written, I do the skip-scanning thing, and for my system
it was a really trivial optimization to code. I know, I'm always free
to submit a patch, and hopefully someday I will, if it hasn't already
been done by then.

I can't comment on whether this skip-scan optimization is general
enough to warrant the lines of code, but I might as well explain my
use case:
Inside a GIS application, the user wants to categorize the display of
some information based on, in this case, the suburb name.
He clicks a button that says "Add All Unique Categories". This is a
very common operation in this domain.

Again, thanks for all the feedback. I'll upgrade to 8.4 soon.
Ben Harper

On Fri, Jul 10, 2009 at 2:50 AM, Tom Lane<tgl@xxxxxxxxxxxxx> wrote:
> Greg Stark <gsstark@xxxxxxx> writes:
>> Not really. The OP doesn't say how wide the record rows are but unless
>> they're very wide it wouldn't pay to use an index for this even if you
>> didn't have to access the heap also. It's going to be faster to scan
>> the whole heap and either sort or use a hash. Currently there aren't
>> many cases where a btree with 6,000 copies of 111 distinct keys is
>> going to be useful.
>
> It was 600,000 not 6,000 ... so a skip-scan might be worth the trouble,
> but as you say we haven't done it.
>
> In any case I think the real issue is that the OP is probably using a
> pre-8.4 release which will always do SELECT DISTINCT via sort-and-unique.
> Hash aggregation would be a whole lot faster for these numbers, even
> if not exactly instantaneous.  He could update to 8.4, or go over to
> using GROUP BY as was recommended upthread.
>
>                        regards, tom lane
>

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux