On Tue, Dec 10, 2013 at 9:28 AM, jacket41142 <jacket41142@xxxxxxxxx> wrote:
But when I do that:So the performance difference is not very large.test=> select distinct col_int from t1 group by col_int;
Time: 1177.936 ms
test=> select count(distinct col_int) from t1;
count
-------
1025
(1 row)
Time: 7367.476 ms
count(distinct ...) always sorts, rather than using a hash, to do its work. I don't think that there is any fundamental reason that it could not be changed to allow it to use hashing, it just hasn't been done yet. It is complicated by the fact that you can have multiple count() expressions in the same query which demand sorting/grouping on different columns.
Cheers,
Jeff