Search Postgresql Archives

count distinct slow?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello.

As a note, I ran into the following today (doing a select distinct is fast, doing a count distinct is significantly slower?)

assume a table "issue" with a COLUMN nodename character varying(64);, 7.5M rows...

select distinct  substring(nodename from 1 for 9) from issue;

-- 5.8s

select count(distinct substring(nodename from 1 for 9)) from issue;

-- 190s


SELECT COUNT(*) FROM (SELECT DISTINCT substring(nodename from 1 for 9) FROM issue) as temp; 

-- 5.5s

I have an index on nodename's substring 1 for 9.

It struck me as odd that a count distinct would be far slower than selecting distinct rows themselves.  Apparently there are other workarounds people have come up with as well [1].  Just mentioning in case it's helpful.
Cheers!
-roger-

[1] http://stackoverflow.com/questions/11250253/postgresql-countdistinct-very-slow/14732410#14732410

explains:

explain analyze select count(distinct substring(nodename from 1 for 9)) from issue;

Aggregate  (cost=222791.77..222791.78 rows=1 width=16) (actual time=190641.069..190641.071 rows=1 loops=1)
  ->  Seq Scan on issue  (cost=0.00..185321.51 rows=7494051 width=16) (actual time=0.016..3487.694 rows=7495551 loops=1)
Total runtime: 190641.182 ms



explain analyze select distinct  substring(nodename from 1 for 9) from issue;

HashAggregate  (cost=222791.77..222846.45 rows=4375 width=16) (actual time=6276.578..6278.004 rows=6192 loops=1)
  ->  Seq Scan on issue  (cost=0.00..204056.64 rows=7494051 width=16) (actual time=0.058..4293.976 rows=7495551 loops=1)
Total runtime: 6278.564 ms


explain analyze SELECT COUNT(*) FROM (SELECT DISTINCT substring(nodename from 1 for 9) FROM issue) as temp; 

Aggregate  (cost=222901.14..222901.15 rows=1 width=0) (actual time=5195.025..5195.025 rows=1 loops=1)
  ->  HashAggregate  (cost=222791.77..222846.45 rows=4375 width=16) (actual time=5193.121..5194.454 rows=6192 loops=1)
        ->  Seq Scan on issue  (cost=0.00..204056.64 rows=7494051 width=16) (actual time=0.035..3402.834 rows=7495551 loops=1)
Total runtime: 5195.160 ms

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux