Slow Count-Distinct Query

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Friday, April 4, 2014, Varadharajan Mukundan <srinathsmn@xxxxxxxxx> wrote:
Hi Jeff,

It looks like the original emailer wrote a query that the planner is not smart enough to plan properly (A known limitation of that kind of query).  He then made a bunch of changes, none of which worked.  He then re-wrote the query into a form for which the planner does a better job on.  What we do not know is, what would happen if he undoes all of those other changes and *just* uses the new form of the query?


I was also pairing up with Chris (original emailer) on this issue and in order to reproduce it, i've a created a two column table with following schema with 1.9 Million dummy rows:

=================================

a=# \d participants
                                 Table "public.participants"
 Column |          Type          |                         Modifiers
--------+------------------------+-----------------------------------------------------------
 id     | integer                | not null default nextval('participants_id_seq'::regclass)
 email  | character varying(255) |

I've tried out various scenarios on this table and recorded it as a transcript below: (Please read it as we read a shell script from top to bottom continuously to get the whole idea):

; Create table and Insert 1.9 Million rows

a=# create table participants(id serial, email varchar(255));
NOTICE:  CREATE TABLE will create implicit sequence "participants_id_seq" for serial column "participants.id"
CREATE TABLE
a=# \d participants
                                 Table "public.participants"
 Column |          Type          |                         Modifiers
--------+------------------------+-----------------------------------------------------------
 id     | integer                | not null default nextval('participants_id_seq'::regclass)
 email  | character varying(255) |

a=# copy participants from '/tmp/a.csv' with csv;
COPY 1999935

Thanks for the detailed response.  I don't have access to your /tmp/a.csv of course, I so I just used this:

insert into participants (email) select md5(floor(random()*1000000)::text) from generate_series(1,2000000);

This gives each email showing up about twice.

 

a=# EXPLAIN (ANALYZE) select count(1) from (select email from participants where email=email group by email) x;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=37874.94..37874.96 rows=1 width=0) (actual time=1549.258..1549.258 rows=1 loops=1)
   ->  HashAggregate  (cost=37763.19..37812.86 rows=4967 width=16) (actual time=1168.114..1461.672 rows=1000000 loops=1)
         ->  Seq Scan on participants  (cost=0.00..37738.19 rows=10000 width=16) (actual time=0.045..411.267 rows=1999935 loops=1)
               Filter: ((email)::text = (email)::text)
 Total runtime: 1567.586 ms
(5 rows)

What you have done here is trick the planner into thinking your query will be 200 times smaller than it actually is, and thus the hash table will be 200 times smaller than it actually is and therefore will fit in allowed memory.  This is effective at getting the more efficient hash agg.  But it no more safe than just giving it explicit permission to use that much memory for this query by increasing work_mem by 200 fold.

I am kind of surprised that the planner is so easily fooled by that.
 


; Creation of idx on email field

a=# create index email_idx on participants(email);
CREATE INDEX
 

a=#  EXPLAIN (ANALYZE)  select count(1) from (select email from participants group by email) x;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48622.59..48622.60 rows=1 width=0) (actual time=1242.718..1242.718 rows=1 loops=1)
   ->  HashAggregate  (cost=26273.09..36206.20 rows=993311 width=16) (actual time=855.215..1150.781 rows=1000000 loops=1)
         ->  Seq Scan on participants  (cost=0.00..21273.25 rows=1999935 width=16) (actual time=0.058..217.105 rows=1999935 loops=1)
 Total runtime: 1264.234 ms
(4 rows)


I can't reproduce this at all, except by increasing work_mem.   The hash table needed for this is no smaller than the hash table needed before the index was built.  Did you increase work_mem before the above plan?

Instead what I get is the index only scan (to provide order) feeding into a Group.
 

a=# drop index email_idx;
DROP INDEX

; Creation of partial index on email 

a=# create index email_idx on participants(email) where email=email;
CREATE INDEX

a=#  EXPLAIN (ANALYZE)  select count(distinct email) from participants where email=email;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12949.26..12949.27 rows=1 width=16) (actual time=3465.472..3465.473 rows=1 loops=1)
   ->  Bitmap Heap Scan on participants  (cost=249.14..12924.26 rows=10000 width=16) (actual time=161.776..421.223 rows=1999935 loops=1)
         Recheck Cond: ((email)::text = (email)::text)
         ->  Bitmap Index Scan on email_idx  (cost=0.00..246.64 rows=10000 width=0) (actual time=159.446..159.446 rows=1999935 loops=1)
 Total runtime: 3466.867 ms

I also cannot get this.

 
(5 rows)

a=# set enable_bitmapscan = false;
a=# set seq_page_cost = 0.1;
SET
a=# set random_page_cost = 0.2;
SET


These don't seem to accomplish anything for you.  They switch the slow form of the query between two plans with about the same speed.
  

a=# explain analyze select count(distinct email) from participants where email=email;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1517.16..1517.17 rows=1 width=16) (actual time=3504.310..3504.310 rows=1 loops=1)
   ->  Index Only Scan using email_idx on participants  (cost=0.00..1492.16 rows=10000 width=16) (actual time=0.101..795.595 rows=1999935 loops=1)
         Heap Fetches: 1999935
 Total runtime: 3504.358 ms
(4 rows)

a=# explain analyze select count(1) from (select email from participants where email=email group by email) x;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1579.25..1579.26 rows=1 width=0) (actual time=1193.912..1193.912 rows=1 loops=1)
   ->  Group  (cost=0.00..1517.16 rows=4967 width=16) (actual time=0.096..1101.828 rows=1000000 loops=1)
         ->  Index Only Scan using email_idx on participants  (cost=0.00..1492.16 rows=10000 width=16) (actual time=0.091..719.281 rows=1999935 loops=1)
               Heap Fetches: 1999935
 Total runtime: 1193.963 ms
(5 rows)

; Oh yes, cluster the rows by email

a=# create index email_idx_2 on participants(email)
a-# ;
CREATE INDEX

a=# cluster participants using email_idx_2;
CLUSTER


a=# EXPLAIN (ANALYZE) select count(1) from (select email from participants where email=email group by email) x;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3376.63..3376.64 rows=1 width=0) (actual time=942.118..942.119 rows=1 loops=1)
   ->  Group  (cost=0.00..3314.54 rows=4967 width=16) (actual time=0.080..851.315 rows=1000000 loops=1)
         ->  Index Only Scan using email_idx on participants  (cost=0.00..3289.54 rows=10000 width=16) (actual time=0.076..472.101 rows=1999935 loops=1)
               Heap Fetches: 1999935
 Total runtime: 942.159 ms
(5 rows)

Once I cluster and vacuum and analyze the table, I get this plan without having the partial index (just the regular index), without the "where email=email" and without disabling the bitmap scan or changing the page_cost parameters.

I usually get this plan without the cluster, to.  I think it depends on the luck of the sampling in the autoanalyze.

Cheers,

Jeff

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux