Re: Query became very slow after 9.6 -> 10 upgrade

Dmitry Shalashov <skaurus@xxxxxxxxx> · Thu, 23 Nov 2017 00:34:47 +0300

Turns out we had not 9.6 but 9.5.
And query plan from 9.5 is:

 Sort  (cost=319008.18..319008.19 rows=1 width=556) (actual time=0.028..0.028 rows=0 loops=1)
   Sort Key: (sum(st.shows)) DESC
   Sort Method: quicksort  Memory: 25kB
   CTE a
     ->  Index Scan using adroom_active_idx on adroom  (cost=0.13..5.21 rows=1 width=584) (actual time=0.004..0.004 rows=0 loops=1)
           Index Cond: ((now() >= start_ts) AND (now() <= stop_ts))
           Filter: (((groups -> 0) ->> 'provider'::text) ~ '^target_mail_ru'::text)
   CTE b
     ->  HashAggregate  (cost=1.27..1.77 rows=100 width=68) (actual time=0.005..0.005 rows=0 loops=1)
           Group Key: a.provider, a.id, unnest(a.domain_ids)
           ->  CTE Scan on a  (cost=0.00..0.52 rows=100 width=68) (actual time=0.004..0.004 rows=0 loops=1)
   ->  HashAggregate  (cost=319001.17..319001.18 rows=1 width=556) (actual time=0.013..0.013 rows=0 loops=1)
         Group Key: b.provider, d.domain
         ->  Hash Join  (cost=16.55..319001.16 rows=1 width=556) (actual time=0.013..0.013 rows=0 loops=1)
               Hash Cond: ((st.adroom_id = b.id) AND (st.domain_id = b.domain_id))
               ->  Hash Join  (cost=13.05..318633.29 rows=48581 width=536) (never executed)
                     Hash Cond: (st.domain_id = d.id)
                     ->  Index Scan using adroom_stat_day_adroom_id_domain_id_url_id_is_wlabp_idx on adroom_stat st  (cost=0.58..313307.30 rows=1287388 width=16) (never executed)
                           Index Cond: ((day >= date_trunc('day'::text, (now() - '7 days'::interval))) AND (day <= date_trunc('day'::text, now())))
                     ->  Hash  (cost=11.10..11.10 rows=110 width=520) (never executed)
                           ->  Seq Scan on domains d  (cost=0.00..11.10 rows=110 width=520) (never executed)
               ->  Hash  (cost=2.00..2.00 rows=100 width=40) (actual time=0.007..0.007 rows=0 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 8kB
                     ->  CTE Scan on b  (cost=0.00..2.00 rows=100 width=40) (actual time=0.007..0.007 rows=0 loops=1)
 Planning time: 6.641 ms
 Execution time: 0.203 ms

Also I prepared test case for Tom and sent it to him.

Dmitry Shalashov, relap.io & surfingbird.ru

2017-11-22 18:19 GMT+03:00 Tom Lane <tgl@xxxxxxxxxxxxx>:
Dmitry Shalashov <skaurus@xxxxxxxxx> writes:

> BUT if I'll add to 3rd query one additional condition, which is basically

> 2nd query, it will ran same 12 minutes:

> SELECT adroom_id, domain_id, shows, clicks FROM adroom_stat WHERE day

> between date_trunc('day', current_timestamp - interval '1 week') and

> date_trunc('day', current_timestamp) AND domain_id IN (select distinct

> unnest(domain_ids) FROM (select title, id, groups->0->>'provider' provider,

> domain_ids from adroom where groups->0->>'provider' ~ '^target_mail_ru' and

> not is_paused and current_timestamp between start_ts and stop_ts) t1)

> Plan of last query:

>  Nested Loop  (cost=88.63..25617.31 rows=491 width=16) (actual

> time=3.512..733248.271 rows=1442797 loops=1)

>    ->  HashAggregate  (cost=88.06..88.07 rows=1 width=4) (actual

> time=3.380..13.561 rows=3043 loops=1)

>          Group Key: (unnest(adroom.domain_ids))

>          ->  HashAggregate  (cost=88.03..88.04 rows=1 width=4) (actual

> time=2.199..2.607 rows=3043 loops=1)

>                Group Key: unnest(adroom.domain_ids)

>                ->  ProjectSet  (cost=0.28..87.78 rows=100 width=4) (actual

> time=0.701..1.339 rows=3173 loops=1)

Hm, seems like the problem is that that lower HashAggregate is estimated

as having only one row out, which is way off and doesn't sound like a

particularly bright default estimate anyway.  (And then we're doing an

additional HashAggregate on top of that, which is useless --- implies

that something isn't realizing that the output of the SELECT DISTINCT

is already distinct.)

I'm suspicious that this is breakage from the work that was done on

targetlist SRFs in v10, but that's just a guess at this point.

Trying simple test queries involving WHERE x IN (SELECT DISTINCT

unnest(foo) FROM ...), I do not see a behavior like this, so there is some

not-very-obvious contributing factor in your situation.  Can you put

together a self-contained test case that produces a bogus one-row

estimate?  Extra points if it produces duplicate HashAgg steps.

                        regards, tom lane