Re: Query became very slow after 9.6 -> 10 upgrade

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



We tried to apply the patch on 10.1 source, but something is wrong it seems:

patch -p1 < ../1.patch
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/optimizer/plan/analyzejoins.c
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/utils/adt/selfuncs.c
Hunk #1 succeeded at 3270 (offset -91 lines).
Hunk #2 succeeded at 3304 (offset -91 lines).
Hunk #3 succeeded at 3313 (offset -91 lines).
Hunk #4 succeeded at 3393 (offset -91 lines).
patch unexpectedly ends in middle of line
Hunk #5 succeeded at 3570 with fuzz 1 (offset -91 lines).


Dmitry Shalashov, relap.io & surfingbird.ru

2017-11-23 2:07 GMT+03:00 Tom Lane <tgl@xxxxxxxxxxxxx>:
Dmitry Shalashov <skaurus@xxxxxxxxx> writes:
> Turns out we had not 9.6 but 9.5.

I'd managed to reproduce the weird planner behavior locally in the
regression database:

regression=# create table foo (f1 int[], f2 int);
CREATE TABLE
regression=# explain select * from tenk1 where unique2 in (select distinct unnest(f1) from foo where f2=1);
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Nested Loop  (cost=30.85..80.50 rows=6 width=244)
   ->  HashAggregate  (cost=30.57..30.63 rows=6 width=4)
         Group Key: (unnest(foo.f1))
         ->  HashAggregate  (cost=30.42..30.49 rows=6 width=4)
               Group Key: unnest(foo.f1)
               ->  ProjectSet  (cost=0.00..28.92 rows=600 width=4)
                     ->  Seq Scan on foo  (cost=0.00..25.88 rows=6 width=32)
                           Filter: (f2 = 1)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.29..8.30 rows=1 width=244)
         Index Cond: (unique2 = (unnest(foo.f1)))
(10 rows)

Digging into it, the reason for the duplicate HashAggregate step was that
query_supports_distinctness() punted on SRFs-in-the-targetlist, basically
on the argument that it wasn't worth extra work to handle that case.
Thinking a bit harder, it seems to me that the correct analysis is:
1. If we are proving distinctness on the grounds of a DISTINCT clause,
then it doesn't matter whether there are any SRFs, because DISTINCT
removes duplicates after tlist SRF expansion.
2. But tlist SRFs break the ability to prove distinctness on the grounds
of GROUP BY, unless all of them are within grouping columns.
It still seems like detecting the second case is harder than it's worth,
but we can trivially handle the first case, with little more than some
code rearrangement.

The other problem is that the output rowcount of the sub-select (ie, of
the HashAggregate) is being estimated as though the SRF weren't there.
This turns out to be because estimate_num_groups() doesn't consider the
possibility of SRFs in the grouping columns.  It never has, but in 9.6 and
before the problem was masked by the fact that grouping_planner scaled up
the result rowcount by tlist_returns_set_rows() *after* performing
grouping.  Now we're effectively doing that in the other order, which is
more correct, but that means estimate_num_groups() has to apply some sort
of adjustment.  I suggest that it just multiply its old estimate by the
maximum of the SRF expansion counts.  That's likely to be an overestimate,
but it's really hard to do better without specific knowledge of the
individual SRF's behavior.

In short, I propose the attached fixes.  I've checked this and it seems
to fix Dmitry's original problem according to the test case he sent
off-list.

                        regards, tom lane



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux