On Tue, Jun 04, 2019 at 05:34:07PM +0300, Vasilis Ventirozos wrote:
Hello everyone,
I started comparing performance between postgres 9.4 and 12beta1, more specifically comparing the new (materialized) CTE.
Are you saying the CTE is specified as MATERIALIZED in the query on 12?
Because I don't see it in the explain plan (it's mentioned in the 9.4
plan, though).
The statements I use are application statements that I have little control over.
Hardware is identical as both clusters are running on the same server, on the same disks, with the same data.
Also, cluster settings are almost identical and both clusters have been analyzed.
In all my tests 12 is faster, sometimes much faster, apart from one single query that takes ~12 seconds on 9.4 and nearly 300 seconds on 12.
Plans for both:
Plan for 12 <https://explain.depesz.com/s/wRXO>
Plan for 9.4 <https://explain.depesz.com/s/njtH>
The plans are obfuscated, apologies for that, but what stands out is the following:
Meh.
Hash Left Join  (cost=200,673.150..203,301.940 rows=153,121 width=64) (actual time=1,485.883..284,536.440 rows=467,123 loops=1)
  Hash Cond: (lima_sierra(six_lima_november2.victor_romeo, 1) = foxtrot_hotel.victor_romeo)
  Join Filter: (whiskey_uniform1.six_sierra = foxtrot_hotel.uniform_juliet)
  Rows Removed by Join Filter: 4549925366
You have two equality conditions for the join. The first one is used to
match rows by the hash join itself - it's used to compute the hash
value and to do the lookups. But there may be multiple rows that match
on either side, generating additional "virtual" rows. Those are then
removed by the second condition.
Consider for example a simple self-join on this table:
a | b
-------------
1 | a
1 | b
2 | a
2 | b
and the query is
SELECT * FROM t t1 JOIN t t2 ON (t1.a = t2.a AND t1.b = t2.b)
Now, in the first phase, the hash join might only do (t1.a = t2.a),
which will generate 8 rows:
a | t1.b | t2.b
----------------
1 | a | a
1 | a | b
1 | b | a
1 | b | b
2 | a | a
2 | a | b
2 | b | a
2 | b | b
And then it will apply the second condition (t1.b = t2.b) as a "filter",
which will remove some of the rows. In your case the first step
generates the 4.5B rows that the second step then discards.
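The two-phase behaviour described above can be sketched outside the database. The following Python snippet is only an illustration of the mechanism (not PostgreSQL's actual C implementation): it builds a hash table on column a alone (the Hash Cond), probes it to produce candidate rows, and applies the b = b equality as a post-join filter, reproducing the 8 candidate rows and 4 filtered-out rows from the tables above.

```python
from collections import defaultdict

# The example table: (a, b) pairs, joined to itself.
t = [(1, "a"), (1, "b"), (2, "a"), (2, "b")]

def hash_join(outer, inner):
    """Hash join on column a only; the b = b condition is applied as a filter."""
    # Build phase: hash the inner side on column a alone (the Hash Cond).
    buckets = defaultdict(list)
    for a, b in inner:
        buckets[a].append((a, b))

    matched = []
    removed = 0
    # Probe phase: every bucket entry with the same a is a candidate row.
    for a, b in outer:
        for inner_a, inner_b in buckets[a]:
            # Join Filter: the second equality discards candidate rows.
            if b == inner_b:
                matched.append((a, b, inner_b))
            else:
                removed += 1
    return matched, removed

matched, removed = hash_join(t, t)
print(len(matched) + removed)  # candidate rows from the hash lookup: 8
print(removed)                 # "Rows Removed by Join Filter": 4
print(len(matched))            # rows surviving both conditions: 4
```

In the problematic 12 plan the same pattern plays out at scale: the hash lookup on the first condition alone produces billions of candidate rows, which the join filter then throws away one by one.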
I'm not sure why we don't use both conditions as the hash condition.
Perhaps it's a data type that does not support hashing, and on 9.4 that
does not matter because we end up using a merge join.
I really can't understand where these 4.5 billion rows have been removed from; there is nothing suggesting that this dataset was ever created (e.g. in temp files),
and these numbers definitely don't match what I was expecting, which is more or less what I'm seeing in the 9.4 plan.
Obviously I've tested this more than once and this behaviour persists.
Best Regards,
Vasilis Ventirozos
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services