Re: Merge David and Goliath tables efficiently

Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxxx> · Mon, 19 Jun 2023 13:53:50 +0200

On 6/19/23 09:46, Alvaro Herrera wrote:
> I came here to talk about partitionwise join, but then noticed you have
> already thought of that:
> 
> On 2023-Jun-18, nicolas paris wrote:
> 
>> Note that both plan acome from the same partitioned by hash table with
>> 100 parts, with a unique index on the list_id + hash_key. For strategy
>> 2.1, I turned on enable_partitionwise_join, since david table has the
>> same partitioning scheme as goliath including unique indexe. In both
>> case the query is:
> 
> Hmm, I suppose the reason partitionwise join isn't having any effect is
> that the presence of WHEN NOT MATCHED clauses force an outer join, which
> probably disarms partitionwise joining, since each join pair would
> require to match for nulls, so there would be two matching partitions at
> the other end.  A quick test for this hypothesis might be to try the
> MERGE without the WHEN NOT MATCHED clauses and see if partitionwise join
> works better.
> 
> Maybe Tom L's new outer-join infrastructure in 16 allows to improve on
> this, not sure.
> 

Not sure why would that disarm partitionwise join - attached is a simple
reproducer, generating two tables, loading 10000000 and 10000 rows into
them, and then doing explain on a simple merge.

IMHO the thing that breaks it is the ORDER BY in the merge, which likely
acts as an optimization fence and prevents all sorts of smart things
including the partitionwise join. I'd bet that if Nicolas replaces

  MERGE INTO "goliath" ca
  USING (SELECT * FROM "david" ORDER BY "list_id") AS t
  ..

with

  MERGE INTO "goliath" ca
  USING "david" AS t
  ...

it'll start doing the working much better.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment:
repro.sql

Description: application/sql