Re: Partitionwise JOIN scanning all partitions, even unneeded ones

David Rowley <dgrowleyml@xxxxxxxxx> · Thu, 31 Aug 2023 08:56:41 +1200

On Thu, 31 Aug 2023 at 07:55, Dimitrios Apostolou <jimis@xxxxxxx> wrote:
> I'd appreciate help on whether it's a real issue, and if it's unknown I
> can forward this to the psql-bugs mailing list. I'd also appreciate any
> critique on the clarity of my description and on my schema and queries,
> since I'm new to postgres.

Thank you for posting this here first. We often get reports on bugs
that are not bugs, so this saves from traffic there.

> ===== Slow query:
>
> EXPLAIN (ANALYZE, VERBOSE,BUFFERS,SETTINGS)   SELECT DISTINCT workitem_n
>      FROM task_ids
>      JOIN tasks_mm_workitems USING(task_n)
>      JOIN test_runs_raw USING(workitem_n)
>      WHERE task_id = '1698813977';
>
> The EXPLAIN output here shows a parallel hash join doing seq scans on each
> and every partition. Basically the whole 10G rows table is being
> seq-scanned.

I'm sorry to say that this is not a bug, it's simply an optimisation
that we've not yet implemented.  The run-time partition pruning that
runs and causes the "(never executed)" Append subnodes in the fast
plan appears because run-time pruning during execution only works when
the Append (or MergeAppend) is parameterised by some column from above
or from the outer side of the join.  You can see that in this fragment
of your EXPLAIN output:

->  Index Only Scan using test_runs_raw__part_max6180k_pkey on
public.test_runs_raw__part_max6180k test_runs_raw_309
    Output: test_runs_raw_309.workitem_n
    Index Cond: (test_runs_raw_309.workitem_n = tasks_mm_workitems.workitem_n)

Note that tasks_mm_workitems is from the outer side of the join.

The same isn't done for Hash Joins as there is no parameterisation
with that join type.  It is technically possible to do, but it means
running the partition pruning algorithm once for each row that goes
into the hash table and taking the union of all the matching
partitions.  That's quite a bit of work, especially if you don't
manage to prune anything in the end.

Having said that, there is some work going on by Richard Guo [1] where
he aims to implement this. It is quite a tricky thing to do without
causing needless pruning work in cases where no partitions can be
pruned. If you have an interest, you can follow the thread there to
see the discussion about the difficulties with implementing this in a
way that does not cause performance regressions for queries where no
pruning was possible.

David

[1] https://commitfest.postgresql.org/44/4512/