Re: multicolumn partitioning help

David Rowley <dgrowleyml@xxxxxxxxx> · Thu, 16 Mar 2023 10:20:23 +1300

On Thu, 16 Mar 2023 at 00:47, James Robertson <james@xxxxxxxxxxxxxxx> wrote:
> or do we get?
>
> TopLevelTable
> |
> |----> worker-thread 1 (default catch)
> ........|----> worker thread 2 -> sub-table 1.1
> ........|----> worker thread 3 -> sub-table 1.2
> ........|----> worker thread 4 -> sub-table 1.n
> |
> |----> worker-thread 5 (default catch)
> ........|----> worker thread 6 -> sub-table 2.1
> ........|----> worker thread 7 -> sub-table 2.2
> ........|----> worker thread 8 -> sub-table 2.n

The planner generally flattens out the scans to each partition into a
single Append or MergeAppend node.  Append nodes can be parallelised.
Assuming there's no reason that a particular partition can't support
it, the parallel workers can be distributed to assist without
restriction to which partition they help with. Multiple workers can
even help with a single partition. Workers can then move over to
helping with other partitions when they're done with the partition
they've been working on. I believe some other databases do or did at
least restrict parallelism to 1 worker per partition (which perhaps is
why you raised this).  There's no such restriction with PostgreSQL.

> Summary:
> 1) if we create nested partitions, do we create performance issues:

If you create too many partitions, it can create performance issues.
You should look at the partitioning best practices section of the
documents for details about that. I recommend a careful read of those.

> 2) if nested partitions are the solutions, what is the point of multi-column partitioning?

There are many reasons.  If you have multiple levels of partitioning,
then the partition pruning done during planning is going to have more
work to do as it'll be executed once, then once again for each
partitioned table remaining after running it for the first level.
Also, it seems much easier to PARTITION BY HASH(a,b) than to first do
HASH(a) then another level to HASH(b).  However, there may be
advantages to having multiple levels here as the planner would still
be able to prune partitions if the WHERE clause didn't contain any
quals like "b = <value>".  The key take away here is that they're
different, so we support both.

> wish list) wouldn't it be neat if we can do mult-mode multi-column? like PARTITION BY RANGE (EXTRACT(YEAR FROM dob)) LIST (SUBSTRING(hash, 1, 1));

Effectively, multi-level partitioning gives you that, It's just the
DDL is different from how you wrote it.

David