Re: Partitioning, Identity and Uniqueness (given pg 16 changes)

Ketan Popat <ketanpostgres@xxxxxxxxx> · Mon, 19 Feb 2024 00:08:10 -0800

1)
 Is my attempt to improve performance of insert AND (as the number of 
sources goes up, as it has) querying which is invariably by id but it is
 easy to include "and src = "" in the majority of usage (the other 
common case is to query across all sources, asking for a set is unusual)
 a reasonable approach?

2)
 It would be nice to be able to specify the id as pk on the table being 
partitioned (as it was in the non-partitioned definition of the table) 
once to document and enforce that the partitions simply inherit the id 
pk. This would seem only to need the "partition by" validation to allow a
 column not mentioned in partition by clause to be defined as pk or 
unique if and only if the pk/unique column is an identity column. Not a 
big deal but is this practical/valid?

One of the potential solutions could be to include src into your partitioning key as follows. 

 create table tpart (id int generated always as identity not null, src int, data varchar, primary key (id, src)) partition by hash(src) ;
 create table tpart_p001 partition of ttest_part for values with (modulus 4, remainder 0);
...
 create table tpart_p032 partition of ttest_part for values with (modulus 32, remainder 31);

App side changes required:
Ensure all queries come with src as a filter, otherwise it could backfire performance due to partition pruning overhead.
Here is how it could help:
"generated always as identity" for id would ensure unique id and src can be used for hash partitioning.
insert performance may improve if you are seeing resource contention related wait events during insert.
it would greatly help with the access pattern querying time series for single src

Thanks,
Ketan 

On Sun, Feb 18, 2024 at 8:23 PM David Rowley <dgrowleyml@xxxxxxxxx> wrote:
On Mon, 19 Feb 2024 at 16:32, Darryl Green <darryl.green@xxxxxxxxx> wrote:

> 2) It would be nice to be able to specify the id as pk on the table being partitioned (as it was in the non-partitioned definition of the table) once to document and enforce that the partitions simply inherit the id pk. This would seem only to need the "partition by" validation to allow a column not mentioned in partition by clause to be defined as pk or unique if and only if the pk/unique column is an identity column. Not a big deal but is this practical/valid?

Unfortunately, it's not as easy as you think. It's not a matter of

dropping the check that requires all PRIMARY KEY columns are present

in the PARTITION BY clause.  For this to work a *single* index (i.e.

non-partitioned index) would have to index all partitions.  Otherwise,

how would Postgres ensure that the value being inserted doesn't exist

in some other partition?

The problem with a single index is that it kinda defeats the purpose

of partitioning, i.e., "my table is large and I want to split it up".

Operations such as DETACH PARTITION would have to become more than

just a metadata operation when you consider having to trawl through

the index and remove all records belonging to a single partition.

It may be possible to still have it work by doing a speculative record

in the index for the target table then go and check all of the other

indexes before marking the speculative entry as valid. I think it

would be very tricky to make it work well, however.  Imagine how

expensive ATTACH PARTITION would be!  There are probably other race

conditions I've not thought about too. Likely, we'd get more

complaints about this being a terrible feature than we do due to the

fact that it's unsupported.

David