On Fri, Jun 7, 2019 at 12:43 PM Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> wrote:
> Well, "quickly" might mean within a week.  If it takes that long to
> fully remove a monthly partition to make that partition ID available to
> some future month's partition, that seems acceptable.  Blocking
> DROP/DETACH for one hour is certainly not acceptable.

I agree that synchronous clean-up of global indexes wouldn't make sense
there, and might not be very compelling in practice.

It occurs to me that we could add a code path to nbtree page splits that
considers removing dropped-partition tuples in order to avert a page
split. This would be a bit like the LP_DEAD/kill_prior_tuple
optimization. Technically, the space used by index tuples that point to
a dropped partition wouldn't become reclaimable immediately, but with
this optimization in place that might not matter.

> If this scheme means that you can keep the partition identifiers stored
> in the index to, for instance, 10 bits (allowing for 1024 partitions to
> exist at any one time, including those in the process of being cleaned
> up) instead of having to expand to (say) 24 because that covers a couple
> of years of operation before having to recreate the index, it seems
> worthwhile.

I think that we should have no inherent limit on the number of
partitions available at once, on general principle. Limiting the number
of partitions is a design that probably has a lot of sharp edges.

The nbtree heap TID column and the partition number column should
probably be a single varwidth column (not two separate columns) that is
often no wider than 6 bytes, but can be wider when there are many
partitions and/or very large partitions. That will be challenging, but
it seems like the right place to solve the problem. I think that I could
make that happen. Maybe this same representation could be used for all
nbtree indexes, not just global nbtree indexes.

--
Peter Geoghegan