On Fri, Jun 7, 2019 at 12:43 PM Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> wrote:
> Well, "quickly" might mean within a week.  If it takes that long to
> fully remove a monthly partition to make that partition ID available to
> some future month's partition, that seems acceptable.  Blocking
> DROP/DETACH for one hour is certainly not acceptable.

I agree that synchronous clean-up of global indexes wouldn't make sense
there, and might not be very compelling in practice.

It occurs to me that we could add a code path to nbtree page splits that
considers removing dropped-partition tuples in order to avert a page
split. This would be a bit like the LP_DEAD/kill_prior_tuple
optimization. Technically, the space used by index tuples that point to
a dropped partition wouldn't become reclaimable immediately, but with
this optimization in place that might not matter.

> If this scheme means that you can keep the partition identifiers stored
> in the index to, for instance, 10 bits (allowing for 1024 partitions to
> exist at any one time, including those in the process of being cleaned
> up) instead of having to expand to (say) 24 because that covers a couple
> of years of operation before having to recreate the index, it seems
> worthwhile.

I think that we should have no inherent limit on the number of
partitions available at once, on general principle. Limiting the number
of partitions is a design that probably has a lot of sharp edges.

The nbtree heap TID column and the partition number column should
probably be a single varwidth column (not two separate columns) that is
often no wider than 6 bytes, but can be wider when there are many
partitions and/or very large partitions. That will be challenging, but
it seems like the right place to solve the problem. I think that I could
make that happen. Maybe this same representation could be used for all
nbtree indexes, not just global nbtree indexes.

--
Peter Geoghegan