Re: Feature proposal: immutable/sealed partitions (and maybe tables, too)

Levi Aul <levi@xxxxxxxxxxxxxx> · Tue, 6 Sep 2022 18:33:18 -0700

We're using Postgres 14.5. I meant partition pruning.
To be clear, this isn't a bug report. There is no bug—everything is working exactly as it should. The partitions are not being pruned because the workload consists of OLAP aggregations that fetch a small number of rows spread across all partitions in the set, relying for speed on an index that isn't prefixed with the partitioning key (nor can it be.)

I'll try to avoid the particulars of the business domain (it gets clumsy because there's a lot of jargon collisions, with tables named things like "transactions"), but you can think of it abstractly as follows: we have a table holding a CQRS event-stream; we are trying to discover all events related to a particular person, where events "related to a person" are related either directly (by an indexed field of the event) or indirectly (by a foreign-key reference to the event from a row in a second partitioned table — let's call it "event attributes" — where the person is an indexed field of the event-attribute.) We construct/linearize/uniquify a time-ordered rowset of all such related events; and then we reduce/aggregate some value fields from those events. This query requires index scans of all N partitions of events + all N partitions of event_attributes.

This workload is performing exactly how you'd expect it to perform (i.e. badly) given Postgres's current operational pragmatics around partition locking. The only way it could possibly perform better, is if Postgres didn't have to acquire N shared-access locks in order to index-scan N partitions. And the only way that could work, is if Postgres could make some assumption about the locking behavior of the partitions. Thus my feature proposal.

To be clear, I'm more interested in discussing the feature proposal than in solving the immediate problem. The immediate problem has an obvious, if painful, solution: merging the historical partitions into a single huge historical partition per table. The feature proposal, meanwhile, has the potential to solve many of our current business-level problems if the "further optimizations" I mentioned can be made.

Also, to be clear, I'm interested in implementing the feature I've proposed myself. I've read the relevant parts of the Postgres codebase and feel confident I can make the required changes. I would just like to have a design discussion around the shape the feature should take, with Postgres stakeholders, before I go to all that effort to build something they might not accept upstream.

On Tue, Sep 6, 2022 at 5:53 PM David Rowley <dgrowleyml@xxxxxxxxx> wrote:
On Wed, 7 Sept 2022 at 07:40, Levi Aul <levi@xxxxxxxxxxxxxx> wrote:

> In other words, our workload is inherently one that acquires "way too many locks." Our largest performance bottleneck, according to pg_wait_sampling, is the LockManager itself. Despite most of our queries spending only milliseconds actually executing, they often spend seconds during planning waiting to acquire hundreds of access-shared locks.

It would be good to have a better understanding of the problem you're

facing.  There have been many changes to table partitioning since

PostgreSQL 10 and many of those changes affect the number of

partitions which are locked for certain classes of queries.

It would be good to know the following:

1. Which version of PostgreSQL are you having this problem with, and;

2. Example of queries you're having this problem with.

If you share that information we may be able to inform you about

features/performance improvements in newer versions which help with

the problem you're facing.

You mention "constraint-exclusion", that's no longer how we perform

partition pruning and hasn't been since (if I remember correctly)

PostgreSQL 11. Perhaps you're using PG10?

David

-- 

Levi Aul, CTO, Covalent