On Tue, Jan 22, 2019 at 02:44:29PM +0100, Mickael van der Beek wrote:
> Hey everyone,
>
> I have a PostgreSQL 10 database that contains two tables which both have
> two levels of partitioning (by list and using a single value). Meaning that
> a partitioned table gets repartitioned again.
>
> The data in my use case is stored in 5K to 10K partitioned tables (children
> and grand-children of the two tables mentioned above) depending on usage
> levels.
>
> Three indexes are set on the grand-child partitions. The partitioning
> columns are not covered by them.
> (I don't believe that it is needed to index partition columns, no?)
>
> With this setup, I experience queries that have very slow planning times
> but fast execution times.
> Even for simple queries where only a couple partitions are searched on and
> the partition values are hard-coded.
>
> Researching the issue, I thought that the linear search in use by
> PostgreSQL 10 to find the partition table metadata was the cause.
>
> cf: https://blog.2ndquadrant.com/partition-elimination-postgresql-11/
>
> So I decided to try out PostgreSQL 11, which included the two aforementioned
> fixes:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=499be013de65242235ebdde06adb08db887f0ea5
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fdb675fc5d2de825414e05939727de8b120ae81

Those reduce the CPU time needed, but that's not the most significant issue.
For postgres up through 11, including for relkind=p, planning requires:

1) stat()ing every 1GB segment file of every partition, even those partitions
which are eventually excluded by constraints or partition bounds; AND,
2) open()ing every index on every partition, even if it's excluded later.

Postgres 12 is expected to resolve this and allow "many" (perhaps 10k)
partitions: https://commitfest.postgresql.org/20/1778/

I think postgres through 11 would consider 1000 partitions to be "too many".

You *might* be able to mitigate the high cost of stat()ing tables by ensuring
that the table metadata stays in the OS cache, by running something like:

  find /var/lib/pgsql /tablespace -ls

You *might* be able to mitigate the high cost of open()ing the indices by
keeping their first page in cache (preferably the postgres buffer cache),
either by running a cronjob to run EXPLAIN, or perhaps something like
pg_prewarm on the indices. (I did something like this for our largest
customers to improve performance as a stopgap.)
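For example, such a cronjob could run something like the following (an
untested sketch: 'my_parent' is a placeholder for your top-level partitioned
table, and it assumes the pg_prewarm extension is available):

  -- Load just the first page (block 0) of every index under a partition
  -- hierarchy into the postgres buffer cache.
  CREATE EXTENSION IF NOT EXISTS pg_prewarm;

  WITH RECURSIVE parts AS (
      SELECT 'my_parent'::regclass AS relid  -- placeholder name
      UNION ALL
      SELECT i.inhrelid::regclass
      FROM pg_inherits i
      JOIN parts p ON i.inhparent = p.relid
  )
  SELECT idx.indexrelid::regclass AS index_name,
         pg_prewarm(idx.indexrelid::regclass, 'buffer', 'main', 0, 0) AS blocks
  FROM parts p
  JOIN pg_index idx ON idx.indrelid = p.relid
  JOIN pg_class c ON c.oid = idx.indexrelid
  WHERE c.relkind = 'i';  -- skip partitioned indexes, which have no storage

The last two pg_prewarm arguments restrict it to block 0, so only each
index's first page is read rather than the whole index.

Justin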