I've recently been developing a website for work, backed by PostgreSQL, that shows information about network flows. This data is extremely high volume in places, and I was hard pressed to come up with a good way to speed up data loading times until I came across the recommendation to use table inheritance for partitioning large tables.

This data has a few strange features. The first is that it generally comes in one-hour chunks: the base data being analyzed arrives in one-hour segments. It's possible to narrow down to smaller segments, but you're still paying the I/O cost of processing a whole hour, so there's not much point. The second, more troublesome, feature is that the data isn't completely gathered until some time after the period it represents has passed. It's desirable to give a first 90%-good summary of what has happened as quickly as possible, and then to replace it later with a 100%-good summary.

This combination led me to make the initial data partitions one-hour segments. These partitions generally hold datapoints in five-minute bins within the hour, with some hundreds of thousands of rows for each five-minute bin. With one-hour partitions, the "re-loading" problem is easy: in a transaction, I drop the old table (foo_YYYYMMDDTHH, say) and create a new table of the same shape, load the new data into it, create indices on it, and then make it a subtable of the partitioned table. (A sketch of this workflow is in the P.S. below.)

So, what's the problem? Well, I have twelve tables that are partitioned by hour. There are 24 hours in a day, seven days in a week, and... you may see where I'm going here. PostgreSQL takes a lock on each individual table queried (in this case, every single partition), and it doesn't take very long at all for the maximum number of locks (~2240 by default) to be taken out, particularly when data is being aggregated across the twelve different partitioned tables. (Note that the partition tables are locked even when constraint exclusion leaves them out of the query.) That's a SINGLE TRANSACTION, mind you, locking in excess of 2000 tables.

For our purposes, we have some tools to automatically re-collect these partitions. So we'll be using cron jobs to take the hourly partitions for a day and turn them into a single daily partition, and then to take the daily partitions for a week or a month and combine them further (sketched in the P.P.S. below). This is somewhat undesirable, but not really avoidable. Even when doing this regularly, I think we're still going to need to increase the max_locks_per_transaction parameter.

So, my question is this: the inheritance-based partitioning model is quite powerful, but the lock problem is serious. Even if I had an easy problem--a system with monthly partitions working with two years' worth of data, say--I would potentially start having trouble somewhere between working with two and working with three partitioned tables (assuming the default max_locks_per_transaction of 64 was really being used across most connections). It's possible to mitigate the problem by being militant about partition management and by bumping up max_locks_per_transaction, but it's still a very awkward sort of constraint.

Are there plans in the works for a new partitioning system (either based on inheritance, or on something else)? If so, has any thought been put into how to avoid locking massive numbers of partitions?

Thanks very much,

John Prevost.
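
P.S. For concreteness, here is roughly the shape of the hourly load/re-load I described above. The parent is "foo" as in the body of the message; the columns, dates, and file path are made up for illustration and aren't our real schema:

    -- Parent table; the real column list is wider than this.
    CREATE TABLE foo (
        stamp     timestamptz NOT NULL,   -- start of the five-minute bin
        src_addr  inet        NOT NULL,
        dst_addr  inet        NOT NULL,
        octets    bigint      NOT NULL
    );

    -- (Re-)loading one hour.  The initial load is the same thing minus
    -- the DROP.  The CHECK constraint is what lets constraint exclusion
    -- (constraint_exclusion = on) prune the partition at plan time.
    BEGIN;
      DROP TABLE IF EXISTS foo_20070301t13;
      CREATE TABLE foo_20070301t13 (
          LIKE foo,
          CHECK (stamp >= '2007-03-01 13:00+00'
             AND stamp <  '2007-03-01 14:00+00')
      );
      COPY foo_20070301t13 FROM '/data/flows/2007-03-01T13.csv' WITH CSV;
      CREATE INDEX foo_20070301t13_stamp ON foo_20070301t13 (stamp);
      ALTER TABLE foo_20070301t13 INHERIT foo;   -- attach as a subtable
    COMMIT;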
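
P.P.S. The cron-driven consolidation is equally plain; something like the following (again with made-up names):

    -- Roll one day's 24 hourly partitions into a single daily partition.
    BEGIN;
      CREATE TABLE foo_20070301 (
          LIKE foo,
          CHECK (stamp >= '2007-03-01 00:00+00'
             AND stamp <  '2007-03-02 00:00+00')
      );
      INSERT INTO foo_20070301 SELECT * FROM foo_20070301t00;
      -- ... and likewise for t01 through t23 ...
      CREATE INDEX foo_20070301_stamp ON foo_20070301 (stamp);
      ALTER TABLE foo_20070301 INHERIT foo;
      DROP TABLE foo_20070301t00;
      -- ... and the other 23 hourly tables ...
    COMMIT;

Even then we'd still raise the limit in postgresql.conf, something like:

    # As I understand it, the shared lock table holds roughly
    # max_locks_per_transaction * (max_connections + max_prepared_transactions)
    # entries, so raising this also costs shared memory.
    max_locks_per_transaction = 512    # default is 64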