Re: Partitioning vs. View of a UNION ALL

"Merlin Moncure" <mmoncure@xxxxxxxxx> · Fri, 13 Oct 2006 09:33:38 -0400

On 10/13/06, Ron Johnson <ron.l.johnson@xxxxxxx> wrote:
Hi,

I've gotten preliminary approval to buy a server and load a *lot* of
data into it.  One table will eventually have 4.5Bn 330 bytes rows,
the other 9Bn 300 byte rows.  Other will "only" have a billion rows.
 They are easily partitioned by yyyymm, which we call FISCAL_PERIOD.
 (In fact, the app creates the integer FISCAL_PERIOD by extracting
year and month from transaction date: YEAR*100+MONTH.)

wow. if I do my math correctly, this will put you in the multi
terabyte range.  i'm sure the postgresql community (talk to Josh
Berkus) would love to hear about your experiences in this project.

anyways, regarding built in/manual partitioning, I have to admit I am
not a big fan of the built in table partitioning.  It was kind of
fiddly to set up, and constraint exclusion only worked on select
queries, which was the real deal killer for me.  however, the latter
issue this has been addressed in 8.2
(http://developer.postgresql.org/pgdocs/postgres/release-8-2.html).
With built in, foreign key behavior is a little bit different and
perhaps problematic, which you you should consider if you plan to
enforce constraints via RI. however, you can always drop down to a
trigger calling a dynamic pl/sql function which is almost as good
(sometimes better) to enforce constraints.  another issue is that
sometimes the plans generated on non trivial queries involving joins
to partitioned tables were not what i would have expected, resorting
to seq scans or not using constraint_exclusion conditions in certain
cases  obviously, this is a moving target and may improve in later
versions of postgresql, so test your sql carefully.

one thing that is interesting is that if your data divisions is very
strictly regimented so that most of your operations work on exactly
one schema, you can put your partions in separate schemas.  why do
this? well your table names are uniform for starters.  if you are into
pl/pgsql functions you can then keep one function/operation which
operates over all your partitions without excessive use of dynamic sql
(which is not terrible, but I'd prefer not to use it if possible.).
so long as you have a pretty good idea of when function plans are
generated, you can enter into your 'namespace' by manipulating
search_path and go to work.

with a database of your size you really have to work out some test
data and try both approaches. what works is going to be a combination
of pracical factors and personal style...and great feedback for the
community should you be persuaded to give regular updates on your
progress.

as with all partitioning strategies, keep an eye out for worst case behavior.

merlin