On 10/13/06, Ron Johnson <ron.l.johnson@xxxxxxx> wrote:
Hi, I've gotten preliminary approval to buy a server and load a *lot* of data into it. One table will eventually have 4.5Bn 330 bytes rows, the other 9Bn 300 byte rows. Other will "only" have a billion rows. They are easily partitioned by yyyymm, which we call FISCAL_PERIOD. (In fact, the app creates the integer FISCAL_PERIOD by extracting year and month from transaction date: YEAR*100+MONTH.)
wow. if I do my math correctly, this will put you in the multi terabyte range. i'm sure the postgresql community (talk to Josh Berkus) would love to hear about your experiences in this project. anyways, regarding built in/manual partitioning, I have to admit I am not a big fan of the built in table partitioning. It was kind of fiddly to set up, and constraint exclusion only worked on select queries, which was the real deal killer for me. however, the latter issue this has been addressed in 8.2 (http://developer.postgresql.org/pgdocs/postgres/release-8-2.html). With built in, foreign key behavior is a little bit different and perhaps problematic, which you you should consider if you plan to enforce constraints via RI. however, you can always drop down to a trigger calling a dynamic pl/sql function which is almost as good (sometimes better) to enforce constraints. another issue is that sometimes the plans generated on non trivial queries involving joins to partitioned tables were not what i would have expected, resorting to seq scans or not using constraint_exclusion conditions in certain cases obviously, this is a moving target and may improve in later versions of postgresql, so test your sql carefully. one thing that is interesting is that if your data divisions is very strictly regimented so that most of your operations work on exactly one schema, you can put your partions in separate schemas. why do this? well your table names are uniform for starters. if you are into pl/pgsql functions you can then keep one function/operation which operates over all your partitions without excessive use of dynamic sql (which is not terrible, but I'd prefer not to use it if possible.). so long as you have a pretty good idea of when function plans are generated, you can enter into your 'namespace' by manipulating search_path and go to work. with a database of your size you really have to work out some test data and try both approaches. what works is going to be a combination of pracical factors and personal style...and great feedback for the community should you be persuaded to give regular updates on your progress. as with all partitioning strategies, keep an eye out for worst case behavior. merlin