On Monday 03 December 2007 17:32, Erik Jones wrote:
> Hi, I've inherited <smirk> a database schema wherein the original
> developers took the inheritance mechanism to an extreme where new
> client accounts get 13 different tables of their own created for
> them.  We're at the many tens of thousands of tables mark (well over
> 100K) and I'm going to be re-partitioning most of those 13 table
> types.  Basically, what I'm looking for here is advice on whether or
> not my proposed method of repartitioning the data is valid or if I
> should be thinking differently.
>
> We're an email marketing campaign creation, delivery, and response
> tracking service, and so I'll use the opens tracking tables for this.
>
> Here's the opens tables' layout:
>
>          Column         |            Type             |        Modifiers
> ------------------------+-----------------------------+------------------------
>  message_user_agent     | character varying(255)      | not null
>  message_open_ts        | timestamp without time zone | not null default now()
>  message_is_text_only   | smallint                    | not null default 0
>  message_id             | bigint                      | not null
>  mailing_id             | bigint                      |
>  member_id               | bigint                      |
> Indexes:
>     "u_mesopens_pkey" PRIMARY KEY, btree (message_id)
>     "u_mesopens_emma_message_open_ts" btree (message_open_ts)
>     "u_mesopens_emma_mailing_id_idx" btree (mailing_id)
>     "u_mesopens_emma_member_id_idx" btree (member_id)
>
> All of the other types will follow this general style, i.e. an id based
> primary key and indexes, along with a timestamp somewhere in there
> recording the original entry time.  The majority of the queries we
> run against these are solely id based, with some being time (or id +
> time) based, probably less than 10-20% for the latter.
>
> In order to get a ballpark idea of the current table stats I ran the
> following query against pg_class for all of these types of tables
> (this one is just for the opens):
>
> select substring(relname from '[0-9]+_(.*)$') as ttype,
>        count(relname) as total_tables,
>        sum(reltuples) as total_tuples,
>        max(reltuples) as max_tuples,
>        to_char(avg(reltuples), '999999.99') as avg_num_tuples,
>        to_char(stddev_pop(reltuples), '9999999.99') as std_dev_from_avg
> from pg_class
> where relname ~ 'opens$'
>   and substring(relname from '[0-9]+_(.*)$') is not null
> group by ttype;
>
> With the following results:
>
> -[ RECORD 1 ]----+-----------------------
> ttype            | messages_history_opens
> total_tables     | 14027
> total_tuples     | 139284528
> max_tuples       | 2760599
> avg_num_tuples   | 9929.84
> std_dev_from_avg | 59945.51
>
> Now, for this discussion, let me also point out that we've had a
> pretty steady growth rate of about 150% per year since the company
> opened 5 years ago and, for the sake of argument, we'll use that here,
> although we really don't have any guarantees past personal faith that
> we can maintain that :)
>
> So, I'm looking at both a bin partitioning system and ranges based on
> the date timestamps, and both seem to have their pros and cons.  For
> the bin example, for this table type, if I set it at 300 partitions it
> will take approximately 5 years before any of the partitions reaches
> the size of our current largest opens table.  This is obviously very
> attractive from a management perspective and has the added advantage
> that I could probably expect the spread of data (volume per table) to
> be pretty even over time.

What would you base these bins on?  If you are doing it based on the
surrogate key, then you're going to spread data across both dates and
accounts into the bins, which seems like it would keep the majority of
your queries from using the partitions smartly.
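For concreteness, here's a rough, untested sketch of what the two layouts
could look like with plain inheritance plus CHECK constraints.  All of the
names, and the account_id column in particular, are invented for
illustration; presumably you'd fold the per-account tables together and
carry the account in a column when you re-partition:

    create table messages_history_opens (
        message_user_agent    character varying(255) not null,
        message_open_ts       timestamp without time zone not null default now(),
        message_is_text_only  smallint not null default 0,
        message_id            bigint not null,
        mailing_id            bigint,
        member_id             bigint,
        -- hypothetical column; today the account is implied by which table the row lives in
        account_id            bigint not null
    );

    -- bin partitioning: account_id modulo the bin count picks the child
    create table messages_history_opens_bin_000 (
        check (account_id % 300 = 0)
    ) inherits (messages_history_opens);

    create table messages_history_opens_bin_001 (
        check (account_id % 300 = 1)
    ) inherits (messages_history_opens);
    -- one child per bin; each child gets its own copy of the indexes

    -- range partitioning on the timestamp, for comparison
    create table messages_history_opens_2007_12 (
        check (message_open_ts >= '2007-12-01'
           and message_open_ts <  '2008-01-01')
    ) inherits (messages_history_opens);

    set constraint_exclusion = on;

    -- Constraint exclusion only skips children whose CHECK constraints are
    -- refuted by the query's own quals.  It won't derive "account_id % 300 = 42"
    -- from "account_id = 12342", so with modulo bins the application would have
    -- to supply the modulo qual (or hit the child table directly):
    select count(*)
      from messages_history_opens
     where account_id = 12342
       and account_id % 300 = 42;

The date-range children are the case constraint exclusion handles most
directly, since time-bounded queries already carry literal timestamp quals
it can compare against the CHECKs.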
> However, it's those times somebody wants to ask, "Show me all my
> members who have opened in the last year" that it becomes a problem,
> as that data would most likely be spread over a radically varying
> number of partitions.  I could probably solve that by making the
> partition based on modulo account_id.

If your bins are based on account_id, that might run you the risk of
filling up the bins disproportionately.

> The other option, of course, is to go with the "standard" (my quotes,
> based on what I've seen on this and the performance list) range based
> partitioning.  However, it really wouldn't make a lot of sense to
> make those partitions based on the timestamp fields, as the data for
> one mailing could then span many of the partitions, and the same
> question I noted in the last paragraph would again result in hitting
> potentially many (although not as many as before) partitions.

The thing to think about is going one step past the CE (constraint
exclusion).  Which is better: an index lookup based on time across
account_id based partitions, or an index lookup on account_ids in
time based partitions?

> So, what I'm left with here is to partition on id ranges but, in
> order to maintain a somewhat stable maximum partition size, I'd have
> to play a statistical guessing game wrt the size the previous month's
> (or week's) partitions grew to v. their defined id ranges.  That's
> doable, but not very attractive.  To be honest, I'm not even
> convinced that that's something I'd really need to worry about on a
> month-to-month basis.

I don't think it is.  And remember, you don't have to keep your date
partitions as equal intervals.

> Seen from one perspective, partitioning on id ranges v. using the bin
> partitioning method are kind of similar.  The biggest difference is
> that with the range setup the majority of the working set of data
> will be in the last couple of months' partitions, whereas with the
> bin method it will be spread pretty evenly across all of them, i.e.
> lots of pages from a couple of tables constantly being worked with v.
> a couple of pages from a lot of tables constantly being worked with.
>
> Any thoughts or advice?

I don't think you'll have much luck taking the "spread the data evenly
across the partitions" approach; figure out how best to segment your
data into manageable chunks.  HTH.

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL