> good normalization skills are really important for large databases,
> along with materialization strategies for 'denormalized sets'.

Good points - thanks. I'm especially curious what others have done for
the materialization. The matview project on gborg appears dead, and I've
only found a smattering of references on Google. My guess is you roll
your own for optimal performance (rough sketch of what I mean at the
bottom of this mail)...

> regarding the number of rows, there is no limit to how much pg can
> handle per se, just some practical limitations, especially vacuum and
> reindex times. these are important because they are required to keep
> a handle on mvcc bloat and it's very nice to be able to vacuum bits of
> your database at a time.

I was hoping for some actual numbers on "practical". Hardware isn't too
much of an issue (within reason - we're not talking an Amazon or Google
here, though the SunFire X4500 looks interesting). If a customer wants
to store that much data, and pay for it, we'll figure out how to do it;
I'd just rather not have to redesign the database. Say the requirement
is to keep 12 months of data accessible, each "scan" produces 100M
records, and I run one per month - that's roughly 1.2 billion rows live
at any given time. What happens if the customer wants to run it once a
week, i.e. more like 5 billion rows? I was mostly trying to figure out
at what point (ballpark) I'll have to look into archive tables and
things of that nature (see the partitioning sketch below), or at
Bizgres/MPP. It's easier for us to add more/bigger hardware, but not so
easy to redesign or add history tables...

> just another fyi, if you have a really big database, you can forget
> about doing pg_dump for backups (unless you really don't care about
> being x day or days behind)...you simply have to do some type of
> replication/failover strategy. i would start with pitr.
>
> merlin

I was originally thinking replication, but I did notice some nice PITR
features in 8.x - I'll have to look into those some more (third sketch
below). Thanks for the pointers...

- Bucky
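
P.S. A few rough sketches to make the above concrete, for the archives.
All table, column, and path names below are made up for illustration -
they're not our actual schema.

First, what I mean by "rolling your own" materialized view: a plain
summary table, rebuilt in bulk by a function after each scan load (or
from cron), rather than kept in sync row-by-row with triggers.

  -- Hypothetical raw data: one row per host/port seen in a scan.
  CREATE TABLE scan_result (
      scan_id    integer     NOT NULL,
      scan_time  timestamptz NOT NULL,
      host_addr  inet        NOT NULL,
      port       integer     NOT NULL,
      status     text        NOT NULL
  );

  -- The "materialized view" is just an ordinary table...
  CREATE TABLE scan_summary (
      scan_id    integer PRIMARY KEY,
      scan_time  timestamptz,
      hosts      bigint,
      open_ports bigint
  );

  -- ...refreshed in bulk. TRUNCATE would be faster than DELETE but
  -- takes an exclusive lock on the summary table. Assumes plpgsql is
  -- installed in the database (createlang plpgsql <db> on 8.x).
  CREATE OR REPLACE FUNCTION refresh_scan_summary() RETURNS void AS $$
  BEGIN
      DELETE FROM scan_summary;
      INSERT INTO scan_summary (scan_id, scan_time, hosts, open_ports)
      SELECT scan_id,
             min(scan_time),
             count(DISTINCT host_addr),
             sum(CASE WHEN status = 'open' THEN 1 ELSE 0 END)
      FROM scan_result
      GROUP BY scan_id;
  END;
  $$ LANGUAGE plpgsql;

Then it's just "SELECT refresh_scan_summary();" at the end of each load.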
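
Second, the sort of archive-table setup I'd rather postpone: 8.1-style
partitioning with table inheritance, one child table per month of scan
data. Inserts have to be routed to the right child by the application or
by a trigger/rule on the parent.

  -- Parent table stays empty; children hold the actual rows.
  CREATE TABLE scan_data (
      scan_time  timestamptz NOT NULL,
      host_addr  inet        NOT NULL,
      port       integer,
      status     text
  );

  CREATE TABLE scan_data_2006_09 (
      CHECK (scan_time >= '2006-09-01' AND scan_time < '2006-10-01')
  ) INHERITS (scan_data);

  CREATE TABLE scan_data_2006_10 (
      CHECK (scan_time >= '2006-10-01' AND scan_time < '2006-11-01')
  ) INHERITS (scan_data);

With constraint_exclusion = on in postgresql.conf (8.1+), a query that
filters on scan_time only touches the children whose CHECK constraints
can match, and expiring a month becomes a quick DROP TABLE on one child
instead of a huge DELETE followed by a VACUUM.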
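
Third, the PITR pieces as I understand the 8.x docs: continuous WAL
archiving plus a filesystem-level base backup (the archive path here is
just a placeholder).

  # postgresql.conf: copy each completed WAL segment somewhere safe
  archive_command = 'cp %p /mnt/backup/wal_archive/%f'

  -- Base backup, taken while the server keeps running:
  SELECT pg_start_backup('weekly_base');
  -- ... tar/rsync the data directory at the filesystem level ...
  SELECT pg_stop_backup();

Recovery is then a restore of the base backup plus a recovery.conf with
a matching restore_command, replayed up to the desired point in time.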