On Fri, Aug 15, 2008 at 11:42 PM, Amber <guxiaobo1982@xxxxxxxxxxx> wrote: > Dear all: > We are currently considering using PostgreSQL to host a read only warehouse, we would like to get some experiences, best practices and performance metrics from the user community, following is the question list: DB is ~650m rows across 10 tables and is currently around 160gb. Running on Ubuntu, mostly because this db started out as a toy and it was easy. It's done well enough thus far that it isn't worth the hassle to replace it with anything else. Currently only using Raid 0; the database can be regenerated from scratch if necessary so we don't have to worry overmuch about disk failures. Machine is a quad-core Xeon 2.5 with 4g of RAM. Our access pattern is a little odd; about half the database is wipe and regenerated at approximately 1-2 month intervals (the regeneration takes about 2 weeks); in between there's a nightly computation run that creates a small amount of new data in two of the tables. Both the regeneration and the addition of the new data depends very heavily on many, many several table joins that generally involve about 50% of the database at a time. We've been fairly pleased with the performance overall, though it's taken some tweaking to get individual operations to perform adequately. I can't speak to pure load operations; all of our bulk-load style ops are 4k-row COPY commands interspersed among a lot of big, complicated aggregate queries- not exactly ideal from a cache perspective. Concurrent readers are anywhere from 1-8, and we're not in a cluster. Sequential transfer rate is usually a touch over 100mb/sec; we don't have a lot of disks on this machine (though that may change.... oh how some of our index scans long for more spindles). The performance improvements made in the past few releases have been incredibly helpful- and very much noticeable each time. -- - David T. Wilson david.t.wilson@xxxxxxxxx