Dne 29.6.2011 01:26, Craig McIlwee napsal(a): >> Dne 28.6.2011 23:28, Craig McIlwee napsal(a): >> Are you sure those two queries are exactly the same? Because the daily >> case output says the width is 50B, while the half-month case says it's >> 75B. This might be why the sort/aggregate steps are switched, and that >> increases the amount of data so that it has to be sorted on disk (which >> is why the half-month is so much slower). >> >> Haven't you added some columns to the half-month query? > > The daily tables were created using CREATE TABLE AS from the half month > tables, structure is the same with the exception of fill factor. > Queries are identical except for the name of the master table that they > select from. Hm, I'm not sure where this width value comes from but I don't think it's related to fillfactor. >> Not much experience with PostgreSQL on Windows, but this looks good to >> me. Not sure if RAID5 is a good choice, especially because of write >> performance - this is probably one of the reasons why the disk sort is >> so slow (in the half-month case). > > Yes, the data import is painfully slow but I hope to make up for that > with the read performance later. Generally you're right that RAID10 is going to be slower than RAID5 when reading (and faster when writing) the data, but how big the gap is really depends on the controller. It's not that big I guess - see for example this: http://www.kendalvandyke.com/2009/02/disk-performance-hands-on-part-5-raid.html The first test shows that RAID10 is about 10% slower on reads but about 60% faster on writes. BTW have you tuned the GUC settings for write (increasing checkpoint segments may give much better write performance). > The 0 volume is the only thing that will always be present, but those > records do account for 10 to 15% of the data. I'll give this a shot, > I'm really interested in seeing what impact this had. For some reason I > was under the impression that partial indexes were used for text > searches, so I completely overlooked this. Or you might actually do two partitions for each day - one for volume=0 and the other one for volume!=0. Not sure if that is worth the effort. One more thing to try in this case - it's not that important how many rows suffice the condition, much more important is how many blocks need to be read from the disk. If those 10% rows are distributed evenly through the table (i.e. there's at least one in each 8kB block), the I/O still needs to be done. And it's very likely the case, as you've clustered the tables according to the timestamp. Try to cluster the tables according to 'volume' and check the difference. regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance