Hi Tom,

Thanks for your quick response.

2009/8/26 Tom Lane <tgl@xxxxxxxxxxxxx>
> <hashinclude@xxxxxxxxx> writes:
> > In my timing tests, the performance of PG is quite a lot worse than the
> > equivalent BerkeleyDB implementation.
>
> Are you actually comparing apples to apples?  I don't recall that BDB
> has any built-in aggregation functionality.  It looks to me like you've
> moved some work out of the client into the database.

I'm measuring end-to-end time, which includes the in-code aggregation with
BDB (after the DB fetch) and the in-query aggregation in PG.

> > 1. Is there anything I can do to speed up performance for the queries?
>
> Do the data columns have to be bigint, or would int be enough to hold
> the expected range?  SUM(bigint) is a *lot* slower than SUM(int),
> because the former has to use "numeric" arithmetic whereas the latter
> can sum in bigint.  If you want to keep the data on-disk as bigint,
> but you know the particular values being summed here are not that
> big, you could cast in the query (SUM(data_1::int) etc).

For the 300-second tables I can probably drop the columns to integer, but
the 3600- and 86400-second tables (1 hour, 1 day) will probably need to
stay BIGINT. However, given that I'm on a 64-bit platform (sorry if I
didn't mention that earlier), does it make that much of a difference? And
how does a float ("REAL") compare in terms of SUM()s?

> I'm also wondering if you've done something to force indexscans to be
> used.  If I'm interpreting things correctly, some of these scans are
> traversing all/most of a partition and would be better off as seqscans.

One thing I noticed is that if I specify which devices I want the data for
(specifically, all of them, listed out as DEVICE IN (1,2,3,4,5,...) in the
WHERE clause), PG uses a bitmap heap scan, while if I don't specify the
list (which still gives me data for all the devices), PG uses a sequential
scan. (I might have missed the DEVICE IN (...) in my earlier query.)
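For reference, the cast-based rewrite I understood from your SUM() suggestion would look something like this (table and column names here are placeholders, not my actual schema; and I realize the cast would raise "integer out of range" if any value exceeds int4):

```sql
-- Hypothetical 300-second partition; data_1 is stored on disk as BIGINT.
-- Casting to int before summing lets PG accumulate in bigint rather than
-- falling back to the slower "numeric" arithmetic.
SELECT device, SUM(data_1::int) AS total_1
FROM   stats_300            -- placeholder table name
WHERE  device IN (1, 2, 3, 4, 5)
GROUP  BY device;
```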
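And in case it helps, this is roughly how I've been comparing the two plans on the psql command line (again with placeholder names; the plan choice is just what I observe on my data, not something I'm forcing):

```sql
-- With the IN list spelled out, the planner picks a bitmap heap scan here.
EXPLAIN ANALYZE
SELECT device, SUM(data_1)
FROM   stats_300
WHERE  device IN (1, 2, 3, 4, 5)
GROUP  BY device;

-- Without it -- the same result set in my case, since the list covers all
-- devices -- the planner picks a sequential scan instead.
EXPLAIN ANALYZE
SELECT device, SUM(data_1)
FROM   stats_300
GROUP  BY device;
```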
However, more often than not, the query _will_ be of the form DEVICE IN
(...). If I actually execute the queries (on the psql command line), their
runtimes are about the same (15 s vs. 16 s).

> > shared_buffers = 128MB
>
> This is really quite lame for the size of machine and database you've
> got.  Consider knocking it up to 1GB or so.

OK, I've bumped it up to 1 GB. However, that doesn't seem to make a huge
difference (unless I need to do the same on libpqxx's connection object
too).

Cheers,
Hrishi

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance