On Tue, 2007-09-18 at 09:55 -0500, Scott Marlowe wrote:
> On 9/17/07, Ow Mun Heng <Ow.Mun.Heng@xxxxxxx> wrote:
>
> > Just wondering how everyone is doing aggregation of production data.
>
> Where I work, we aggregate by the minute in the application, then dump
> those stats entries into the database en masse from each machine. So,

[snip]

> The fail1/2/3 are types of failures, the totalresptime time is all the
> successful requests added together, so that totalresptime/success =
> average for that minute, and max is the longest running request ending
> in that minute.

So it's basically a summation rather than using avg() all the way. That
would mean using something like bigint to hold the large running totals.

> > (getting min/max/count isn't much of an issue. Stddev is the main
> > issue, I believe)
> >
> > One such instance I've read about is..
>
> Isn't stddev() working for you?

Stddev is computed over a population of data. If I only keep per-slice
stddevs, I can't combine them across slices; the stddev over, say, 2
months (8 weeks) would be wrong. (I've sketched what I mean at the end
of this mail.)

> What I do is aggregate the minute time slices by grouping by
> date_trunc('xxx', timestamp) and then use that as a subselect to a
> query that does the stddev() outside of that. Works pretty well for
> us, and makes it easy to identify trends.

Trend analysis is very critical here, and the only reference I've found
on how to get a "stddev" across slices is the one I posted.

> One of the things we started doing is to aggregate the last six weeks
> data by the day / hour and then comparing the last 24 hours worth of
> data to those six weeks worth to see trends for each hour of each day.
> The queries are huge and ugly, but they work, and run in about 4
> minutes on a fairly good sized chunk of data. We have about 150k to
> 300k entries a day put into this db.

I'm not sure how many K entries we get in a day (I've yet to count it),
but I'm getting into trouble with one-to-many relationships, where PG
chooses to do nested loops etc. (lots of left joins on the same tables).
So I've had to resort to SRFs and function scans. A complex query that
took >2 hours (on my laptop's PG; it runs in 20 min on high-end MSSQL)
is reduced to ~60 secs.

Thanks for the response. BTW, are user queries an issue? Do they
interfere with the loading? (That's one of my concerns over here.)

I noticed that you do the calculation in the app before sending it to
the DB, which is a good thing, and every minute too. (Is it advisable
to do it in chunks of 1 min? Less data per minute versus per 30 min /
1 hour.)
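P.S. To make the summation idea concrete, here's roughly the per-minute
table I understand you to be keeping (a minimal sketch, untested; the
table and column names are my invention). Counters are bigint so the
running sums don't overflow:

  CREATE TABLE stats_minute (
      bucket        timestamptz NOT NULL, -- start of the minute
      host          text        NOT NULL,
      success       bigint      NOT NULL, -- successful requests that minute
      fail1         bigint      NOT NULL, -- the three failure types
      fail2         bigint      NOT NULL,
      fail3         bigint      NOT NULL,
      totalresptime bigint      NOT NULL, -- response times summed, not averaged
      maxresptime   integer     NOT NULL, -- longest request ending that minute
      PRIMARY KEY (bucket, host)
  );

  -- The per-minute average is derived, never stored:
  SELECT bucket,
         totalresptime::numeric / NULLIF(success, 0) AS avg_resptime
    FROM stats_minute
   WHERE bucket >= now() - interval '1 hour';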
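And this is what I meant about stddev: per-slice stddevs can't be
combined, but if each minute also carries the sum of the squared
response times (a hypothetical resptime_sumsq column), the exact sample
stddev over any window falls out of n, sum(x) and sum(x^2):

  -- sample stddev = sqrt( (n*sum(x^2) - sum(x)^2) / (n*(n-1)) )
  SELECT sqrt( (sum(success) * sum(resptime_sumsq)
                - sum(totalresptime) ^ 2)
               / NULLIF(sum(success) * (sum(success) - 1), 0)
             ) AS stddev_8wk
    FROM stats_minute
   WHERE bucket >= now() - interval '8 weeks';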
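Scott, just to check I've understood your date_trunc() approach, is it
something along these lines (my guess, against the same made-up table)?
The minute slices get rolled up per hour in a subselect, and stddev()
is taken outside of that:

  SELECT date_trunc('day', hour_bucket) AS day,
         avg(hourly_avg)    AS avg_of_hourly_avgs,
         stddev(hourly_avg) AS stddev_of_hourly_avgs
    FROM (SELECT date_trunc('hour', bucket) AS hour_bucket,
                 sum(totalresptime)::numeric
                   / NULLIF(sum(success), 0) AS hourly_avg
            FROM stats_minute
           GROUP BY 1) AS hourly
   GROUP BY 1
   ORDER BY 1;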