On 9/17/07, Ow Mun Heng <Ow.Mun.Heng@xxxxxxx> wrote:
> Just wondering how everyone is doing aggregation of production data.

Where I work, we aggregate by the minute in the application, then dump
those stats entries into the database en masse from each machine.  So we
wind up with rows something like this (simplified):

 id | parentid | host    | timestamp           | request   | total | success | fail1 | fail2 | fail3 | totalresptime | maxresptime
  1 | NULL     | jboss01 | 2007-09-17 12:02:03 | authcc    |   120 |     112 |     4 |     2 |     2 |        480000 |       12039
  2 | 1        | jboss01 | 2007-09-17 12:02:03 | ccconnect |   120 |     118 |     0 |     0 |     2 |        423900 |       10394

The id comes from a sequence, and parentid ties child stats to their
parents; i.e. in this example authcc called ccconnect, and so on.  The
fail1/2/3 columns are different types of failures, totalresptime is the
response time of all the successful requests added together (so that
totalresptime / success = the average for that minute), and maxresptime
is the longest-running request ending in that minute.  Then we can
aggregate those minutes together, monitor individual machine performance,
etc.  Ours is actually more complex than this, but you get the idea.

We have a cron job that checks the statistics every x minutes for high
failure rates and generates an alert email if any of our requests go over
a preset threshold.  Most of the time this catches problems long before
anything interesting shows up in the logs.

> (getting min/max/count isn't much of an issue. Stdev is the main issue I
> believe)
>
> One such instance I've read about is..

Isn't stddev() working for you?

What I do is aggregate the minute time slices by grouping on
date_trunc('xxx', timestamp), then use that as a subselect to an outer
query that applies stddev().  It works pretty well for us and makes it
easy to identify trends.

One of the things we started doing is to aggregate the last six weeks of
data by day and hour, and then compare the last 24 hours' worth of data
against those six weeks to see trends for each hour of each day.  The
queries are huge and ugly, but they work, and they run in about 4 minutes
on a fairly good-sized chunk of data.  We put about 150k to 300k entries
into this db each day.
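
To make the cron check concrete, something along these lines could be run
every few minutes (an untested sketch; the table name "stats", the
10-minute window, and the 5% threshold are just placeholders, and the
columns are the ones from the sample rows above):

    -- Flag any request/host whose failure rate over the last 10 minutes
    -- exceeds a threshold.  fail1/2/3 are summed as total failures;
    -- NULLIF avoids division by zero on idle requests.
    SELECT request,
           host,
           sum(fail1 + fail2 + fail3)                AS failures,
           sum(total)                                AS requests,
           round(100.0 * sum(fail1 + fail2 + fail3)
                 / NULLIF(sum(total), 0), 2)         AS fail_pct
    FROM stats
    WHERE "timestamp" >= now() - interval '10 minutes'
    GROUP BY request, host
    HAVING 100.0 * sum(fail1 + fail2 + fail3)
           / NULLIF(sum(total), 0) > 5
    ORDER BY fail_pct DESC;

If that returns any rows, the cron job can mail them out.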
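
And a rough, untested sketch of the date_trunc()/stddev() subselect idea,
again assuming a table called "stats" with the columns above; the hourly
slice and the 7-day window are picked purely for illustration:

    -- Inner query: collapse the per-minute rows into per-hour averages.
    -- Outer query: average and standard deviation of those hourly
    -- averages, per request.
    SELECT request,
           avg(hourly_avg)    AS avg_resp,
           stddev(hourly_avg) AS stddev_resp,
           min(hourly_avg)    AS min_resp,
           max(hourly_avg)    AS max_resp
    FROM (
        SELECT request,
               date_trunc('hour', "timestamp") AS hour_slice,
               -- cast to numeric so integer division doesn't truncate
               sum(totalresptime)::numeric
                   / NULLIF(sum(success), 0)   AS hourly_avg
        FROM stats
        WHERE "timestamp" >= now() - interval '7 days'
        GROUP BY request, date_trunc('hour', "timestamp")
    ) AS hourly
    GROUP BY request
    ORDER BY request;

Swap the 'hour' argument to date_trunc() for 'day' (or anything else) to
change the slice you are computing the deviation over.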