On 9/18/07, Ow Mun Heng <Ow.Mun.Heng@xxxxxxx> wrote:
> On Tue, 2007-09-18 at 09:55 -0500, Scott Marlowe wrote:
> > On 9/17/07, Ow Mun Heng <Ow.Mun.Heng@xxxxxxx> wrote:
> > >
> > > Just wondering how everyone is doing aggregation of production data.
> >
> > Where I work, we aggregate by the minute in the application, then dump
> > those stats entries into the database en masse from each machine.  So,
> [snip]
> > The fail1/2/3 are types of failures, the totalresptime
> > time is all the successful requests added together, so that
> > totalresptime/success = average for that minute, and max is the
> > longest running request ending in that minute.
>
> So, it's basically a summation rather than using avg() all the way.
> That would mean using something like bigint to deal with the large
> numbers.

Yeah.  We use numeric(18) (bonus points if anyone knows what other
database I'm pumping data out of to get a numeric(18) as a default).
The advantage is that I don't have to re-weight my measurements.  When
the system was first designed, it stored the average response and the
number of responses, which I then had to multiply to get the original
total time back before computing a weighted average.  This method
removes the averaging step in the client software AND the multiplying
step to get the raw numbers back on my end, win-win.

> > > (getting min/max/count isn't much of an issue.  Stddev is the main
> > > issue, I believe)
> > >
> > > One such instance I've read about is..
> >
> > Isn't stddev() working for you?
>
> Stddev is based on a population of data.  If I do slices, and I want
> to stddev across 2 months (8 weeks, e.g.), then it would be wrong.

But if you ask for the data set grouped by
date_trunc('minute', timestmp) in an inner select, then in the outer
select you can do a stddev() on that set and get it.  Or is that not
accurate?
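Something along these lines is what I mean (the table name req_stats
and the 8 week window are just stand-ins for illustration, but
totalresptime, success and timestmp are the per-minute columns I
described above):

  -- per-minute averages in the inner select, stddev/avg/min/max of
  -- those per-minute averages in the outer select
  -- (req_stats is a made-up name for the per-minute stats table)
  SELECT stddev(avg_resp) AS resp_stddev,
         avg(avg_resp)    AS resp_avg,
         min(avg_resp)    AS resp_min,
         max(avg_resp)    AS resp_max
    FROM (SELECT date_trunc('minute', timestmp) AS minute,
                 sum(totalresptime) / nullif(sum(success), 0) AS avg_resp
            FROM req_stats
           WHERE timestmp >= now() - interval '8 weeks'
           GROUP BY date_trunc('minute', timestmp)) AS per_minute;

Since the stored numbers are already per-minute sums, the inner query
can re-slice them by minute, hour or day, and the outer aggregate gives
you the spread across whatever slices you picked.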
> > What I do is aggregate the minute time slices by grouping by
> > date_trunc('xxx', timestamp) and then use that as a subselect to a
> > query that does the stddev() outside of that.  Works pretty well for
> > us, and makes it easy to identify trends.
>
> Trending analysis is very critical, and the only reference I've found
> on how to get "stddev" is based on what I posted.

Well, what we're doing seems to be getting us what look like proper
numbers.

> > One of the things we started doing is to aggregate the last six weeks
> > of data by the day / hour and then compare the last 24 hours worth of
> > data to those six weeks worth to see trends for each hour of each day.
> > The queries are huge and ugly, but they work, and run in about 4
> > minutes on a fairly good sized chunk of data.  We have about 150k to
> > 300k entries a day put into this db.
>
> I'm not sure how many K entries in a day (yet to count it), but I'm
> getting into trouble with one-to-many relationships, and PG is choosing
> to do nested loops etc. (lots of left joins on the same tables).

Yeah, for this kind of thing, you want to pre-process your tables into
one big flat table if you can.  We do have one-to-many relationships
within the same table (parent / child system requests are one to many),
and when we join the table to itself several times it can get hairy.
We had some nested loop problems until I upped the stats target on a
few of the columns and scheduled regular analyzes for the main table.
Now the system picks the right plan 99.9% or more of the time.  And
when it doesn't, it's usually something fresh and new that's the issue.

> So, I've had to resort to using SRFs and function scans.  A >2 hour
> complex query (on my laptop PG; it runs in 20 min on high-end mssql)
> is reduced to ~60 secs.

That's an excellent improvement!

> Thanks for the response.
>
> BTW, are user queries an issue?  Do they interfere with the loading?

Not usually.  I've got them trained pretty well, and the system has
lots more I/O bandwidth than CPU horsepower, so generally a long
running query just waits on I/O.  RAID-10 has made this system much
faster than the old single drive workstation (mine) it was running on.

The data loading on this system runs once a minute; it pumps the data
out of the production db and puts it into the reporting db.  Average
run time to insert those rows is < 1 second.

> (That's one of my concerns over here.)  I noticed that you do the calc
> at the app before sending it to the DB, which is a good thing, and
> every minute too.  (Is it advisable to do it in chunks of 1 min?  Less
> data per minute vs per 30 min / 1 hour.)

Exactly.  Each minute is easy to keep track of, and if the app goes
crazy for a bit and loses the data / gets restarted, you only lose a
minute or so worth of data.

Also, we use this system for real time alerting and performance
monitoring.  If some request or other starts taking too long or the
failure rate shoots up, it generates alerts.  If it was collected only
every 30 minutes or an hour, that would be far too late.

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend