Concerning the aggregation, no – I am
not running aggregate functions on the database itself.
Let’s take a small example.
Let’s say I am aggregating avg ping time on an hourly basis.
I have, inside the table declarations, structures
with an array[24].
As data comes in, I retrieve the test
record for the particular test, based on the test time I establish which array
offset needs to be taken care of, and then perform the relevant computatis.
For example, I have a testcounter array
and a testresult array. When a new event comes in, average is going to be
(testresult[n] * testcounter[n] + newvalue) / testcounter[n] + 1, followed by a
testcounter[n] += 1. So I am not loading the database doing aggregate
functions.
On our UI side of things, now that
everything is working properly, I am also optimizing code. The current
test code was looping through an array elemnt and aggregating through the
backend. Of course, this is inefficient since, for example, let’s
say I am showing an hourly graph, now I have 24 selects going on. Right
now we are recoding to have one select which will return the aggregate of all
of the offsets at once. It would be great if we could aggregate an array
in one hot, but in the meantime we’ll do it this way.
From: Scott Marlowe
[mailto:smarlowe@xxxxxxxxxxxxxxxxx]
Sent: Friday, April 28, 2006 9:24
AM
To: Benjamin Krajmalnik
Cc: Ben K.; pgsql-admin@xxxxxxxxxxxxxx
Subject: Re: [ADMIN] New system
recommendations
I'm
putting both the private email and the thread back on the list, as
there's interesting data in here people could use. I don't think I'm
betraying any trust here, but if I am, please, let me know...
On Thu, 2006-04-27 at 17:37, Benjamin Krajmalnik wrote:
> Thanks for the feedback.
> I wonder if the Dell backplane will take a stabdard RAID controller
> instead of using the ROMB.
> I may investigate just getting 146GB drives for the DL360 and using that
> instead, and maybe setting up the dell as a backup server, creating
> backups and moving the files over.
If you've got some spare smaller drives laying about, you can always set
up both and benchmark them against each other, plus that gives you a
ready to go backup machine.
> Quick question - I do not have a transaction frame explicitly declared
> in m stored procedure (which is used to handle all of the inserts).
> In plpgsql, how would I wrap the code inside a transaction frame?
All stored procs run as a single transaction. I.e. the simple act of
making it a user defined function has made it a single transaction, with
all the benefits that entails.
> This is such a dilemma - we are launching a new service, but until we
> sign up clients on it I don't have the budget to go crazy on hardware.
> I must make do with what I have. What I was thinking was mybe
setting
> up the Dell to be the web server and the backup database server, and
> having the production database run on the DL 360, provided I can get
> better database performance. If I can get significant better
> performance on FreeBSD I will go to it, otherwise I will stay with
> Windows.
I'd test the two. I'm guessing that at least for the initial ramp up,
windows will be ok. The real performance issue for windows is opening
connections is much slower. If the majority of your machine's time is
spent on the query, the overhead of opening connections will be lost in
the noise.
> The stored procedure is a plpgsql function which gets passed parameters
> from the monitoring agent.
> It then dynamically creates a device record for the monitored device if
> one does not yet exist.
> Once that is done it creates a test record for the particular test if
> one does not exist.
Up to now, fine. Performance on any machine should be ok.
> Once that is done, it aggregates dynamically the data into 4 tables - a
> daily snapshot, a weekly snapshot, and a monthly snapshot, and a
> dashboard snapshot.
Are you running aggregate functions against the whole table to do this?
If so, this isn't going to scale. If you're just taking the data
entered in this stored proc and adding it to a table that contains that
data, then maybe it's ok. But if it's updating based on a huge amount
of data, then that's going to be slow.
> Once all of that has been accomplished, it creates a raw log entry
> (which as of next week will go to a given partition). This data goes
> into the partitioned table to facilitate purging of retention periods
> without hammering at the database and having to rebalance indices (I
> just truncate the partition once it is no longer needed).
Sounds reasonable.