High update activity, PostgreSQL vs BigDBMS

Guy Rouillier <guyr-ml1@xxxxxxxxxxxxx> · Thu, 28 Dec 2006 00:46:49 -0500

I don't want to violate any license agreement by discussing performance, 
so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as 
BigDBMS here.

I'm trying to convince my employer to replace BigDBMS with PostgreSQL 
for at least some of our Java applications.  As a proof of concept, I 
started with a high-volume (but conceptually simple) network data 
collection application.  This application collects files of 5-minute 
usage statistics from our network devices, and stores a raw form of 
these stats into one table and a normalized form into a second table. 
We are currently storing about 12 million rows a day in the normalized 
table, and each month we start new tables.  For the normalized data, the 
app inserts rows initialized to zero for the entire current day first 
thing in the morning, then throughout the day as stats are received, 
executes updates against existing rows.  So the app has very high update 
activity.

In my test environment, I have a dual-x86 Linux platform running the 
application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and 
PostgreSQL 8.2.0 (only one at a time.)  The Sun box has 4 disk arrays 
attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, for those 
familiar with these devices.)  The arrays are set up with RAID5.  So I'm 
working with a consistent hardware platform for this comparison.  I'm 
only processing a small subset of files (144.)

BigDBMS processed this set of data in 20000 seconds, with all foreign 
keys in place.  With all foreign keys in place, PG took 54000 seconds to 
complete the same job.  I've tried various approaches to autovacuum 
(none, 30-seconds) and it doesn't seem to make much difference.  What 
does seem to make a difference is eliminating all the foreign keys; in 
that configuration, PG takes about 30000 seconds.  Better, but BigDBMS 
still has it beat significantly.

I've got PG configured so that that the system database is on disk array 
2, as are the transaction log files.  The default table space for the 
test database is disk array 3.  I've got all the reference tables (the 
tables to which the foreign keys in the stats tables refer) on this 
array.  I also store the stats tables on this array.  Finally, I put the 
indexes for the stats tables on disk array 4.  I don't use disk array 1 
because I believe it is a software array.

I'm out of ideas how to improve this picture any further.  I'd 
appreciate some suggestions.  Thanks.

--
Guy Rouillier