On Fri, 25 Jan 2008, David Brain wrote:
> We currently have one large DB (~1.2TB on disk) that essentially consists of
> one table with somewhere in the order of 500 million rows. This database has
> daily inserts as well as being used for some semi-data-mining type
> operations, so there are a fairly large number of indices on the table. The
> hardware storing this DB (a software RAID6 array) seems to be very IO bound
> for writes, and this is restricting our insert performance to ~50 TPS.
With each insert having to update that many indices, I'm not so surprised
that you are getting this kind of performance. A typical disc will do
something like 200 seeks per second, so if each transaction needs about four
seeks, that alone would limit you to roughly 200 / 4 = 50 TPS. Remember, on
software RAID 6 (without a battery-backed-up cache) all the discs will
probably need to participate in each transaction.
Your suggestion of splitting the data seems to hinge on a smaller table
giving quicker SELECTs - it would be worth running an experiment to see
whether that is actually the case. My guess is that you may not actually get
much of an improvement.
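For example, something like this - a sketch only, where the table name
"events" and the timestamp column "event_time" are placeholders for whatever
your schema actually uses - copies a recent slice of the data into a smaller
table and then compares a representative query against both:

    -- Build a smaller copy of the table covering the last 30 days.
    CREATE TABLE events_sample AS
        SELECT * FROM events
        WHERE event_time >= now() - interval '30 days';
    CREATE INDEX events_sample_time_idx ON events_sample (event_time);
    ANALYZE events_sample;

    -- Compare the actual runtimes of a typical query on both tables.
    EXPLAIN ANALYZE SELECT count(*) FROM events
        WHERE event_time >= now() - interval '7 days';
    EXPLAIN ANALYZE SELECT count(*) FROM events_sample
        WHERE event_time >= now() - interval '7 days';

If the smaller table isn't noticeably faster for your real queries, the
complexity of splitting the data probably isn't worth it.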
So, my suggestions would be to:
1. Make sure the server has plenty of RAM, so hopefully a lot of the
   SELECT traffic hits the cache (a query to sanity-check the buffer cache
   hit rate is sketched after this list).
2. Upgrade your disc system to hardware RAID, with a battery-backed-up
cache. This will enable the writes to occur immediately without having
to wait for the discs each time. RAID 6 sounds fine, as long as there
is a battery-backed-up cache in there somewhere. Without that, it can
be a little crippled.
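On point 1, here is a quick sanity check of how often reads are being
satisfied from PostgreSQL's own buffer cache (a sketch; this only sees
shared_buffers, not the operating system's page cache, so treat it as a
lower bound):

    SELECT blks_hit, blks_read,
           round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2)
               AS hit_percent
    FROM pg_stat_database
    WHERE datname = current_database();

If hit_percent is low and shared_buffers is small compared to the RAM in the
box, raising shared_buffers (and effective_cache_size) would be the first
thing to try.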
We don't actually have that much information on how much time Postgres is
spending on each of the different activities, but the above is probably a
good place to start.
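If you do want to find out where the time is going, two low-effort starting
points (again a sketch - "events" is a placeholder for your table name):

    -- Log any statement taking longer than 250 ms (set in postgresql.conf,
    -- or a superuser can SET it per session), then EXPLAIN ANALYZE a few of
    -- the statements that show up in the log.
    SET log_min_duration_statement = 250;

    -- Heap and index blocks read from disc versus found in the buffer cache
    -- for the big table.
    SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
    FROM pg_statio_user_tables
    WHERE relname = 'events';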
Hope that helps,
Matthew