Tim Allen wrote:
> We have a customer who are having performance problems. They have a
> large (36G+) postgres 8.1.3 database installed on an 8-way Opteron
> with 8G RAM, attached to an EMC SAN via fibre-channel (I don't have
> details of the EMC SAN model, or the type of fibre-channel card at the
> moment). They're running RedHat ES3 (which means a 2.4.something Linux
> kernel).
> They are unhappy about their query performance. We've been doing
> various things to try to work out what we can do. One thing that has
> been apparent is that autovacuum has not been able to keep the
> database sufficiently tamed. A pg_dump/pg_restore cycle reduced the
> total database size from 81G to 36G. Performing the restore took about
> 23 hours.
> We tried restoring the pg_dump output to one of our machines, a
> dual-core Pentium D with a single SATA disk, no RAID, I forget how
> much RAM but definitely much less than 8G. The restore took five
> hours. So it would seem that our machine, which on paper should be far
> less impressive than the customer's box, does more than four times the
> I/O performance.
> To simplify greatly - single local SATA disk beats EMC SAN by factor
> of four.
> Is that expected performance, anyone? It doesn't sound right to me.
> Does anyone have any clues about what might be going on? Buggy kernel
> drivers? Buggy kernel, come to think of it? Does a SAN just not
> provide adequate performance for a large database?
> I'd be grateful for any clues anyone can offer,
I'm actually in a not dissimilar position here. I was seeing Postgres
writing to an EMC RAID over iSCSI run at about half the speed of a
lesser machine writing to a local SATA drive. That was, until I noticed
that the Postgres installation on the SATA drive had fsync turned off,
while the EMC installation had fsync turned on. Turning fsync on for
the SATA installation dropped its performance to about one quarter of
the EMC's.
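To see why the fsync setting can swamp any hardware difference, here is a minimal sketch (not Postgres code) that times the same sequential write workload with and without forcing each write to stable storage. The function names `time_writes` and `compare` are hypothetical, the 8192-byte block size is an assumption chosen to match Postgres's default page size, and calling `os.fsync` after every write is only a loose analogy for what fsync=on asks of the server. Absolute numbers will depend heavily on the disk, controller cache, and filesystem.

```python
import os
import tempfile
import time


def time_writes(path: str, n_blocks: int, block_size: int, sync_each: bool) -> float:
    """Write n_blocks blocks of block_size bytes to path; return elapsed seconds.

    If sync_each is True, os.fsync is called after every write, forcing the
    data to stable storage before continuing (a loose analogy to fsync=on).
    """
    block = b"\0" * block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
            if sync_each:
                f.flush()             # push Python's userspace buffer to the OS
                os.fsync(f.fileno())  # ask the OS to push it to the device
    return time.perf_counter() - start


def compare(n_blocks: int = 200, block_size: int = 8192) -> dict:
    """Time an identical workload with and without per-write fsync."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "probe.dat")
        for label, sync in (("fsync off", False), ("fsync on", True)):
            results[label] = time_writes(path, n_blocks, block_size, sync)
    return results


if __name__ == "__main__":
    for label, secs in compare().items():
        print(f"{label}: {secs:.3f} s")
```

Run it on both machines being compared: if the "fsync on" time is far larger than "fsync off" on one of them, a benchmark that mixes the two settings is measuring configuration, not hardware.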
Moral of the story: make sure you're comparing apples to apples.
Brian