Scott Marlowe wrote:
On Thu, 2006-06-15 at 16:50, Tim Allen wrote:
We have a customer who are having performance problems. They have a
large (36G+) postgres 8.1.3 database installed on an 8-way opteron with
8G RAM, attached to an EMC SAN via fibre-channel (I don't have details
of the EMC SAN model, or the type of fibre-channel card at the moment).
They're running RedHat ES3 (which means a 2.4.something Linux kernel).
They are unhappy about their query performance. We've been doing various
things to try to work out what we can do. One thing that has been
apparent is that autovacuum has not been able to keep the database
sufficiently tamed. A pg_dump/pg_restore cycle reduced the total
database size from 81G to 36G. Performing the restore took about 23 hours.
Do you have the ability to do any simple IO performance testing, like
with bonnie++ (the old bonnie is not really capable of properly testing
modern equipment, but bonnie++ will give you some idea of the throughput
of the SAN) Or even just timing a dd write to the SAN?
I've done some timed dd's. The timing results vary quite a bit, but it
seems you can write to the SAN at about 20MB/s and read from it at about
12MB/s. Not an entirely scientific test, as I wasn't able to stop
other activity on the machine, though I don't think much else was
happening. Certainly not impressive figures, compared with our machine
with the SATA disk (referred to below), which can get 161MB/s copying
files on the same disk, and 48MB/s and 138Mb/s copying files from the
sata disk respectively to and from a RAID5 array.
The customer is a large organisation, with a large IT department who
guard their turf carefully, so there is no way I could get away with
installing any heavier duty testing tools like bonnie++ on their machine.
We tried restoring the pg_dump output to one of our machines, a
dual-core pentium D with a single SATA disk, no raid, I forget how much
RAM but definitely much less than 8G. The restore took five hours. So it
would seem that our machine, which on paper should be far less
impressive than the customer's box, does more than four times the I/O
performance.
To simplify greatly - single local SATA disk beats EMC SAN by factor of
four.
Is that expected performance, anyone? It doesn't sound right to me. Does
anyone have any clues about what might be going on? Buggy kernel
drivers? Buggy kernel, come to think of it? Does a SAN just not provide
adequate performance for a large database?
Yes, this is not uncommon. It is very likely that your SATA disk is
lying about fsync.
I guess a sustained write will flood the disk's cache and negate the
effect of the write-completion dishonesty. But I have no idea how large
a copy would have to be to do that - can anyone suggest a figure?
Certainly, the read performance of the SATA disk still beats the SAN,
and there is no way to lie about read performance.
What kind of backup are you using? insert statements or copy
statements? If insert statements, then the difference is quite
believable. If copy statements, less so.
A binary pg_dump, which amounts to copy statements, if I'm not mistaken.
Next time, on their big server, see if you can try a restore with fsync
turned off and see if that makes the restore faster. Note you should
turn fsync back on after the restore, as running without it is quite
dangerous should you suffer a power outage.
How are you mounting to the EMC SAN? NFS, iSCSI? Other?
iSCSI, I believe. Some variant of SCSI, anyway, of that I'm certain.
The conclusion I'm drawing here is that this SAN does not perform at all
well, and is not a good database platform. It's sounding from replies
from other people that this might be a general property of SAN's, or at
least the ones that are not stratospherically priced.
Tim
--
-----------------------------------------------
Tim Allen tim@xxxxxxxxxxxxxxxx
Proximity Pty Ltd http://www.proximity.com.au/