Re: SAN performance mystery

Tim Allen <tim@xxxxxxxxxxxxxxxx> · Mon, 19 Jun 2006 20:09:47 +1000

Scott Marlowe wrote:
On Thu, 2006-06-15 at 16:50, Tim Allen wrote:

We have a customer who are having performance problems. They have a 
large (36G+) postgres 8.1.3 database installed on an 8-way opteron with 
8G RAM, attached to an EMC SAN via fibre-channel (I don't have details 
of the EMC SAN model, or the type of fibre-channel card at the moment). 
They're running RedHat ES3 (which means a 2.4.something Linux kernel).

They are unhappy about their query performance. We've been doing various 
things to try to work out what we can do. One thing that has been 
apparent is that autovacuum has not been able to keep the database 
sufficiently tamed. A pg_dump/pg_restore cycle reduced the total 
database size from 81G to 36G. Performing the restore took about 23 hours.

Do you have the ability to do any simple IO performance testing, like
with bonnie++ (the old bonnie is not really capable of properly testing
modern equipment, but bonnie++ will give you some idea of the throughput
of the SAN)  Or even just timing a dd write to the SAN?

I've done some timed dd's. The timing results vary quite a bit, but it 
seems you can write to the SAN at about 20MB/s and read from it at about 
 12MB/s. Not an entirely scientific test, as I wasn't able to stop 
other activity on the machine, though I don't think much else was 
happening. Certainly not impressive figures, compared with our machine 
with the SATA disk (referred to below), which can get 161MB/s copying 
files on the same disk, and 48MB/s and 138Mb/s copying files from the 
sata disk respectively to and from a RAID5 array.

The customer is a large organisation, with a large IT department who 
guard their turf carefully, so there is no way I could get away with 
installing any heavier duty testing tools like bonnie++ on their machine.

We tried restoring the pg_dump output to one of our machines, a 
dual-core pentium D with a single SATA disk, no raid, I forget how much 
RAM but definitely much less than 8G. The restore took five hours. So it 
would seem that our machine, which on paper should be far less 
impressive than the customer's box, does more than four times the I/O 
performance.

To simplify greatly - single local SATA disk beats EMC SAN by factor of 
four.

Is that expected performance, anyone? It doesn't sound right to me. Does 
anyone have any clues about what might be going on? Buggy kernel 
drivers? Buggy kernel, come to think of it? Does a SAN just not provide 
adequate performance for a large database?

Yes, this is not uncommon.  It is very likely that your SATA disk is
lying about fsync.

I guess a sustained write will flood the disk's cache and negate the 
effect of the write-completion dishonesty. But I have no idea how large 
a copy would have to be to do that - can anyone suggest a figure? 
Certainly, the read performance of the SATA disk still beats the SAN, 
and there is no way to lie about read performance.

What kind of backup are you using?  insert statements or copy
statements?  If insert statements, then the difference is quite
believable.  If copy statements, less so.

A binary pg_dump, which amounts to copy statements, if I'm not mistaken.

Next time, on their big server, see if you can try a restore with fsync
turned off and see if that makes the restore faster.  Note you should
turn fsync back on after the restore, as running without it is quite
dangerous should you suffer a power outage.

How are you mounting to the EMC SAN?  NFS, iSCSI? Other?

iSCSI, I believe. Some variant of SCSI, anyway, of that I'm certain.

The conclusion I'm drawing here is that this SAN does not perform at all 
well, and is not a good database platform. It's sounding from replies 
from other people that this might be a general property of SAN's, or at 
least the ones that are not stratospherically priced.

Tim

--
-----------------------------------------------
Tim Allen          tim@xxxxxxxxxxxxxxxx
Proximity Pty Ltd  http://www.proximity.com.au/