SAN performance mystery

Tim Allen <tim@xxxxxxxxxxxxxxxx> · Fri, 16 Jun 2006 07:50:19 +1000

We have a customer who is having performance problems. They have a 
large (36G+) PostgreSQL 8.1.3 database installed on an 8-way Opteron with 
8G RAM, attached to an EMC SAN via Fibre Channel (I don't have details 
of the EMC SAN model or the type of Fibre Channel card at the moment). 
They're running RedHat ES3 (which means a 2.4.something Linux kernel).

They are unhappy about their query performance. We've been doing various 
things to try to work out what we can do. One thing that has become 
apparent is that autovacuum has not been able to keep the database 
sufficiently tamed. A pg_dump/pg_restore cycle reduced the total 
database size from 81G to 36G. The restore on the customer's machine 
took about 23 hours.

We tried restoring the pg_dump output to one of our machines, a 
dual-core Pentium D with a single SATA disk, no RAID, I forget how much 
RAM but definitely much less than 8G. The restore took five hours. So it 
would seem that our machine, which on paper should be far less 
impressive than the customer's box, delivers more than four times the 
I/O performance.

To simplify greatly - a single local SATA disk beats the EMC SAN by a 
factor of four.
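
For anyone who wants the arithmetic spelled out, here is a rough sketch 
of the comparison. It assumes about 36G of data was restored on both 
machines (I have only measured the size on the customer's side), and the 
function and variable names are purely illustrative. The rates are 
averages over the whole restore, index builds included, so they say 
nothing about peak disk throughput.

```python
# Rough effective restore rates from the figures quoted above.
# ASSUMPTION: roughly 36 GB was restored on both machines.
DB_SIZE_GB = 36


def restore_rate_mb_per_s(size_gb: float, hours: float) -> float:
    """Average restore rate in MB/s, taking 1 GB as 1024 MB."""
    return size_gb * 1024 / (hours * 3600)


san_rate = restore_rate_mb_per_s(DB_SIZE_GB, 23)   # customer's box, EMC SAN
sata_rate = restore_rate_mb_per_s(DB_SIZE_GB, 5)   # our box, single SATA disk
speedup = sata_rate / san_rate

print(f"SAN:   {san_rate:.2f} MB/s")
print(f"SATA:  {sata_rate:.2f} MB/s")
print(f"ratio: {speedup:.1f}x")
```

The ratio comes out at 4.6, since the data size cancels and only the 
23 hours versus five hours matters.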

Is that expected performance, anyone? It doesn't sound right to me. Does 
anyone have any clues about what might be going on? Buggy kernel 
drivers? Buggy kernel, come to think of it? Does a SAN just not provide 
adequate performance for a large database?
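
In case it helps anyone suggest a comparison, below is a minimal sketch 
of the kind of synced-write test that could be run on both machines. 
This is not something we have run on the customer's box; the function 
name, the 8 kB chunk size and the total size are my own assumptions, 
and a restore does far more than synced sequential writes, so treat any 
number it produces as a rough indication only.

```python
import os
import tempfile
import time


def fsync_write_rate(directory: str, total_mb: int = 16, chunk_kb: int = 8) -> float:
    """Write total_mb of zeros in chunk_kb pieces, calling fsync after
    each chunk, and return the achieved rate in MB/s.

    The temporary file is created in `directory`, so point it at the
    filesystem you want to measure.
    """
    chunk = b"\0" * (chunk_kb * 1024)
    n_chunks = (total_mb * 1024) // chunk_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(n_chunks):
            os.write(fd, chunk)
            os.fsync(fd)  # force each chunk to stable storage before continuing
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    # Guard against a zero reading from a very coarse clock.
    return total_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Placeholder target: the system temp directory. Replace it with a
    # directory on the SAN volume (or the local disk) being compared.
    rate = fsync_write_rate(tempfile.gettempdir())
    print(f"{rate:.2f} MB/s with fsync after every 8 kB chunk")
```

Running the same script against a directory on the SAN and one on a 
local disk would at least show whether synced writes alone differ by 
anything like the factor we saw in the restore.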

I'd be grateful for any clues anyone can offer,

Tim
