Re: SAN performance mystery

Stefan Kaltenbrunner <stefan@xxxxxxxxxxxxxxxx> · Fri, 16 Jun 2006 09:48:00 +0200

Tim Allen wrote:
> We have a customer who are having performance problems. They have a
> large (36G+) postgres 8.1.3 database installed on an 8-way opteron with
> 8G RAM, attached to an EMC SAN via fibre-channel (I don't have details
> of the EMC SAN model, or the type of fibre-channel card at the moment).
> They're running RedHat ES3 (which means a 2.4.something Linux kernel).
> 
> They are unhappy about their query performance. We've been doing various
> things to try to work out what we can do. One thing that has been
> apparent is that autovacuum has not been able to keep the database
> sufficiently tamed. A pg_dump/pg_restore cycle reduced the total
> database size from 81G to 36G. Performing the restore took about 23 hours.

Hi Tim!

to give you some comparision - we have a similiar sized database here
(~38GB after a fresh restore and ~76GB after some months into
production). the server is a 4 core Opteron @2,4Ghz with 16GB RAM,
connected via 2 QLogic 2Gbit HBA's to the SAN (IBM DS4300 Turbo).

It took us quite a while to get this combination up to speed but a full
dump&restore cycle (via a pg_dump | psql pipe over the net) now takes
only about an hour.
23 hours or even 5 hours sounds really excessive - I'm wondering about
some basic issues with the SAN.
If you are using any kind of multipathing (most likely the one in the
QLA-drivers) I would at first assume that you are playing ping-pong
between the controllers (ie the FC-cards do send IO to more than one
SAN-head causing those to failover constantly completely destroying
performance).
ES3 is rather old too and I don't think that even their hacked up kernel
is very good at driving a large Opteron SMP box (2.6 should be MUCH
better in that regard).

Other than that - how well is your postgresql instance tuned to your
hardware ?

Stefan