We're migrating to new hardware and from pgsql 8.2.4 to pgsql 8.3.1. We were scheduled to go live yesterday morning, but late Friday we decided to postpone after observing the issue described below.
Our new hardware includes an external Dell MD3000 RAID array of 15 15k SAS disks. We have a 2 disk RAID1 array for txnlog and a 12 disk RAID10 array for pgsql data.
Our host is a Dell PowerEdge 2950 with 2 Quad-Core Xeon 2.5GHz CPUs and 16GB of RAM running 64-bit CentOS on an internal RAID1 using standard 7200 RPM SATA drives and Dell's PERC 6i controller. The host connects to the MD3000 via 2 SAS HBA cards and Dell's multipath RDAC driver.
I ran pgbench against two database instances: one using the disks from the MD3000 and the other using local internal SATA storage. The results showed roughly 2.5 times the throughput on local storage compared with the external direct-attached storage array. My procedure to run the benchmark was:
- Create new DB on MD3000 mount using initdb. Edit postgresql.conf and change only the port # to 5462
- Create new DB on local mount using initdb. Edit postgresql.conf and change only the port # to 5452
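For anyone who wants to reproduce this, the steps above can be sketched roughly as the script below. This is only a sketch under stated assumptions: the two data directory paths are hypothetical placeholders, the `pgbench -i` initialization step is assumed (it is not listed above), and the port is passed with `pg_ctl -o "-p PORT"` as a stand-in for editing postgresql.conf by hand. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only. Data directories are hypothetical placeholders, and the
# "pgbench -i" step is an assumption that is not stated in the post.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

setup_instance() {
    datadir=$1
    port=$2
    dbname=$3
    run initdb -D "$datadir"
    # Stand-in for changing only the port in postgresql.conf.
    run pg_ctl start -w -D "$datadir" -o "-p $port"
    run createdb -p "$port" "$dbname"
    # Assumed step: create the pgbench tables at the default scale of 1.
    run pgbench -i -p "$port" "$dbname"
    # Same client count and transaction count as the runs in the post.
    run pgbench -p "$port" -c 20 -t 100 "$dbname"
}

setup_instance /mnt/md3000/pgdata 5462 pgbench-md3000
setup_instance /var/lib/pgsql/pgbench-local 5452 pgbenchloc
```

Everything else in postgresql.conf stays at the initdb defaults, so the two instances should differ only in where their data directory lives.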
Here are the results from pgbench:
[postgres@dbnya1 ~]$ pgbench -p 5462 -c 20 -t 100 pgbench-md3000
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 833.874657 (including connections establishing)
tps = 846.126412 (excluding connections establishing)
[postgres@dbnya1 ~]$ pgbench -p 5452 -c 20 -t 100 pgbenchloc
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 20
number of transactions per client: 100
number of transactions actually processed: 2000/2000
tps = 2047.808129 (including connections establishing)
tps = 2125.310428 (excluding connections establishing)
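To put a number on the gap, the ratio between the two runs can be computed from the tps figures above. The `ratio` helper is just a hypothetical convenience name for this sketch; the values are copied from the pgbench output.

```shell
# ratio A B: print A/B rounded to two decimals
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# local tps divided by MD3000 tps, values copied from the runs above
echo "including connections: $(ratio 2047.808129 833.874657)x"
echo "excluding connections: $(ratio 2125.310428 846.126412)x"
```

Either way, local storage comes out at about 2.5 times the MD3000 figure.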
I subsequently ran bonnie++, an I/O benchmark, on the local storage and the MD3000 and found that the raw I/O throughput on the MD3000 is well above that of local storage: roughly 2.7 times on block writes, 4.4 times on block reads, and 5.3 times on random seeks. Here are the results from bonnie++:
Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
dbnya1.local 32088M 48475  67 50154  12 24146   3 55312  70 54822   4 197.9   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Version 1.03c        ------Sequential Output------ --Sequential Input- --Random-
                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine         Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
dbnya1.md3000 32088M 70672  97 137568 35 71718  17 74395  95 241676 21  1049   2
                     ------Sequential Create------ --------Random Create--------
                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
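For reference, the MD3000-to-local ratios for the main bonnie++ columns can be recomputed from the two result rows. This is a small sketch; the `ratio` helper is a hypothetical name, and the numbers are copied from the dbnya1.md3000 and dbnya1.local rows.

```shell
# ratio A B: print A/B rounded to two decimals
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# MD3000 value divided by local value for each column
echo "block write  (K/sec): $(ratio 137568 50154)x"
echo "rewrite      (K/sec): $(ratio 71718 24146)x"
echo "block read   (K/sec): $(ratio 241676 54822)x"
echo "random seeks  (/sec): $(ratio 1049 197.9)x"
```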
So, for some strange reason, pgsql is struggling with performance when run off this external disk. Has anyone seen behavior similar to this before? Any suggestions on how to proceed? Based on the bonnie++ benchmark, this doesn't look like a hardware issue to me, since the MD3000 delivers more raw throughput on every test that reported a number.
---Marc