Re: Suggestions for a HBA controller (6 x SSDs + mdadm RAID10)

On Tue, Feb 21, 2017 at 7:49 AM, Pietro Pugni <pietro.pugni@xxxxxxxxx> wrote:
Hi there,
I configured an IBM X3650 M4 for development and testing purposes. It is composed of:
 - 2 x Intel Xeon E5-2690 @ 2.90GHz (2 x 8 physical cores + HT)
 - 96GB RAM DDR3 1333MHz (12 x 8GB)
 - 2 x 146GB SAS HDDs @ 15k rpm configured in RAID1 (mdadm)
 - 6 x 525GB SATA SSDs (over-provisioned at 25%, so 393GB available)

I’ve done a lot of testing focused on 4k and 8k workloads and found that the IOPS of those SSDs are half of what I expected. On serverfault.com someone suggested that the bottleneck is probably the embedded RAID controller, an IBM ServeRAID m5110e, which is based on an LSI 2008 chip.

I’m using the disks in JBOD mode with mdadm software RAID, which is blazing fast. The CPU is also very fast, so I don’t mind having a little overhead due to software RAID.

My typical workload is Postgres run as a DWH with 1 to 2 billion rows, big indexes, partitions and so on, but also intensive statistical computations.


Here’s a graph of those six SSDs evaluated with fio as stand-alone disks (outside of the RAID):

https://i.stack.imgur.com/ZMhUJ.png

All those IOPS should double if everything were working correctly. The shape of the curve with increasing IO depths is as expected.


Anyway, I would like to buy an HBA controller that fully leverages those 6 SSDs. Each SSD should deliver about 80k to 90k IOPS, so in RAID10 I should get ~240k IOPS (6 x 80k / 2) and in RAID0 ~480k IOPS (6 x 80k). I’ve seen that mdadm scales performance effectively, but the controller caps overall IOPS at ~120k (exactly half of the expected IOPS).

What HBA controller capable of handling 500k IOPS would you suggest?


My server can hold 8 more SSDs, for a total of 14 SSDs and 1260k theoretical IOPS. Even adding only 2 more disks, I would reach 720k theoretical IOPS in RAID0.

What HBA controller capable of handling more than 700k IOPS would you suggest?

Do you have any advice about using mdadm software RAID with SATA III SSDs and a plain HBA?

Random points/suggestions:
*) mdadm is the way to go. I think you'll get bandwidth constrained on most modern HBAs unless they are really crappy. On reasonably modern hardware, storage is rarely the bottleneck anymore (which is a great place to be). Fancy RAID controllers may actually hurt performance -- they are obsolete IMNSHO.
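For reference, assembling the six SSDs into an mdadm RAID10 looks roughly like this (device names and chunk size here are assumptions for illustration, not taken from the thread):

```shell
# Build a RAID10 array from six whole SSDs attached via the HBA.
# /dev/sd[b-g] expands to /dev/sdb ... /dev/sdg; adjust to your system.
mdadm --create /dev/md0 --level=10 --raid-devices=6 \
    --chunk=64 /dev/sd[b-g]

# Inspect the result:
mdadm --detail /dev/md0
cat /proc/mdstat
```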

*) Small point, but you'll want to crank effective_io_concurrency (see: https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH%3DHWh1WbLRsioe%3DmzRJTHwtr%3D2azsTdQ%40mail.gmail.com). It only affects certain kinds of queries, but when it works it really works.    Those benchmarks were done on my crapbox dell workstation!
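For reference, a minimal sketch of raising that setting (the value 256 is an illustrative guess for SSDs, not a recommendation from the thread; it only influences bitmap heap scans):

```sql
-- Per-session, for experimentation:
SET effective_io_concurrency = 256;

-- Or persistently (writes postgresql.auto.conf, then reload):
ALTER SYSTEM SET effective_io_concurrency = 256;
SELECT pg_reload_conf();
```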

*) For very high transaction rates, you can get a lot of benefit from disabling synchronous_commit if you are willing to accommodate the risk.  I do not recommend disabling fsync unless you are prepared to regenerate the entire database at any time.
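A sketch of how that trade-off is typically made. Unlike fsync=off, synchronous_commit=off risks losing only the most recent commits on a crash, never corrupting the database, and it can be scoped per session or per transaction:

```sql
-- Per-session: all commits in this session skip the synchronous WAL flush.
SET synchronous_commit = off;

-- Or only for one high-volume transaction:
BEGIN;
SET LOCAL synchronous_commit = off;
-- ... bulk writes here ...
COMMIT;
```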

*) Don't assume indefinite linear scaling as you increase storage capacity -- the database itself can become the bottleneck, especially for writing.   To improve write performance, classic optimization strategies of trying to intelligently bundle writes around units of work still apply.  If you are expecting high rates of write activity your engineering focus needs to be here for sure (read scaling is comparatively pretty easy).
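As an illustration of bundling writes into units of work (table and column names are invented for this sketch): one transaction containing a multi-row INSERT pays for a single commit/WAL flush instead of one per row, and COPY is faster still for bulk loads:

```sql
-- Instead of many single-row autocommitted INSERTs (one WAL flush each),
-- bundle the rows into one transaction:
BEGIN;
INSERT INTO measurements (sensor_id, ts, value) VALUES
    (1, '2017-02-21 10:00', 42.0),
    (1, '2017-02-21 10:01', 42.5),
    (2, '2017-02-21 10:00', 17.3);
COMMIT;

-- For large batches, COPY bypasses per-statement overhead entirely:
-- COPY measurements FROM '/tmp/batch.csv' WITH (FORMAT csv);
```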

*) I would start your benchmarking with pgbench since that is going to most closely reflect measured production performance.
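A typical pgbench run might look like this (the scale factor, client count, and database name are illustrative choices, not from the thread):

```shell
# One-time initialization of the benchmark tables at scale factor 100:
pgbench -i -s 100 testdb

# Run the TPC-B-like workload for 60 seconds,
# 16 client connections driven by 4 worker threads:
pgbench -c 16 -j 4 -T 60 testdb
```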

> My typical workload is Postgres run as a DWH with 1 to 2 billion rows, big indexes, partitions and so on, but also intensive statistical computations.

If this is the case your stack performance is going to be based on data structure design.  Make liberal use of:
*) natural keys 
*) constraint exclusion for partition selection
*) BRIN index is amazing (if you can work within its limitations)
*) partial indexing
*) covering indexes (don't forget to vacuum your partitions before making them live, if you use them)
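A sketch of how several of these combine on a partitioned fact table (all names are invented for illustration; this uses the inheritance-style partitioning current at the time of the thread):

```sql
-- Parent table plus a monthly partition; the CHECK constraint lets
-- constraint exclusion (default setting: partition) skip irrelevant children.
CREATE TABLE facts (id bigint, ts timestamptz, region text, value numeric);
CREATE TABLE facts_2017_02 (
    CHECK (ts >= '2017-02-01' AND ts < '2017-03-01')
) INHERITS (facts);

-- BRIN index: tiny, works well for naturally ts-ordered, append-only data.
CREATE INDEX ON facts_2017_02 USING brin (ts);

-- Partial index: covers only the rows a hot query actually touches.
CREATE INDEX ON facts_2017_02 (region) WHERE value > 1000;

-- "Covering" index: contains every column a query needs,
-- enabling index-only scans.
CREATE INDEX ON facts_2017_02 (region, value);

-- Vacuum before making the partition live so the visibility map is set
-- and index-only scans pay off immediately.
VACUUM ANALYZE facts_2017_02;
```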

If your data is going to get really big and/or query activity is expected to be high, keep an eye on your scale-out strategy. Going monolithic to bootstrap your app is the right choice IMO, but start thinking about the longer term if you are expecting growth. I'm starting to come around to the perspective that lift/shift scale-out using postgres_fdw, without an insane amount of app retooling, could be a viable option by Postgres 11/12 or so. For my part I scaled out over asynchronous dblink, which is a much more maintenance-heavy strategy (but works fabulously, although I wish you could connect asynchronously).
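For context, the postgres_fdw approach looks roughly like this (host, database, and table names are invented for the sketch):

```sql
CREATE EXTENSION postgres_fdw;

-- Point at a remote shard:
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', dbname 'dwh');

CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'app', password 'secret');

-- Queries against this table are pushed down to the remote
-- server where possible:
CREATE FOREIGN TABLE facts_shard1 (
    id bigint, ts timestamptz, value numeric
) SERVER shard1 OPTIONS (table_name 'facts');
```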

merlin

