On 07/02/13 22:07, Dave Cundiff wrote:
> On Thu, Feb 7, 2013 at 5:19 AM, Adam Goryachev
> <mailinglists@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> On 07/02/13 20:07, Dave Cundiff wrote:
>>> On Thu, Feb 7, 2013 at 1:48 AM, Adam Goryachev
>>> <mailinglists@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> Why would you plug thousands of dollars of SSD into an onboard
>>> controller? It's probably running off a 1x PCIe lane shared with
>>> every other onboard device. An LSI 8x 8-port HBA will run you a few
>>> hundred (less than 1 SSD) and let you melt your northbridge. At
>>> least on my Supermicro X8DTL boards I had to add active cooling to
>>> it or it would overheat and crash at sustained IO. I can hit
>>> 2-2.5GB a second doing large sequential IO with Samsung 840 Pros on
>>> a RAID10.
>>
>> Because originally I was just using 4 x 2TB 7200rpm disks in RAID10,
>> I upgraded to SSD to improve performance (which it did), but hadn't
>> (yet) upgraded the SATA controller because I didn't know if it would
>> help.
>>
>> I'm seeing conflicting information here (buy SATA card or not)...
>
> It's not going to help your remote access any. From your configuration
> it looks like you are limited to 4 gigabits. At least as long as your
> NICs are not in the slot shared with the disks. If they are you might
> get some contention.
>
> http://download.intel.com/support/motherboards/server/sb/g13326004_s1200bt_tps_r2_0.pdf
>
> See page 17 for a block diagram of your motherboard. You have a 4x DMI
> connection that PCI slot 3, your disks, and every other onboard device
> share. That should be about 1.2GB/s (10 gigabits) of bandwidth. Your
> SSDs alone could saturate that if you performed a local operation. Get
> your NICs going at 4Gig and all of a sudden you'll really want that
> SATA card in slot 4 or 5.

OK, I'll have to check that the 4 x 1G ethernet are in slots 4 and 5
now, not using the onboard ethernet, and not in slot 3... If I could
get close to 4Gbps (ie, saturate the ethernet) then I think I'd be more
than happy... I don't see my SSDs running at 400MB/s though anyway....

>>>> 2) Move from a 5 disk RAID5 to an 8 disk RAID10, giving better data
>>>> protection (can lose up to four drives) and hopefully better
>>>> performance (main concern right now), and same capacity as current.
>>>
>>> I've had strange issues with anything other than RAID1 or 10 with
>>> SSD. Even with the high IO and IOPS rates of SSDs the parity calcs
>>> and extra writes still seem to penalize you greatly.
>>
>> Maybe this is the single-threaded nature of RAID5 (and RAID10)?
>
> I definitely see that. See below for a FIO run I just did on one of my
> RAID10s
>
> md2 : active raid10 sdb3[1] sdf3[5] sde3[4] sdc3[2] sdd3[3] sda3[0]
>       742343232 blocks super 1.2 32K chunks 2 near-copies [6/6] [UUUUUU]
>
> seq-read: (g=0): rw=read, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio,
> iodepth=32
> seq-write: (g=2): rw=write, bs=64K-64K/64K-64K/64K-64K,
> ioengine=libaio, iodepth=32
>
> Run status group 0 (all jobs):
>    READ: io=4096.0MB, aggrb=2149.3MB/s, minb=2149.3MB/s,
> maxb=2149.3MB/s, mint=1906msec, maxt=1906msec
>
> Run status group 2 (all jobs):
>   WRITE: io=4096.0MB, aggrb=1168.7MB/s, minb=1168.7MB/s,
> maxb=1168.7MB/s, mint=3505msec, maxt=3505msec
>
> These drives are pretty fresh and my writes are a whole gig less than
> my reads. It's not for lack of bandwidth either.

Can you please show the command line you used, so I can run a similar
test and compare?
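In the meantime, here's the sort of job file I was planning to run,
guessed from the parameters visible in your output above (64k blocks,
libaio, iodepth 32, 4GB per pass). The filename is just a placeholder
for a scratch file on my array, and direct=1 plus the stonewalls are my
own assumptions rather than necessarily what your job used:

[global]
ioengine=libaio
iodepth=32
bs=64k
direct=1
size=4g
# placeholder path - point this at a scratch file on the array under test
filename=/mnt/raid10/fio-test

[seq-read]
rw=read
stonewall

[seq-write]
rw=write
stonewall

I'd run it with just "fio <jobfile>" and compare the aggrb numbers
against yours.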
>>> Also if your kernel does not have md TRIM support you risk taking a
>>> SEVERE performance hit on writes. Once you complete a full write
>>> pass on your NAND the SSD controller will require extra time to
>>> complete a write. If your IO is mostly small and random this can
>>> cause your NAND to become fragmented. If the fragmentation becomes
>>> bad enough you'll be lucky to get 1 spinning disk's worth of write
>>> IO out of all 5 combined.
>>
>> This was the reason I made the partition (for RAID) smaller than the
>> disk, and left the rest unpartitioned. However, as you said, once
>> I've fully written enough data to fill the raw disk capacity, I still
>> have a problem. Is there some way to instruct the disk (overnight) to
>> TRIM the extra blank space, and do whatever it needs to tidy things
>> up? Perhaps this would help, at least first thing in the morning, if
>> it isn't enough to get through the day. Potentially I could add a 6th
>> SSD and reduce the partition size across all of them, just so there
>> is more blank space to get through a full day's worth of writes?
>
> There was a script called mdtrim that would use hdparm to manually
> send the proper TRIM commands to the drives. I didn't bother looking
> for a link because it scares me to death and you probably shouldn't
> use it. If it gets the math wrong random data will disappear from your
> disks.

Doesn't sound good... It would be nice to use smartctl or similar to
ask the drive "please tidy up now". The drive itself should already
know the unpartitioned space is free, since nothing has ever been
written there.

> As for changing partition sizes you really have to know what kinds of
> IO you're doing. If all you're doing is hammering these things with
> tiny IOs 24x7 it's gonna end up with terrible write IO. At least my
> SSDs do. If you have a decent mix of small and large it may not
> fragment as badly. I ran random 4k against mine for 2 days before it
> got miserably slow. Reading will always be fine.

Well, if I can re-trim daily, and have enough clean space to work for
2 days, then I should never hit this problem.... Assuming it loses
*that much* performance.... (I've pasted below my sig the read-only
hdparm check I'd run first, just to see what the drives advertise.)

Thanks,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
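Here's the read-only check I mentioned above - just hdparm asking each
drive what it advertises, nothing destructive. The /dev/sd[a-e] glob is
only a guess at how the five SSDs show up on my box, so adjust it to
the real array members:

#!/bin/sh
# Read-only: ask each SSD which TRIM-related features it advertises.
# /dev/sd[a-e] is a guess at the device names - adjust to suit.
for d in /dev/sd[a-e]; do
    echo "== $d =="
    hdparm -I "$d" | grep -i trim
done

If they all report "Data Set Management TRIM supported" then the drives
themselves are willing; the open question is still whether anything
above md will ever send them the command.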