On Tue, Dec 22, 2015 at 12:15 AM, Doug Dumitru <doug@xxxxxxxxxx> wrote:
> My apologies for diving in so late.
>
> I routinely run 24-drive raid-5 sets with SSDs. Chunk is set at 32K
> and the application only writes "perfect" 736K "stripes". The SSDs
> are Samsung 850 Pros on dedicated LSI 3008 SAS ports and are at "new"
> preconditioning (i.e., they are at full speed, just over 500 MB/sec).
> CPU is a single E5-1650 v3.
>
> With stock RAID-5 code, I get about 1.8 GB/sec at q=4.
>
> Now this application is writing from kernel space
> (generic_make_request with the caller waiting for the completion
> callback). There are a lot of RMW operations happening here. I think
> the raid-5 background thread is waking up asynchronously when only a
> part of the write has been buffered into stripe cache pages. The bio
> going into the raid layer is a single bio, so nothing is being carved
> up on the request end. The raid-5 helper thread also saturates a CPU
> core (which is about as fast as you can get with an E5-1650).
>
> If I patch raid5.ko with special-case code to avoid the stripe cache
> and just compute parity and go, the write throughput goes up above
> 11 GB/sec.
>
> This is obviously an impossible IO pattern for most applications, but
> it does confirm that the upper limit of (n-1)*bw is achievable, just
> not with the current stripe cache logic in the raid layer.
>
> Doug Dumitru
> WildFire Storage

> If I patch raid5.ko with special-case code to avoid the stripe cache
> and just compute parity and go, the write throughput goes up above
> 11 GB/sec.

Hi Doug. This is really quite astounding and encouraging! Would you be
willing to share your patch? I am eager to give it a try for both
RAID 5 and RAID 6.

> Now this application is writing from kernel space
> (generic_make_request with the caller waiting for the completion
> callback). There are a lot of RMW operations happening here. I think
> the raid-5 background thread is waking up asynchronously when only a
> part of the write has been buffered into stripe cache pages.

I would also like to hear from anyone who maintains the stripe cache
code. I see similar behavior when I monitor writes of perfectly
stripe-aligned blocks: the number of RMWs is smallish and seems to
vary, but I would not expect to see any at all.

Two rough sketches follow, to make concrete the write pattern and the
parity arithmetic being discussed.
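Something along these lines is what I have in mind when I say
"perfectly stripe-aligned writes" -- a minimal, untested sketch only.
The device name /dev/md0, the geometry (23 data disks x 32K chunk =
736K, matching a 24-drive raid-5 with a 32K chunk), and the write count
are assumptions taken from the numbers in this thread, not from Doug's
setup. Unlike his single-bio kernel submission via
generic_make_request, the block layer will likely split each 736K
direct write into several bios, so this only approximates the pattern.

/*
 * Sketch of a user-space generator of perfectly stripe-aligned writes.
 * Assumptions (not from the original posts): the array is /dev/md0 and
 * the full stripe is 736K (23 data disks * 32K chunk).  O_DIRECT keeps
 * the page cache from re-splitting or deferring the writes.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_BYTES   (32 * 1024)
#define DATA_DISKS    23
#define STRIPE_BYTES  ((size_t)DATA_DISKS * CHUNK_BYTES)   /* 736K */
#define NUM_STRIPES   1024                                 /* ~736 MB total */

int main(void)
{
    int fd = open("/dev/md0", O_WRONLY | O_DIRECT);
    if (fd < 0) {
        perror("open /dev/md0");
        return 1;
    }

    void *buf;
    if (posix_memalign(&buf, 4096, STRIPE_BYTES)) {        /* O_DIRECT alignment */
        perror("posix_memalign");
        return 1;
    }
    memset(buf, 0x5A, STRIPE_BYTES);

    for (size_t i = 0; i < NUM_STRIPES; i++) {
        off_t off = (off_t)i * (off_t)STRIPE_BYTES;        /* stripe-aligned offset */
        ssize_t n = pwrite(fd, buf, STRIPE_BYTES, off);
        if (n != (ssize_t)STRIPE_BYTES) {
            perror("pwrite");
            break;
        }
    }

    free(buf);
    close(fd);
    return 0;
}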
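And for anyone checking the arithmetic: a 24-drive raid-5 leaves 23
data chunks per stripe, so 23 x 32K = 736K per "perfect" stripe, and
the (n-1)*bw ceiling works out to roughly 23 x 500 MB/sec, about
11.5 GB/sec, which is consistent with the 11 GB/sec figure. Below is a
toy sketch of the full-stripe parity calculation such a write allows --
my reading of "compute parity and go", not Doug's actual patch and not
the md/raid5 code.

/*
 * Toy full-stripe RAID-5 parity calculation -- a sketch only.
 * Assumes a 24-drive set: 23 data chunks of 32K plus one 32K parity
 * chunk per 736K stripe.  A full-stripe write never needs to read old
 * data or old parity, which is why it avoids the RMW penalty entirely.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES   (32 * 1024)                      /* chunk size, as posted */
#define DATA_DISKS    23                               /* 24-drive raid-5       */
#define STRIPE_BYTES  ((size_t)DATA_DISKS * CHUNK_BYTES)  /* 736K               */

/*
 * Compute the parity chunk for one full stripe.  "stripe" points at
 * 736K of data laid out chunk by chunk; "parity" receives 32K.
 */
static void compute_stripe_parity(const uint8_t *stripe, uint8_t *parity)
{
    memcpy(parity, stripe, CHUNK_BYTES);               /* parity = chunk 0 */

    for (int d = 1; d < DATA_DISKS; d++) {
        const uint8_t *chunk = stripe + (size_t)d * CHUNK_BYTES;

        for (size_t i = 0; i < CHUNK_BYTES; i++)       /* parity ^= chunk d */
            parity[i] ^= chunk[i];
    }
}

int main(void)
{
    uint8_t *stripe = malloc(STRIPE_BYTES);
    uint8_t *parity = malloc(CHUNK_BYTES);

    if (!stripe || !parity)
        return 1;

    memset(stripe, 0xA5, STRIPE_BYTES);
    compute_stripe_parity(stripe, parity);

    /* XOR of 23 identical 0xA5 chunks (odd count) leaves 0xA5 */
    printf("parity[0] = 0x%02x\n", parity[0]);

    free(stripe);
    free(parity);
    return 0;
}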