On Tue, Dec 22, 2015 at 12:15 AM, Doug Dumitru <doug@xxxxxxxxxx> wrote:
> My apologies for diving in so late.
>
> I routinely run 24-drive raid-5 sets with SSDs. Chunk is set at 32K
> and the application only writes "perfect" 736K "stripes". The SSDs
> are Samsung 850 Pros on dedicated LSI 3008 SAS ports and are at "new"
> preconditioning (i.e., they are at full speed, just over 500 MB/sec).
> CPU is a single E5-1650 v3.
>
> With stock RAID-5 code, I get about 1.8 GB/sec at q=4.
>
> Now this application is writing from kernel space
> (generic_make_request with the caller waiting for the completion
> callback). There are a lot of RMW operations happening here. I think
> the raid-5 background thread is waking up asynchronously when only a
> part of the write has been buffered into stripe cache pages. The bio
> going into the raid layer is a single bio, so nothing is being carved
> up on the request end. The raid-5 helper thread also saturates a CPU
> core (which is about as fast as you can get with an E5-1650).
>
> If I patch raid5.ko with special-case code to avoid the stripe cache
> and just compute parity and go, the write throughput goes up above
> 11 GB/sec.
>
> This is obviously an impossible IO pattern for most applications, but
> it does confirm that the upper limit of (n-1)*bw is achievable, just
> not with the current stripe cache logic in the raid layer.
>
> Doug Dumitru
> WildFire Storage

> If I patch raid5.ko with special-case code to avoid the stripe cache
> and just compute parity and go, the write throughput goes up above
> 11 GB/sec.

Hi Doug. This is really quite astounding and encouraging! Would you be
willing to share your patch? I am eager to give it a try for both
RAID 5 and RAID 6.

> Now this application is writing from kernel space
> (generic_make_request with the caller waiting for the completion
> callback). There are a lot of RMW operations happening here. I think
> the raid-5 background thread is waking up asynchronously when only a
> part of the write has been buffered into stripe cache pages.

I would also like to hear from anyone who maintains the stripe cache
code. I see similar behavior when I monitor writes of perfectly
stripe-aligned blocks: the number of RMWs is smallish and seems to
vary, but I would not expect to see any at all.

Two rough sketches follow, to make concrete the write pattern and the
parity arithmetic being discussed.
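Something along these lines is what I have in mind when I say
"perfectly stripe-aligned writes" -- a minimal, untested sketch only.
The device name /dev/md0, the geometry (23 data disks x 32K chunk =
736K, matching a 24-drive raid-5 with a 32K chunk), and the write count
are assumptions taken from the numbers in this thread, not from Doug's
setup. Unlike his single-bio kernel submission via
generic_make_request, the block layer will likely split each 736K
direct write into several bios, so this only approximates the pattern.

/*
 * Sketch of a user-space generator of perfectly stripe-aligned writes.
 * Assumptions (not from the original posts): the array is /dev/md0 and
 * the full stripe is 736K (23 data disks * 32K chunk).  O_DIRECT keeps
 * the page cache from re-splitting or deferring the writes.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_BYTES   (32 * 1024)
#define DATA_DISKS    23
#define STRIPE_BYTES  ((size_t)DATA_DISKS * CHUNK_BYTES)   /* 736K */
#define NUM_STRIPES   1024                                 /* ~736 MB total */

int main(void)
{
    int fd = open("/dev/md0", O_WRONLY | O_DIRECT);
    if (fd < 0) {
        perror("open /dev/md0");
        return 1;
    }

    void *buf;
    if (posix_memalign(&buf, 4096, STRIPE_BYTES)) {        /* O_DIRECT alignment */
        perror("posix_memalign");
        return 1;
    }
    memset(buf, 0x5A, STRIPE_BYTES);

    for (size_t i = 0; i < NUM_STRIPES; i++) {
        off_t off = (off_t)i * (off_t)STRIPE_BYTES;        /* stripe-aligned offset */
        ssize_t n = pwrite(fd, buf, STRIPE_BYTES, off);
        if (n != (ssize_t)STRIPE_BYTES) {
            perror("pwrite");
            break;
        }
    }

    free(buf);
    close(fd);
    return 0;
}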
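And for anyone checking the arithmetic: a 24-drive raid-5 leaves 23
data chunks per stripe, so 23 x 32K = 736K per "perfect" stripe, and
the (n-1)*bw ceiling works out to roughly 23 x 500 MB/sec, about
11.5 GB/sec, which is consistent with the 11 GB/sec figure. Below is a
toy sketch of the full-stripe parity calculation such a write allows --
my reading of "compute parity and go", not Doug's actual patch and not
the md/raid5 code.

/*
 * Toy full-stripe RAID-5 parity calculation -- a sketch only.
 * Assumes a 24-drive set: 23 data chunks of 32K plus one 32K parity
 * chunk per 736K stripe.  A full-stripe write never needs to read old
 * data or old parity, which is why it avoids the RMW penalty entirely.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES   (32 * 1024)                      /* chunk size, as posted */
#define DATA_DISKS    23                               /* 24-drive raid-5       */
#define STRIPE_BYTES  ((size_t)DATA_DISKS * CHUNK_BYTES)  /* 736K               */

/*
 * Compute the parity chunk for one full stripe.  "stripe" points at
 * 736K of data laid out chunk by chunk; "parity" receives 32K.
 */
static void compute_stripe_parity(const uint8_t *stripe, uint8_t *parity)
{
    memcpy(parity, stripe, CHUNK_BYTES);               /* parity = chunk 0 */

    for (int d = 1; d < DATA_DISKS; d++) {
        const uint8_t *chunk = stripe + (size_t)d * CHUNK_BYTES;

        for (size_t i = 0; i < CHUNK_BYTES; i++)       /* parity ^= chunk d */
            parity[i] ^= chunk[i];
    }
}

int main(void)
{
    uint8_t *stripe = malloc(STRIPE_BYTES);
    uint8_t *parity = malloc(CHUNK_BYTES);

    if (!stripe || !parity)
        return 1;

    memset(stripe, 0xA5, STRIPE_BYTES);
    compute_stripe_parity(stripe, parity);

    /* XOR of 23 identical 0xA5 chunks (odd count) leaves 0xA5 */
    printf("parity[0] = 0x%02x\n", parity[0]);

    free(stripe);
    free(parity);
    return 0;
}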