On Thu, Dec 10, 2015 at 2:29 PM, Phil Turmel <philip@xxxxxxxxxx> wrote:
> On 12/10/2015 03:09 PM, Dallas Clement wrote:
>> On Thu, Dec 10, 2015 at 2:06 PM, Phil Turmel <philip@xxxxxxxxxx> wrote:
>
>>> Where'd you get the worst case formulas?
>>
>> Google search I'm afraid. I think the assumption for RAID 5,6 worst
>> case is having to read and write the parity + data every cycle.
>
> Well, it'd be a lot worse than half, then. To use the shortcut in raid5
> to write one block, you have to read it first, read the parity, compute
> the change in parity, then write the block with the new parity. That's
> two reads and two writes for a single upper level write. For raid6, add
> read and write of the Q syndrome, assuming you have a kernel new enough
> to do the raid6 shortcut at all. Three reads and three writes for a
> single upper level write. In both cases, add rotational latency to
> reposition for writing over sectors just read.
>
> Those RMW operations generally happen to small random writes, which
> makes the assertion for sequential writes odd. Unless you delay writes
> or misalign or inhibit merging, RMW won't trigger except possibly at the
> beginning or end of a stream.
>
> That's why I questioned O_SYNC when you were using a filesystem: it
> prevents merging, and forces seeking to do small metadata writes.
> Basically turning a sequential workload into a random one.
>
> Phil

> Those RMW operations generally happen to small random writes, which
> makes the assertion for sequential writes odd.

Exactly. I'm not expecting RMWs for large sequential writes, and yet my
RAID 5/6 sequential write performance is still very poor. As mentioned
earlier, I'm getting around 95 MB/s on the inner tracks of these disks.
With 12 of them, my RAID 6 write speed should be (12 - 2) * 95 = 950 MB/s.
I'm getting about 300 MB/s less than that for this scenario. I have the
disks split up among three different controllers, so there should be
plenty of bandwidth.

Several days ago I ran fio on each of the 12 disks concurrently. I saw
the disks at or near 100% utilization, with wMB/s around 160-170. That's
why I started focusing on RAID as the potential bottleneck.

> That's why I questioned O_SYNC when you were using a filesystem: it
> prevents merging, and forces seeking to do small metadata writes.
> Basically turning a sequential workload into a random one.

Yes, that certainly makes sense. Not using O_SYNC anymore. Just O_DIRECT.
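
For reference, here is the arithmetic above as a small Python sketch. It
is not from the thread: the disk count and per-disk speed are my numbers
from earlier in this message, and the per-write I/O counts are taken from
Phil's description of the raid5/raid6 RMW shortcut.

    #!/usr/bin/env python3
    # Back-of-the-envelope numbers for the discussion above.
    # Assumptions: 12 disks, ~95 MB/s sustained sequential write per disk
    # on the inner tracks (my measurement), and Phil's I/O counts for the
    # RMW shortcut: 2 reads + 2 writes (raid5), 3 reads + 3 writes (raid6).

    DISKS = 12
    PER_DISK_MBPS = 95                     # inner-track write speed per disk
    PARITY = {"raid5": 1, "raid6": 2}      # parity disks per level

    # Ideal full-stripe sequential write: every data disk streams in
    # parallel, parity is computed in memory, and no reads are needed.
    for level, p in PARITY.items():
        ideal = (DISKS - p) * PER_DISK_MBPS
        print(f"{level}: ideal sequential write ~ {ideal} MB/s")

    # Cost of one small (sub-stripe) write via the RMW shortcut, per Phil:
    # read old data and old parity block(s), then write new data and new
    # parity block(s), plus rotational latency between the read and write.
    RMW_IOS = {"raid5": (2, 2), "raid6": (3, 3)}   # (reads, writes)
    for level, (r, w) in RMW_IOS.items():
        print(f"{level}: one small write costs {r} reads + {w} writes")

Running it prints raid6: ideal sequential write ~ 950 MB/s, which is the
(12 - 2) * 95 figure I quoted above, so the ~650 MB/s I actually see is
well short of the no-RMW expectation.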