Re: Sequential writing to degraded RAID6 causing a lot of reading

On Thu, 24 May 2012 14:37:28 +0200 Patrik Horník <patrik@xxxxxx> wrote:

> On Thu, May 24, 2012 at 6:48 AM, NeilBrown <neilb@xxxxxxx> wrote:

> > Firstly, degraded RAID6 with a left-symmetric layout is quite different from
> > an optimal RAID5 because there are Q blocks sprinkled around and some D
> > blocks missing.  So there will always be more work to do.
> >
> > Degraded left-symmetric-6 is quite similar to optimal RAID5 as the same data
> > is stored in the same place - so reading should be exactly the same.
> > However writing is generally different and the code doesn't make any attempt
> > to notice and optimise cases that happen to be similar to RAID5.
> 
> Actually I have left-symmetric-6 without one of the "regular" drives,
> not the one with only Qs on it, so it should be similar to a degraded
> RAID6 with a left-symmetric layout in this regard.

Yes, it should - I had assumed wrongly ;-)

> 
> > A particular issue is that while RAID5 does read-modify-write when updating a
> > single block in an array with 5 or more devices (i.e. it reads the old data
> > block and the parity block, subtracts the old from parity and adds the new,
> > then writes both back), RAID6 does not. It always does a reconstruct-write,
> > so on a 6-device RAID6 it will read the other 4 data blocks, compute P and Q,
> > and write them out with the new data.
> > If it did read-modify-write it might be able to get away with reading just P,
> > Q, and the old data block - 3 reads instead of 4.  However subtracting from
> > the Q block is more complicated than subtracting from the P block and has not
> > been implemented.
> 
> OK, I did not know that. In my case I have an 8-drive RAID6 degraded to
> 7 drives, so it would be a plus to have it implemented the RAID5 way.
> But anyway, I was thinking the whole-stripe detection should work in
> this case.
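
For anyone following along, here is a toy, byte-at-a-time sketch of the
difference described above.  It is nothing like the real md code (the kernel
uses the optimised table-driven routines in lib/raid6); it only shows that P
can be patched up with plain XOR, while patching up Q also needs a
Galois-field multiply by g^i for the position of the block being rewritten,
which is the extra complication that has not been implemented.

/* Toy illustration only: one byte per block, bitwise GF(2^8) arithmetic
 * over the RAID6 polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d).
 * The kernel's lib/raid6 code uses precomputed tables and SIMD instead. */
#include <stdio.h>

static unsigned char gf_mul(unsigned char a, unsigned char b)
{
	unsigned char p = 0;

	while (b) {
		if (b & 1)
			p ^= a;
		b >>= 1;
		if (a & 0x80)
			a = (a << 1) ^ 0x1d;
		else
			a <<= 1;
	}
	return p;
}

/* g^i for the RAID6 generator g = 0x02 */
static unsigned char gf_exp2(int i)
{
	unsigned char r = 1;

	while (i--)
		r = gf_mul(r, 2);
	return r;
}

int main(void)
{
	unsigned char d[4] = { 0x11, 0x22, 0x33, 0x44 };	/* data blocks */
	unsigned char p = 0, q = 0, d_old, d_new = 0x99;
	unsigned char pf = 0, qf = 0;
	int i;

	/* Reconstruct-write: compute P and Q from all the data blocks. */
	for (i = 0; i < 4; i++) {
		p ^= d[i];
		q ^= gf_mul(gf_exp2(i), d[i]);
	}

	/* Read-modify-write of data block 2:
	 *   P' = P xor D_old xor D_new          -- plain XOR, as in RAID5
	 *   Q' = Q xor g^2 * (D_old xor D_new)  -- needs a GF multiply too
	 */
	d_old = d[2];
	p = p ^ d_old ^ d_new;
	q = q ^ gf_mul(gf_exp2(2), d_old ^ d_new);

	/* Cross-check against recomputing both from scratch. */
	d[2] = d_new;
	for (i = 0; i < 4; i++) {
		pf ^= d[i];
		qf ^= gf_mul(gf_exp2(i), d[i]);
	}
	printf("P %02x/%02x  Q %02x/%02x\n", p, pf, q, qf);
	return 0;
}
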
> 
> > But that might not be the issue you are hitting - it simply shows that RAID6
> > is different from RAID5 in important but non-obvious ways.
> >
> > Yes, RAID5 and RAID6 do try to detect whole-stripe writes and write them out
> > without reading.  This is not always possible though.
> > Maybe if you told us how many devices are in your arrays (which may be
> > important for understanding exactly what is happening), what the chunk size
> > is, and exactly what command you use to write "lots of data", that might
> > help us understand what is going on.
> 
> The RAID5 is 5 drives, the RAID6 arrays are 7 of 8 drives, and the chunk
> size is 64K. I am using the command dd if=/dev/zero of=file bs=X count=Y;
> it behaves the same for bs between 64K and 1 MB. Actually the internal
> read speed of every drive is slightly higher than its write speed, by
> about 10%. The ratio between the write speed to the array and the write
> speed to an individual drive is approximately 5.5 - 5.7.

I cannot really picture how the read speed can be higher than the write
speed.  The spindle doesn't speed up for reads and slow down for writes,
does it?  But that's not really relevant.

A 'dd' with a large block size should be a good test.  I just did a simple
experiment.  With a 4-drive non-degraded RAID6 I get about a 1:100 ratio of
reads to writes for an extended write to the filesystem.
If I fail one device it becomes 1:1.  Something certainly seems wrong there.
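
If you want to watch the ratio yourself, sampling the sectors-read and
sectors-written columns of /proc/diskstats for each member disk around the
test run is enough.  A rough sketch; the sd[b-e] names below are only
examples, substitute the members of your own array:

/* Sample sectors read/written per device from /proc/diskstats, run the
 * write test, sample again, and print the deltas. */
#include <stdio.h>
#include <string.h>

struct sample { unsigned long long rsect, wsect; };

static int read_diskstats(const char *dev, struct sample *s)
{
	char line[256], name[32];
	FILE *f = fopen("/proc/diskstats", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%*u %*u %31s %*u %*u %llu %*u %*u %*u %llu",
			   name, &s->rsect, &s->wsect) == 3 &&
		    strcmp(name, dev) == 0) {
			fclose(f);
			return 0;
		}
	}
	fclose(f);
	return -1;
}

int main(void)
{
	/* Example member devices of the array under test. */
	static const char *devs[] = { "sdb", "sdc", "sdd", "sde" };
	struct sample before[4] = {{0, 0}}, after[4] = {{0, 0}};
	int i;

	for (i = 0; i < 4; i++)
		read_diskstats(devs[i], &before[i]);

	/* ... run the dd (or other) write test now ... */
	getchar();	/* press Enter when the test has finished */

	for (i = 0; i < 4; i++) {
		read_diskstats(devs[i], &after[i]);
		printf("%s: %llu sectors read, %llu sectors written\n",
		       devs[i],
		       after[i].rsect - before[i].rsect,
		       after[i].wsect - before[i].wsect);
	}
	return 0;
}
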

RAID5 behaves more as you would expect - many more writes than reads.

I've made a note to look into this when I get a chance.
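
In the meantime, one way to check whether clean whole-stripe writes alone
still trigger reads is to take the filesystem and page cache out of the
picture and write stripe-width blocks straight to the md device with
O_DIRECT.  A rough sketch only: the device path is just an example, STRIPE
assumes the 64K chunk and the 6 data disks of your 8-drive RAID6, and note
that it scribbles zeroes over the start of the array, so only run it on an
array whose contents you can destroy:

/* Write stripe-width, stripe-aligned blocks straight to the md device
 * with O_DIRECT, bypassing the filesystem and page cache.
 * WARNING: this overwrites data on the target device. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define STRIPE	(6 * 64 * 1024)		/* data disks * chunk size */
#define COUNT	2048			/* number of stripes to write */

int main(void)
{
	void *buf;
	int fd, i;

	fd = open("/dev/md0", O_WRONLY | O_DIRECT);	/* example path */
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (posix_memalign(&buf, 4096, STRIPE)) {
		fprintf(stderr, "posix_memalign failed\n");
		return 1;
	}
	memset(buf, 0, STRIPE);

	for (i = 0; i < COUNT; i++) {
		if (write(fd, buf, STRIPE) != STRIPE) {
			perror("write");
			return 1;
		}
	}
	fsync(fd);
	close(fd);
	free(buf);
	return 0;
}
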

Thanks for the report.

NeilBrown


