On Wed, 23 May 2012 21:01:09 +0200 Patrik Horník <patrik@xxxxxx> wrote:

> Hello boys,
>
> I am running some RAID6 arrays in degraded mode, one with the
> left-symmetric layout and one with the left-symmetric-6 layout. I am
> experiencing (potentially strange) behaviour that degrades the
> performance of both arrays.
>
> When I write a lot of data sequentially to a healthy RAID5 array, it
> also internally reads a bit of data. I have data on the arrays, so I
> only write through the filesystem, and I am not sure what is causing
> the reads; perhaps writing through the filesystem skips blocks and
> does not write whole stripes, or the timing sometimes means a whole
> stripe is not written all at once. But in any case the ratio of reads
> to writes is small and the performance is almost OK.
>
> I can't test this with a fully healthy RAID6 array, because I don't
> have one at the moment.
>
> But when I write sequentially to a RAID6 that is missing one drive
> (again through the filesystem) I get almost exactly as many internal
> reads as writes. Is this by design, and is it expected behaviour?
> Why does it behave like this? It should behave exactly like healthy
> RAID5: it should detect the writing of a whole stripe and should not
> read (almost) anything.

"It should behave exactly like healthy RAID5"

Why do you say that? Have you examined the code, or imagined carefully how the code would work?

I think what you meant to say was "I expect it would behave exactly like healthy RAID5". That is a much more sensible statement. It is even correct. It is just your expectations that are wrong :-)

(Philosophical note: always avoid the word "should" except when applying it to yourself.)

Firstly, degraded RAID6 with the left-symmetric layout is quite different from an optimal RAID5, because there are Q blocks sprinkled around and some D blocks missing. So there will always be more work to do.

Degraded left-symmetric-6 is quite similar to optimal RAID5, as the same data is stored in the same place, so reading should be exactly the same. However writing is generally different, and the code doesn't make any attempt to notice and optimise cases that happen to be similar to RAID5. (There is a sketch of the two layouts at the end of this mail.)

A particular issue is that while RAID5 does read-modify-write when updating a single block in an array with 5 or more devices (i.e. it reads the old data block and the parity block, subtracts the old data from the parity and adds the new, then writes both back), RAID6 does not. It always does a reconstruct-write, so on a 7-device RAID6 it will read the other 4 data blocks, compute P and Q, and write them out with the new data. If it did read-modify-write it might be able to get away with reading just P, Q, and the old data block - 3 reads instead of 4. However subtracting from the Q block is more complicated than subtracting from the P block and has not been implemented. (There is a toy calculation of the read counts at the end of this mail.)

But that might not be the issue you are hitting - it simply shows that RAID6 differs from RAID5 in important but non-obvious ways.

Yes, RAID5 and RAID6 do try to detect whole-stripe writes and write them out without reading. This is not always possible though. (What a "whole stripe" amounts to in bytes is also sketched at the end of this mail.)

Maybe you could tell us how many devices are in your arrays (which may be important to understanding exactly what is happening), what the chunk size is, and exactly what command you use to write "lots of data". That might help.

NeilBrown
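
P.S. A couple of the points above are easier to see with concrete numbers, so here are some small sketches. They are my own illustrations of how I understand the code, not code lifted from md. First, roughly where the two layouts put the D, P and Q blocks in each stripe; the authoritative version is raid5_compute_sector() in drivers/md/raid5.c:

    def left_symmetric(stripe, ndisks):
        """RAID6 left-symmetric: P and Q rotate across all the disks."""
        p = ndisks - 1 - stripe % ndisks
        q = (p + 1) % ndisks
        row = [None] * ndisks
        row[p], row[q] = 'P', 'Q'
        for d in range(ndisks - 2):                # data follows Q
            row[(p + 2 + d) % ndisks] = 'D%d' % d
        return row

    def left_symmetric_6(stripe, ndisks):
        """RAID6 left-symmetric-6: RAID5 left-symmetric plus a fixed Q disk."""
        p = ndisks - 2 - stripe % (ndisks - 1)
        row = [None] * ndisks
        row[ndisks - 1] = 'Q'                      # Q always on the last disk
        row[p] = 'P'
        for d in range(ndisks - 2):                # RAID5 layout on the rest
            row[(p + 1 + d) % (ndisks - 1)] = 'D%d' % d
        return row

    for name, layout in (('left-symmetric', left_symmetric),
                         ('left-symmetric-6', left_symmetric_6)):
        print(name)
        for stripe in range(6):
            print('  stripe %d: %s' % (stripe, '  '.join(layout(stripe, 6))))

Running this for a 6-disk array shows the difference: in left-symmetric a missing disk takes out a rotating mix of D, P and Q blocks, while in left-symmetric-6 the data and P blocks sit exactly where a RAID5 would put them, with Q parked on the one extra disk.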
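
Second, a toy calculation of the reads needed to update k data blocks in one stripe (assuming nothing is already in the stripe cache), which shows both the read-modify-write advantage of RAID5 and why a full-stripe write needs no reads at all:

    def raid5_reads(ndisks, k):
        """RAID5 picks the cheaper of read-modify-write and reconstruct-write."""
        ndata = ndisks - 1
        rmw = k + 1            # read the old copies of the k blocks, plus old P
        rcw = ndata - k        # read every data block we are not writing
        return min(rmw, rcw)   # 0 for a full-stripe write

    def raid6_reads(ndisks, k):
        """md's RAID6 only implements reconstruct-write."""
        ndata = ndisks - 2
        return ndata - k       # again 0 for a full-stripe write

    for ndisks in (6, 7):
        for k in (1, ndisks - 2):   # one-block update vs RAID6 full stripe
            print('%d disks, k=%d: RAID5 reads %d, RAID6 reads %d'
                  % (ndisks, k, raid5_reads(ndisks, k), raid6_reads(ndisks, k)))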
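
Finally, what a "whole stripe" amounts to in bytes, since that determines whether your sequential writes can bypass the pre-reads at all. The 512k chunk size is only an assumed example (it happens to be the current mdadm default):

    def full_stripe_bytes(ndisks, nparity, chunk_kib):
        # contiguous, stripe-aligned data needed to skip the pre-reads
        return (ndisks - nparity) * chunk_kib * 1024

    print(full_stripe_bytes(6, 2, 512))   # 6-disk RAID6 -> 2097152 (2 MiB)

If the filesystem hands md writes that are smaller than that, or not aligned to stripe boundaries, some pre-reading is unavoidable - which may well be the small amount of reading you see even on the healthy RAID5.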