On Thu, May 24, 2012 at 6:48 AM, NeilBrown <neilb@xxxxxxx> wrote:
> On Wed, 23 May 2012 21:01:09 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>
>> Hello boys,
>>
>> I am running some RAID6 arrays in degraded mode, one with a
>> left-symmetric layout and one with a left-symmetric-6 layout. I am
>> seeing (potentially strange) behaviour that degrades the performance
>> of both arrays.
>>
>> When I write a lot of data sequentially to a healthy RAID5 array, it
>> also internally reads a small amount of data. I have data on the
>> arrays, so I only write through the filesystem. I am therefore not
>> sure what is causing the reads: whether writing through the
>> filesystem sometimes skips whole-stripe writes, or whether timing
>> means the whole stripe is not written at the same time. In any case
>> the ratio of reads is small and the performance is almost OK.
>>
>> I can't test it with a full healthy RAID6 array, because I don't have
>> one at the moment.
>>
>> But when I write sequentially to a RAID6 missing one drive (again
>> through the filesystem) I get almost exactly as many internal reads
>> as writes. Is this by design, and is it expected behaviour? Why does
>> it behave like this? It should behave exactly like a healthy RAID5:
>> it should detect whole-stripe writes and not read (almost) anything.
>
> "It should behave exactly like healthy RAID5"
>
> Why do you say that?  Have you examined the code or imagined carefully how
> the code would work?
>
> I think what you meant to say was "I expect it would behave exactly like a
> healthy RAID5".  That is a much more sensible statement.  It is even
> correct.  It is just your expectations that are wrong :-)
> (philosophical note: always avoid the word "should" except when applying it
> to yourself).

What I meant by "should" was that there is a theoretical way it can work
like that, so it should work like that... :) I was implicitly referring to
whole-stripe writes.

> Firstly, degraded RAID6 with a left-symmetric layout is quite different from
> an optimal RAID5 because there are Q blocks sprinkled around and some D
> blocks missing.  So there will always be more work to do.
>
> Degraded left-symmetric-6 is quite similar to optimal RAID5 as the same data
> is stored in the same place - so reading should be exactly the same.
> However writing is generally different and the code doesn't make any attempt
> to notice and optimise cases that happen to be similar to RAID5.

Actually I have left-symmetric-6 without one of the "regular" drives, not
the one that holds only Q, so in this regard it should be similar to a
degraded RAID6 with a left-symmetric layout.

> A particular issue is that while RAID5 does read-modify-write when updating a
> single block in an array with 5 or more devices (i.e. it reads the old data
> block and the parity block, subtracts the old from parity and adds the new,
> then writes both back), RAID6 does not.  It always does a reconstruct-write,
> so on a 6-device RAID6 it will read the other 4 data blocks, compute P and Q,
> and write them out with the new data.
> If it did read-modify-write it might be able to get away with reading just P,
> Q, and the old data block - 3 reads instead of 4.  However subtracting from
> the Q block is more complicated than subtracting from the P block and has not
> been implemented.

OK, I did not know that. In my case I have an 8-drive RAID6 degraded to 7
drives, so it would be a plus to have it implemented the RAID5 way. But I
was still expecting whole-stripe detection to work in this case.
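To make sure I follow the asymmetry you describe: my (possibly wrong)
understanding is that updating a single data block changes P by a plain XOR
with the delta, while Q changes by that delta multiplied by a per-slot
coefficient in GF(2^8). A rough sketch of the arithmetic as I understand it
(not the md driver's code, and the helper names are made up):

/* Why an RMW update of Q needs more than XOR.
 * RAID6 parity over n data blocks D_0..D_{n-1}:
 *   P = D_0 ^ D_1 ^ ... ^ D_{n-1}
 *   Q = g^0*D_0 + g^1*D_1 + ... + g^{n-1}*D_{n-1}   (in GF(2^8), g = 0x02)
 * So replacing one block D_i needs:
 *   P ^= (old ^ new)                plain XOR, same as RAID5
 *   Q ^= g^i * (old ^ new)          a Galois-field multiplication per byte
 */
#include <stdint.h>

/* multiply by 2 in GF(2^8) with the RAID6 polynomial 0x11d */
static uint8_t gf_mul2(uint8_t a)
{
        return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
}

/* multiply a by g^i, i.e. apply gf_mul2() i times */
static uint8_t gf_mul_pow2(uint8_t a, int i)
{
        while (i--)
                a = gf_mul2(a);
        return a;
}

/* hypothetical helper: RMW-style update of one byte of data slot 'slot' */
static void rmw_update_byte(uint8_t *p, uint8_t *q, int slot,
                            uint8_t old_data, uint8_t new_data)
{
        uint8_t delta = old_data ^ new_data;    /* "subtract" old, "add" new */

        *p ^= delta;                            /* P update: trivial */
        *q ^= gf_mul_pow2(delta, slot);         /* Q update: needs GF(2^8) math */
}

If that is right, I see why it has not been implemented, even though for
wide arrays it would save reads.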
> But that might not be the issue you are hitting - it simply shows that RAID6
> is different from RAID5 in important but non-obvious ways.
>
> Yes, RAID5 and RAID6 do try to detect whole-stripe writes and write them out
> without reading.  This is not always possible though.
> Maybe if you told us how many devices are in your arrays (which may be
> important to understand exactly what is happening), what the chunk size is,
> and exactly what command you use to write "lots of data".  That might help
> us understand what is happening.

The RAID5 has 5 drives, the RAID6 arrays run with 7 of 8 drives, and the
chunk size is 64K. I am writing with dd if=/dev/zero of=file bs=X count=Y;
it behaves the same for bs anywhere between 64K and 1 MB. The internal read
speed from every drive is actually slightly higher than the write speed, by
about 10%. The ratio between the write speed to the array and the write
speed to an individual drive is about 5.5 - 5.7. I have enough free space on
the filesystem (ext3), so I guess I should be hitting whole stripes most of
the time.

Patrik

> NeilBrown
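P.S. A sanity check of my own geometry (a throwaway sketch of my numbers,
nothing taken from the md code): an 8-device RAID6 stripe carries 6 data
chunks plus P and Q, so with a 64K chunk the full-stripe data width is
6 * 64K = 384K. A power-of-two bs such as 64K or 1 MB is never a multiple of
that, though with ext3 and the page cache in between I assume the bs does
not map directly onto what reaches the stripe cache anyway.

#include <stdio.h>

int main(void)
{
        const long chunk_kb   = 64;                 /* chunk size reported above */
        const int  total_devs = 8;                  /* 8-drive RAID6, running degraded on 7 */
        const int  data_devs  = total_devs - 2;     /* RAID6: two parity chunks per stripe */
        const long stripe_kb  = chunk_kb * data_devs;

        printf("full-stripe data width: %ld KiB\n", stripe_kb);     /* 384 KiB */
        printf("a 1 MiB write covers %.2f stripes\n", 1024.0 / stripe_kb);
        return 0;
}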