On Wed, 12 Sep 2012 19:49:52 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
wrote:

> Hi Neil,
> I have done some more investigation on that.
>
> I see that according to handle_stripe_dirtying(), raid6 always does
> reconstruct-write, while raid5 checks what will be cheaper in terms of
> IOs - read-modify-write or reconstruct-write. For example, for a
> 3-drive raid5, both are the same, so because of:
>
> if (rmw < rcw && rmw > 0)
> ... /* this is not picked, because in this case rmw==rcw==1 */
>
> reconstruct-write is always performed for such 3-drive raid5. Is this correct?

Yes.

>
> The issue with doing read-modify-writes is that later we have no
> reliable way to know whether the parity block is correct - when we
> later do reconstruct-write because of a read error, for example. For
> read requests we could have perhaps checked the bitmap, and do
> reconstruct-write if the relevant bit is not set, but for write
> requests the relevant bit will always be set, because it is set when
> the write is started.
>
> I tried the following scenario, which showed a data corruption:
> # Create 4-drive raid5 in "--force" mode, so resync starts
> # Write one sector on a stripe that resync has not handled yet. RMW is
> performed, but the parity is incorrect because two other data blocks
> were not taken into account (they contain garbage).
> # Induce a read-error on the sector that I just wrote to
> # Let resync handle this stripe
>
> As a result, resync corrects my sector using other two data blocks +
> parity block, which is out of sync. When I read back the sector, data
> is incorrect.
>
> I see that I can easily enforce raid5 to always do reconstruct-write,
> the same way like you do for raid6. However, I realize that for
> performance reasons, it is better to do RMW if possible.
>
> What do you think about the following rough suggestion: in
> handle_stripe_dirtying() check whether resync is ongoing or should be
> started - using MD_RECOVERY_SYNC, for example. If there is an ongoing
> resync, there is a good reason for that, probably parity on some
> stripes is out of date. So in that case, always force
> reconstruct-write. Otherwise, count what is cheaper like you do now.
> (Can RCW be really cheaper than RMW?)
>
> So during resync, array performance will be lower, but we will ensure
> that all stripe-blocks are consistent. What do you think?

I'm fairly sure we used to do that - long long ago.
(hunts through git history...) No. The code-fragment was there but it was
commented out.

I think it would be good to avoid 'rmw' if the sector offset is less than
recovery_cp.

Care to write a patch?

Thanks,
NeilBrown
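
For illustration only, the corruption scenario quoted above can be modelled
outside the kernel. The sketch below is not md code: the stripe layout, the
helper names (write_rmw, write_rcw, rebuild, write_block) and the
parity_trusted flag are all hypothetical, and one byte stands in for each
block of a 4-drive raid5 stripe. How the kernel would decide that parity is
trusted (MD_RECOVERY_SYNC, a comparison against recovery_cp, or something
else) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NDATA 3 /* 4-drive raid5: 3 data blocks + 1 parity block per stripe */

/* Toy stripe (hypothetical, not the kernel's stripe_head). */
struct stripe {
    uint8_t data[NDATA];
    uint8_t parity;
};

/* Read-modify-write: new parity = old parity ^ old data ^ new data.
 * Only correct if the old parity already matched the old data. */
static void write_rmw(struct stripe *s, int idx, uint8_t val)
{
    assert(idx >= 0 && idx < NDATA);
    s->parity ^= (uint8_t)(s->data[idx] ^ val);
    s->data[idx] = val;
}

/* Reconstruct-write: recompute parity from every data block, so the
 * result is consistent whatever the old parity contained. */
static void write_rcw(struct stripe *s, int idx, uint8_t val)
{
    uint8_t p = 0;

    assert(idx >= 0 && idx < NDATA);
    s->data[idx] = val;
    for (int i = 0; i < NDATA; i++)
        p ^= s->data[i];
    s->parity = p;
}

/* Rebuild one data block from parity and the other data blocks, as
 * would happen after a read error on that block. */
static uint8_t rebuild(const struct stripe *s, int idx)
{
    uint8_t v = s->parity;

    assert(idx >= 0 && idx < NDATA);
    for (int i = 0; i < NDATA; i++)
        if (i != idx)
            v ^= s->data[i];
    return v;
}

/* Hypothetical policy: use the cheaper RMW only when parity is known
 * to be consistent, otherwise force a reconstruct-write. */
static void write_block(struct stripe *s, int idx, uint8_t val,
                        bool parity_trusted)
{
    if (parity_trusted)
        write_rmw(s, idx, val);
    else
        write_rcw(s, idx, val);
}
```

Take a stripe with data blocks 0x11, 0x22, 0x44 and a stale parity of 0x5A
(consistent parity would be 0x77). Writing 0xAB to block 0 through the RMW
path leaves parity at 0xE0, and rebuild() of block 0 then returns 0x86
rather than 0xAB. The same write through the RCW path sets parity to 0xCD,
and rebuild() returns 0xAB.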