Re: Suggestion needed for fixing RAID6

On 05/01/2010 11:44 PM, Janos Haar wrote:

> But you are right, because the sync_min option does not work for rebuilding disks, only for resyncing. (It is too smart to do the trick for me.)
>
>> I think... unless bitmaps really do some magic here, flagging the newly introduced disk as more recent than the parity data... but do they really do this? People correct me if I'm wrong.
>
> Bitmap manipulation should work.
> I think I know how to do that, but the data is more important than trying it on my own.
> I want to wait until somebody supports this.
> ... or does somebody have another good idea?

First: do you have any backup of your data? If not, before doing any experiment I suggest you back up the important stuff. This can be done with rsync, reassembling the array every time it goes down. I suggest putting the array in read-only mode (mdadm --readonly /dev/md3): this should prevent resyncs from starting automatically, and AFAIR it even prevents drives being dropped because of read errors (but you can't use it during resyncs or rebuilds). Resyncs are bad here because they will eventually bring down your array. Don't use DM when doing this.
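A command sketch of that backup step (the mount point and backup destination are assumptions; md3 is from the thread):

```shell
# Read-only mode: no resync starts automatically, and AFAIR drives are
# not dropped on read errors (cannot be used during a resync/rebuild).
mdadm --readonly /dev/md3

# Mount read-only and copy off the important data. rsync can simply be
# re-run after each reassembly and will pick up where it left off.
mount -o ro /dev/md3 /mnt/md3            # /mnt/md3 and /backup are placeholders
rsync -aHAX --partial /mnt/md3/ /backup/md3/
```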

Now, for the real thing, instead of experimenting with bitmaps, I suggest you try and see if the normal MD resync works now. If that works then you can do the normal rebuild.
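As a sketch, the normal resync attempt and its monitoring would look something like this (md3 from the thread; a "check" pass still rewrites sectors that fail to read):

```shell
# Kick off a full pass over the array via sysfs.
echo check > /sys/block/md3/md/sync_action

# Watch the progress and the kernel log for read-error / rewrite messages.
cat /proc/mdstat
dmesg | tail -n 20

# After it completes, look at the mismatch counter.
cat /sys/block/md3/md/mismatch_cnt
```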

*Please note: DM should not be needed!* I know you have tried resyncing with a DM COW layer under MD and that it doesn't work well in this case, but in fact DM should not be necessary.

We pointed you to DM around Apr 23rd because at that time we thought your drives were being dropped for uncorrectable read errors, but we had guessed wrong. The general MD philosophy is that if there is enough parity information, drives are not dropped just for a read error. On a read error, MD recomputes the value of the sector from the parity information and then attempts to rewrite the block in place. During this rewrite the drive performs a reallocation, moving the block to a hidden spare region. If the rewrite fails, it means the drive is out of spare sectors; MD considers that a major failure, and only at that point is the drive dropped. We thought this was the reason in your case as well, but we were wrong: in your case it was an MD bug, the one for which I submitted the patch.
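The reallocation mechanism described above can be observed from the drive's side with SMART attributes, checked before and after the rewrite (the device name is an assumption):

```shell
# Pending sectors should go down and reallocated sectors up as MD's
# in-place rewrite forces the drive to remap the bad blocks.
smartctl -A /dev/sdc | grep -Ei 'Reallocated_Sector|Current_Pending'
```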

So it should work now (without DM), and I think this is the safest thing you can try. Having a backup is always better, though.

So start the resync without DM and see whether it runs through to the end without dropping drives. You can use sync_min to skip the dead time.

For maximum safety you could first resync only one chunk from the region of the damaged sectors, so as to provoke only a minimal number of rewrites. Set sync_min to the location of the errors and sync_max to just one chunk above, and see what happens. If the block is rewritten correctly and the drive is not dropped, run "check" again on the same region and see whether "cat /sys/block/md3/md/mismatch_cnt" still returns zero (or the value it had before the rewrite). If it is zero (or at least unchanged), the block was really rewritten with the correct value: recovery of one sector really works for raid6 in singly-degraded state. Then the procedure is safe, as far as I understand, and you can go ahead with the other chunks. When all damaged sectors have been reallocated, there are no more read errors, and mismatch_cnt is still at zero, you can go ahead and replace the defective drive.
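A sketch of that single-chunk test, assuming sync_min/sync_max take 512-byte sector offsets rounded to a chunk boundary, and that you know roughly where the read errors were reported (the sector number is a placeholder):

```shell
#!/bin/sh
# Placeholder: sector where the read errors were reported.
BAD_SECTOR=123456789

# Chunk size is exported in bytes; convert to 512-byte sectors.
CHUNK_BYTES=$(cat /sys/block/md3/md/chunk_size)
CHUNK_SECTORS=$((CHUNK_BYTES / 512))

# Round down to a chunk boundary, then check exactly one chunk.
START=$((BAD_SECTOR / CHUNK_SECTORS * CHUNK_SECTORS))
END=$((START + CHUNK_SECTORS))

echo "$START" > /sys/block/md3/md/sync_min
echo "$END"   > /sys/block/md3/md/sync_max
echo check    > /sys/block/md3/md/sync_action

# Afterwards, compare the mismatch count with its previous value.
cat /sys/block/md3/md/mismatch_cnt
```

Remember to reset sync_min to 0 and sync_max to "max" before moving on to a full pass.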

There are a few things that could still make the resync fail if we are really unlucky, but dmesg should point us in the right direction in that case. Also remember that the patch still needs testing: it is effectively untested so far, because DM dropped the drive before MD could act. We need to know whether raid6 now behaves like a raid6, or is still behaving like a raid5...
Thank you

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
