Re: Feature request: Remove the badblocks list

Roy Sigurd Karlsbakk <roy@xxxxxxxxxxxxx> · Wed, 2 Sep 2020 16:50:37 +0200 (CEST)

> I'm no MD expert, but I think there are a couple of things to consider...
> 
> 1) MD doesn't mark the sector as bad unless we try to write to it, AND
> the drive replies to say it could not be written. So, in your case, the
> drive is saying that it doesn't have any "spare" sectors left to
> re-allocate; we are already past that point.
> 
> 2) When MD tries to read and gets an error, it reads from the other
> mirror, or reconstructs from parity/etc., and automatically attempts to
> write to the sector; see point 1 above for the failure case.
> 
> So by the time MD gets a write error for a sector, the drive really is
> bad, and MD can no longer ensure that *this* sector will be able to
> properly store data again (whatever level of RAID we asked for, that
> level can't be achieved with one drive faulty). So MD marks it bad, and
> won't store any user data in that sector in future. As other drives are
> replaced, we mark the corresponding sector on those drives as also bad,
> so they also know that no user data should be stored there.
> 
> Eventually, we replace the faulty disk, and it would probably be safe to
> store user data in the marked sector (assuming the new drive is not
> faulty on the same sector, and all other member drives are not faulty on
> the same sector).
> 
> So, to "fix" this, we just need a way to tell MD to try to write to all
> member drives, on all faulty sectors. If any drive fails to write, then
> keep the sector marked bad; if *ALL* drives succeed, then remove it from
> the bad blocks list on all members.
> 
> So why not add this feature to fix the problem, instead of throwing away
> something that is potentially useful? Perhaps this could be done as part
> of the "repair" mode, or done during a replace/add (when we reach the
> "bad" sector, test the new drive, test all existing drives, and then
> continue with the repair/add).
> 
> Would that solve the "bug"?
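
The check proposed in the quoted text can be sketched roughly as below. This is only an illustration of the decision rule (clear a sector only if every member accepts a write), not md code. The function name `recheck_bad_sectors` and the `try_write` callback are hypothetical; a real write test would have to be done by md itself.

```python
from typing import Callable, Dict, Iterable, Set


def recheck_bad_sectors(
    bad_sectors: Iterable[int],
    members: Iterable[str],
    try_write: Callable[[str, int], bool],
) -> Dict[str, Set[int]]:
    """Decide which listed sectors could be cleared from the bad blocks list.

    try_write(member, sector) is a hypothetical callback that returns True
    when a test write to that sector on that member succeeds.
    """
    member_list = list(members)
    if not member_list:
        raise ValueError("at least one member drive is required")

    cleared: Set[int] = set()
    still_bad: Set[int] = set()
    for sector in bad_sectors:
        # Test every member, so that all drives are exercised even
        # after the first failure.
        results = [try_write(member, sector) for member in member_list]
        if all(results):
            # *ALL* drives succeeded: the entry can go on all members.
            cleared.add(sector)
        else:
            # Any failure keeps the sector marked bad.
            still_bad.add(sector)
    return {"cleared": cleared, "still_bad": still_bad}
```

With a fake `try_write` that fails only for one sector on one member, that sector ends up in `still_bad` and the rest in `cleared`.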

I would rather have md stop fixing "somebody else's problem" (that is, the disk) and just do its job. As for my case, I have tried manually reading the sectors named in the badblocks list, and they all work. All of them. But there is no way to fix this, since they have been proclaimed dead. So have the sectors with the same number on the sibling drives, regardless of their actual status.
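
A manual read test of this kind could look roughly like the sketch below. It assumes the list is available as text with one "start length" pair per line, counted in 512-byte sectors, and that `offset_sectors` (a hypothetical parameter) covers any difference between the listed numbers and the position on the device being read. Reads may also be served from the page cache, so a successful read here is only a hint, not proof that the medium is fine.

```python
import os
from typing import Iterable, List, Tuple

SECTOR_SIZE = 512  # assumed sector size for the listed ranges


def parse_bad_blocks(text: str) -> List[Tuple[int, int]]:
    """Parse lines of the assumed form 'start length' into tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            ranges.append((int(fields[0]), int(fields[1])))
    return ranges


def unreadable_sectors(
    path: str,
    ranges: Iterable[Tuple[int, int]],
    offset_sectors: int = 0,
) -> List[int]:
    """Return the listed sectors that could not be read in full."""
    failed = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for start, length in ranges:
            for sector in range(start, start + length):
                pos = (sector + offset_sectors) * SECTOR_SIZE
                try:
                    data = os.pread(fd, SECTOR_SIZE, pos)
                except OSError:
                    # The kernel reported an I/O error for this sector.
                    failed.append(sector)
                    continue
                if len(data) < SECTOR_SIZE:
                    # Short read, e.g. past the end of the device.
                    failed.append(sector)
    finally:
        os.close(fd)
    return failed
```

An empty result means every listed sector returned a full 512 bytes on read.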

If a drive has multiple bad sectors, kick it out. It no longer has any business being in the RAID.

Best regards

roy
-- 
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Carve the good in stone, write the bad in snow.