Re: Feature request: Remove the badblocks list

On 3/9/20 00:50, Roy Sigurd Karlsbakk wrote:
I'm no MD expert, but there are a couple of things to consider...

1) MD doesn't mark the sector as bad unless we try to write to it, AND
the drive replies to say it could not be written. So, in your case, the
drive is saying that it doesn't have any "spare" sectors left to
re-allocate; we are already past that point.

2) When MD gets a read error, it reads from the other mirror, or
reconstructs from parity/etc., and automatically attempts to write the
data back to the sector; see point 1 above for the failure case.

So by the time MD gets a write error for a sector, the drive really is
bad, and MD can no longer ensure that *this* sector will be able to
properly store data again (whatever level of RAID we asked for, that
level can't be achieved with one drive faulty). So MD marks it bad, and
won't store any user data in that sector in the future. As other drives
are replaced, we mark the corresponding sector on those drives as bad
too, so they also know that no user data should be stored there.
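
To make that concrete, here is a toy C sketch of the read-error path
described above. This is NOT the real md/raid1 code; handle_read_error,
struct member and everything else here are invented purely for
illustration of the logic: a read error is serviced from the other
mirror, the data is written back, and only a failed write-back puts the
sector on the bad blocks list.

/* Toy model of the read-error path described above.
 * Not the real md code; all names are invented for illustration. */
#include <stdbool.h>
#include <stdio.h>

#define MAX_BAD 16

struct member {
    const char *name;
    bool read_fails;   /* pretend this sector can't be read       */
    bool write_fails;  /* pretend this sector can't be rewritten  */
    long long bad[MAX_BAD];
    int nbad;
};

/* Point 2: on a read error, rebuild the data from the other mirror and
 * try to write it back to the failing member. */
static bool handle_read_error(struct member *m, struct member *mirror,
                              long long sector)
{
    if (mirror->read_fails) {
        printf("%s: sector %lld unreadable and no good copy left\n",
               m->name, sector);
        return false;
    }
    /* Point 1: only a failed write-back marks the sector bad. */
    if (m->write_fails) {
        if (m->nbad < MAX_BAD)
            m->bad[m->nbad++] = sector;
        printf("%s: write-back of sector %lld failed, added to BBL\n",
               m->name, sector);
        return false;
    }
    printf("%s: sector %lld rewritten from mirror, no BBL entry\n",
           m->name, sector);
    return true;
}

int main(void)
{
    struct member a = { "sda1", true,  true,  {0}, 0 };  /* worn-out drive */
    struct member b = { "sdb1", false, false, {0}, 0 };  /* healthy mirror */

    handle_read_error(&a, &b, 123456);
    printf("sda1 now has %d bad block(s) recorded\n", a.nbad);
    return 0;
}

The point the sketch tries to show is that the BBL entry is only created
by the write-back failure, never by the read failure on its own.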

Eventually, we replace the faulty disk, and it would probably be safe to
store user data in the marked sector (assuming the new drive, and all of
the other member drives, are not faulty at that same sector).

So, to "fix" this, we just need a way to tell MD to try and write to all
member drives, on all faulty sectors, and if any drive returns fails to
write, then keep the sector as marked bad, if *ALL* drives succeed, then
remove from the bad blocks list on all members.
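
As a sketch only (the structures and names below are invented, and no
such pass exists in md today), the logic would look something like this:
walk the recorded bad sectors, attempt a test write on every member, and
drop the entry only when every member succeeds.

/* Sketch of the proposed "revalidate bad blocks" pass.
 * Invented data structures; nothing here exists in md today. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct bbl_entry {
    long long sector;
    bool still_bad;
};

/* Pretend this issues a test write of the sector to one member and
 * returns true on success. */
typedef bool (*write_fn)(int member, long long sector);

/* Clear an entry only if the test write succeeds on ALL members. */
static void revalidate(struct bbl_entry *bbl, size_t n,
                       int nmembers, write_fn try_write)
{
    for (size_t i = 0; i < n; i++) {
        bool all_ok = true;
        for (int m = 0; m < nmembers; m++) {
            if (!try_write(m, bbl[i].sector)) {
                all_ok = false;   /* any failure => keep it marked bad */
                break;
            }
        }
        bbl[i].still_bad = !all_ok;
    }
}

static bool try_write_stub(int member, long long sector)
{
    (void)member; (void)sector;
    return true;   /* pretend the replacement drives all accept the write */
}

int main(void)
{
    struct bbl_entry bbl[] = { { 123456, true }, { 123464, true } };

    revalidate(bbl, 2, 3, try_write_stub);
    for (size_t i = 0; i < 2; i++)
        printf("sector %lld: %s\n", bbl[i].sector,
               bbl[i].still_bad ? "still bad" : "cleared");
    return 0;
}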

So why not add this feature to fix the problem, instead of throwing away
something that is potentially useful? Perhaps this could be done as part
of the "repair" mode, or during a replace/add (when we reach the "bad"
sector, test the new drive, test all the existing drives, and then
continue with the replace/add).
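
For reference, the existing hook such a pass might hang off is the
sync_action attribute in sysfs. The snippet below just triggers the
existing "repair" action on an assumed array named md0; today that does
not touch the BBL at all, it is only shown as the place where a future
extension could live.

/* Kick off md's existing "repair" pass on /dev/md0 by writing to sysfs.
 * Today this does NOT clear bad block list entries; the idea above would
 * extend this (or a hypothetical new action) to re-test recorded sectors. */
#include <stdio.h>

int main(void)
{
    const char *path = "/sys/block/md0/md/sync_action";  /* assumed array */
    FILE *f = fopen(path, "w");

    if (!f) {
        perror(path);
        return 1;
    }
    fputs("repair\n", f);   /* same effect as: echo repair > .../sync_action */
    fclose(f);
    return 0;
}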

Would that solve the "bug"?
I'd rather md stopped fixing "somebody else's problem", that is, the disk, and just did its job. As for my case, I have tried to manually read the sectors named in the badblocks list, and they all work. All of them. But there's no way to fix this, since they are proclaimed dead. So are the same-numbered sectors on the sibling drives, regardless of their status.
Just because you can read them doesn't mean you can write them. Clearly, at some point in time, one of your drives failed. You now need to recover from that failed drive in the most sensible way.
If a drive has multiple issues with bad sectors, kick it out. It has no business being in the RAID anymore.

And if a group of 100 sectors are bad on drive 1, and 100 different sectors on drive 2, you want to kick both drives out, and destroy all your data until you can create a new array and restore from backup?

OR, just mark those parts of all disks as faulty; then, at some point in the future, you replace the disks and find a way to tell MD that the sectors are working again (preferably re-testing them before marking them as OK)?

BTW, I just found this:

https://raid.wiki.kernel.org/index.php/The_Badblocks_controversy

It suggests that there is indeed a bug which should be hunted down and fixed, and that the BBL isn't actually populated via failed writes: it is populated by failed reads during a replace/add, where the read fails on the source drive AND on the parity/mirror drives.

Either way, perhaps what is needed (if you are interested) is a repeatable test scenario causing the problem, which could then be used to identify and fix the bug.

Regards,
Adam



