Re: Feature request: Remove the badblocks list

On 2/9/20 23:36, Roy Sigurd Karlsbakk wrote:
----- Original Message -----
From: "David C. Rankin" <drankinatty@xxxxxxxxxxxxxxxxxx>
To: "Linux Raid" <linux-raid@xxxxxxxxxxxxxxx>
Sent: Saturday, 22 August, 2020 03:42:40
Subject: Re: Feature request: Remove the badblocks list
On 8/18/20 4:03 PM, Håkon Struijk Holmen wrote:
Hi,

Thanks for the CC, I just managed to get myself subscribed to the list :)

I have gathered some thoughts on the subject as well after reading up on it,
figuring out what the actual header format is, and writing a tool [3] to fix my
array...

<snip>
But I have some complaints about the thing...
Well,

  There is code in all things that can be fixed, but I for one will chime in
and say I don't care if I lose a strip or two, so long as on a failed disk I
pop the new one in and it rebuilds without issue (which it does, even when the
disk was replaced due to bad blocks).

  So whatever is done, don't fix what isn't broken and introduce more bugs
along the way. If this is such an immediate problem, then why are patches
being attached to the complaints?
The problem is that it's already broken. Take a single mirror. One drive develops a bad sector. Fine, you have redundancy, so you read the data from the other drive and md flags the sector as bad. Then drive two is replaced, and you lose the data: the new drive gets the same sector number flagged as faulty, since the first drive has it flagged. So you replace the first drive, and during resync it also gets flagged as having a bad sector. And so on.
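A minimal Python model of the cascade described above, assuming (as the paragraph says) that a sector marked bad on the only other mirror has no readable copy, so resync marks it bad on the replacement disk too. The names `resync_bad_blocks` and `simulate` are hypothetical; this is a sketch of the described behaviour, not md's actual code.

```python
from __future__ import annotations


def resync_bad_blocks(source_bad: set[int]) -> set[int]:
    """Bad-block list a fresh disk ends up with after resyncing from a mirror.

    Hypothetical model: a sector already marked bad on the source has no
    readable copy, so it is marked bad on the new disk as well, even though
    the new disk's media is fine there.
    """
    return set(source_bad)


def simulate(bad_sector: int, replacements: int) -> list[list[set[int]]]:
    """Alternately replace the two disks of a mirror and record both lists."""
    # Disk 0 has a genuine media defect at bad_sector; disk 1 is healthy.
    bbl: list[set[int]] = [{bad_sector}, set()]
    history = []
    for i in range(replacements):
        new = 1 - (i % 2)  # replace disk 1, then disk 0, then disk 1, ...
        bbl[new] = resync_bad_blocks(bbl[1 - new])
        history.append([set(b) for b in bbl])
    return history


# After two replacements neither original disk is left, yet the sector is
# still listed as bad on both members.
for step, lists in enumerate(simulate(1234, 4), start=1):
    print(step, lists)
```

In this model the flag outlives every physical disk that could have caused it, which is the loop the paragraph complains about.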

Modern disks (that is, disks from the last 20 years or so) reallocate sectors as they wear out. We have redundancy to handle errors, not to pinpoint them on disks and fill up not-so-smart lists with "broken" sectors that actually work. If md sees a drive with excessive errors, that drive should be kicked out and marked as dead, but it should not interfere with the rest of the raid.

Kind regards

roy

I'm no MD expert, but I think there are a couple of things to consider...

1) MD doesn't mark the sector as bad unless we try to write to it AND the drive replies that it could not be written. So, in your case, the drive is saying that it doesn't have any "spare" sectors left to reallocate; we are already past that point.

2) When MD tries to read and gets an error, it reads from the other mirror (or reconstructs from parity, etc.) and automatically attempts to rewrite the sector; see point 1 above for the failure case.
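The two points above can be sketched as a read path on a mirror. This is a simplified model under stated assumptions, not md's implementation: `Member`, `mirror_read`, and the `unreadable`/`unwritable` sets are hypothetical, and a successful rewrite is assumed to let the drive remap the sector.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Member:
    """Hypothetical model of one mirror member (not md's data structures)."""
    unreadable: set[int] = field(default_factory=set)  # sectors returning read errors
    unwritable: set[int] = field(default_factory=set)  # no spare left: writes fail
    bad_blocks: set[int] = field(default_factory=set)  # md's per-device bad-block list
    data: dict[int, bytes] = field(default_factory=dict)

    def read(self, sector: int) -> bytes | None:
        if sector in self.unreadable or sector in self.bad_blocks:
            return None
        return self.data.get(sector, b"\0")

    def write(self, sector: int, payload: bytes) -> bool:
        if sector in self.unwritable:
            return False  # the drive reports that the write failed
        self.data[sector] = payload
        self.unreadable.discard(sector)  # assume the drive remapped the sector
        return True


def mirror_read(disks: list[Member], idx: int, sector: int) -> bytes | None:
    """Read `sector` from disks[idx], recovering from another mirror on error."""
    payload = disks[idx].read(sector)
    if payload is not None:
        return payload
    # Point 2: the read failed, so fetch the data from another mirror...
    for j, other in enumerate(disks):
        if j != idx and (payload := other.read(sector)) is not None:
            break
    else:
        return None  # no good copy anywhere
    # ...and rewrite it on the disk that failed the read.
    if not disks[idx].write(sector, payload):
        # Point 1: only a failed write puts the sector on the bad-block list.
        disks[idx].bad_blocks.add(sector)
    return payload
```

A read error alone never reaches the bad-block list here; only the failed rewrite in the last branch does.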

So by the time MD gets a write error for a sector, the drive really is bad, and MD can no longer ensure that *this* sector will properly store data again (whatever level of RAID we asked for, that level can't be achieved with one drive faulty). So MD marks the sector bad and won't store any user data in it in the future. As other drives are replaced, we mark the corresponding sector on those drives as bad as well, so they also know that no user data should be stored there.

Eventually, we replace the faulty disk, and it would probably be safe to store user data in the marked sector (assuming the new drive is not faulty on the same sector, and all other member drives are not faulty on the same sector).

So, to "fix" this, we just need a way to tell MD to try writing to all faulty sectors on all member drives. If any drive fails the write, keep the sector marked bad; if *ALL* drives succeed, remove it from the bad blocks list on all members.
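A sketch of the proposed retest pass, assuming a hypothetical `try_write(member, sector)` probe that reports whether a test write succeeded. Nothing here is an existing md or mdadm interface, and what data to write into a sector whose contents were already lost is left out of scope.

```python
from __future__ import annotations

from typing import Callable

# (member index, sector) -> did a test write succeed?  Hypothetical probe.
WriteProbe = Callable[[int, int], bool]


def retest_bad_blocks(bad_lists: list[set[int]],
                      try_write: WriteProbe) -> list[set[int]]:
    """Re-test every listed sector on every member and return updated lists.

    A sector is cleared from all members only if the test write succeeds on
    all members; otherwise it stays listed wherever it already was.
    """
    candidates: set[int] = set().union(*bad_lists)
    cleared = {
        sector for sector in candidates
        # all() stops at the first failing member: one failure keeps it bad.
        if all(try_write(m, sector) for m in range(len(bad_lists)))
    }
    return [bbl - cleared for bbl in bad_lists]


# Usage: sector 100 still fails on member 1, sector 200 now writes everywhere.
physically_bad = {(1, 100)}
print(retest_bad_blocks([{100, 200}, {100, 200}, {200}],
                        lambda m, s: (m, s) not in physically_bad))
# → [{100}, {100}, set()]
```

The all-or-nothing rule mirrors the proposal: a sector only goes back into service when every member can hold it again.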

So why not add this feature to fix the problem, instead of throwing away something that is potentially useful? Perhaps this could be done as part of the "repair" mode, or during a replace/add (when we reach the "bad" sector, test the new drive and all existing drives, then continue with the repair/add).

Would that solve the "bug"?

PS: As you noted, if MD gets repeated write errors on one drive, that drive will be kicked out. That threshold is configurable.
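A minimal sketch of a per-member error limit, assuming a hypothetical `ErrorTracker` whose `max_errors` field stands in for the configurable threshold; the actual md setting and its exact semantics are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class ErrorTracker:
    """Hypothetical per-member error counter, not md's implementation.

    `max_errors` stands in for the configurable limit: the member is marked
    faulty once it has seen more than `max_errors` errors.
    """
    max_errors: int
    errors: int = 0
    faulty: bool = False

    def record_error(self) -> bool:
        """Count one error; return True if the member should be kicked out."""
        self.errors += 1
        if self.errors > self.max_errors:
            self.faulty = True
        return self.faulty


tracker = ErrorTracker(max_errors=2)
print([tracker.record_error() for _ in range(3)])  # → [False, False, True]
```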



