Re: FW: change in disk failure policy for non-BBL arrays?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Note that this only affects arrays on which the bad block table has been disabled.  The BBT is enabled by default, so this is only an issue for you if you have actively disabled the BBT (assembling with the -U no-bbl option, for example).

Thanks,
Chris



On 11/3/17 4:19 PM, Sarah Newman wrote:
On 11/03/2017 12:58 PM, Chris Walker wrote:
Hello,
I was looking at this again today and it appears that with this change, error handling no longer works correctly in RAID10 (I haven't checked the other levels yet).  Without a BBL configured, an error cycles through fix_read_error until max_read_errors is exceeded, and only then is the drive kicked out of the array.  For example, if I inject errors in response to both read and write commands at sector 16392 of /dev/sda, logs in response to a read of the corresponding md0 sector look like:
(many repeats)
Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
Oct 27 16:15:16 c1 kernel: md/raid10:md0: read correction write failed (8 sectors at 16392 on sda)
Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
Oct 27 16:15:16 c1 kernel: md/raid10:md0: unable to read back corrected sectors (8 sectors at 16392 on sda)
Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: failing drive
Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Raid device exceeded read_error threshold [cur 21:max 20]
Oct 27 16:15:16 c1 kernel: md/raid10:md0: sda: Failing raid device
Oct 27 16:15:16 c1 kernel: md/raid10:md0: Disk failure on sda, disabling device.

Previously, the drive would have been failed out of the array by the call of md_error at the end of r10_sync_page_io.

Is there an appetite for a patch that takes the easy way out by reverting to the previous behavior with changes like

-       if (!rdev_set_badblocks(rdev, sector, sectors, 0))
+       if (!rdev_set_badblocks(rdev, sector, sectors, 0) || rdev->badblocks.shift < 0)

Thanks,
Chris

As a RAID10 user that seems like the right thing to do, thank you.

--Sarah

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux