On Thu, Apr 1, 2021 at 5:53 AM Patrick O'Callaghan <pocallaghan@xxxxxxxxx> wrote:
>
> On Wed, 2021-03-31 at 18:00 -0600, Chris Murphy wrote:
> > Nothing to add but the usual caveats:
> > https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
>
> That's pretty scary, though the drives I'm using are 1TB units
> scavenged from my extinct NAS so are unlikely to be SMR. They're both
> WD model WD10EZEX drives.

It's not an SMR concern. It's making sure the drive gives up on errors
faster than the kernel tries to reset what it thinks is a hanging
drive.

smartctl -l scterc /dev/sdX

That'll tell you the current setting. I'm pretty sure Blues come with
SCT ERC disabled. Some support it, some don't. If it's supported,
you'll want to set it to something like 70-100 deciseconds, the unit
SATA drives use for this feature. (Rough example commands are at the
end of this message.)

And yeah, the linux-raid@ list is chock full of such
misconfigurations. It filters out all the lucky people, and the
unlucky people end up on the list with a big problem which generally
looks like this: one dead drive, and one of the surviving drives with
a bad sector that was never fixed up through the normal raid
bad-sector recovery mechanism, because the kernel's default is to be
impatient and do a link reset on consumer drives that overthink a
simple problem. Upon link reset, the entire command queue in the drive
is lost, so there's no way to know which sector it was hanging on, and
no way for raid to do a fixup.

The fixup mechanism is this: the drive reports an uncorrectable read
error with a sector address *only once it gives up*. Then md raid (and
btrfs and zfs) can look up that sector, find out what data is on it,
find its mirror, read the good data, and overwrite the bad sector with
good data. The overwrite is what fixes the problem.

If the drive doesn't support SCT ERC, we have to get the kernel to be
more patient instead. That's done via sysfs (also sketched at the end
of this message).

> > I use udev for that instead of init scripts. Concept is the same
> > though, you want SCT ERC time to be shorter than kernel's command
> > timer.
>
> I've been using MD for a while and haven't seen any errors so far.

And you may never see it. Or you may end up being the unlucky person
whose raid experiences complete loss of the array. When I say this
comes up all the time on the linux-raid@ list, it's about once every
couple of weeks. It's seen most often with raid5 because it has more
drives, and thus more failures, than raid1 setups, and it tolerates
only one failure *in a stripe*. Most everyone thinks of a failure as a
complete drive failure, but drives also partially fail. The odds of
two drives partially failing in the same stripe are pretty
astronomical, but if one drive dies and *any* of the remaining drives
has a bad sector that can't be read, the entire stripe is lost. And
depending on what's in that stripe, that can bring down the array.

So what you want is for the drives to report their errors, rather than
the kernel doing link resets.

--
Chris Murphy
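
Here's roughly what I mean by checking and setting SCT ERC. sdX is a
placeholder for your drive; double check the smartctl man page before
copying anything, and note the setting is lost at power-off on most
drives, so it has to be reapplied at every boot:

smartctl -l scterc /dev/sdX          # show the current read/write ERC timers
smartctl -l scterc,70,70 /dev/sdX    # set both timers to 70 deciseconds (7 s)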
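
If a drive doesn't support SCT ERC, the sysfs knob I'm referring to is
the kernel's SCSI command timer, which defaults to 30 seconds. The
raid wiki page above suggests raising it to something like 180 seconds
on such drives:

cat /sys/block/sdX/device/timeout         # current value, in seconds
echo 180 > /sys/block/sdX/device/timeout  # as root; not persistent across reboots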
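
And a bare-bones sketch of the udev approach, untested as written: the
rules file name is made up, the smartctl path may differ on your
system, and ideally each rule would only match the drives that
actually need it:

# /etc/udev/rules.d/60-sct-erc.rules (hypothetical file name)
# drives that support SCT ERC: tell them to give up after 7 seconds
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", RUN+="/usr/sbin/smartctl -l scterc,70,70 /dev/%k"
# drives that don't support it: make the kernel more patient instead
#ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{device/timeout}="180"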