Hello all,

the discussion about SCTERC boils down to letting the drive attempt error recovery (ERC) for a little longer or a little shorter. For any given disk, experience suggests the difference is slight: if ERC is allowed to run longer, you may see the first unrecoverable errors (UREs) just a little later, maybe only a month.

UREs are inevitable. Thus, if I run a filesystem on just a single drive, it will get corrupted at some point, and there is nothing to be done about it. Except, of course, to use a redundant raid! And here it makes a big difference that the drive's ERC actually terminates before the controller timeout. Otherwise the drive gets kicked from the array, you lose all your redundancy again, and you are at high risk of UREs showing up during the re-sync.

So for a proper comparison we need to look at the difference it makes in the usage scenarios (delaying errors vs. losing redundant error resilience and triggering UREs), not at the incidence of single recoverable/unrecoverable errors. It looks to me as if it makes a big difference to redundant raids and no qualitative difference to single-disk filesystems.

And we need to keep in mind that single-disk filesystems also depend on the disk to stop grinding away at ERC attempts before the controller timeout. Otherwise a disk reset may make the system clear buffers and lose open files? Without prolonging the Linux default controller timeout, SCTERC can prevent that where supported.

> in any case the proper place to change the default kernel command
> timer value is in the kernel, not with a udev rule.

Right. And as you write, increasing the controller timeout has clear downsides.

Note as well: as long as the proposed script (a temporary safety measure) maximizes the controller timeout as a remedy for disks that don't support SCTERC, it would even fix the timeout mismatch for single-disk filesystems (by letting the controller wait until the disk finally succeeds or fails in its recovery attempts).
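
To make the mismatch concrete, here is a minimal shell sketch of how one might classify a drive. It is only an illustration under stated assumptions: the function names erc_read_deciseconds and check_timeouts are made up for this example, and the parsing assumes that "smartctl -l scterc" prints either "Read: 70 (7.0 seconds)" or "Read: Disabled" when the feature is supported. The ERC value is in units of 100 ms, the kernel command timer in seconds.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, assumed smartctl output format.

# Reads the text of `smartctl -l scterc` on stdin and prints the read ERC
# limit in deciseconds, or "disabled", or "unsupported".
erc_read_deciseconds() {
    awk '
        /Read:/ && /Disabled/ { print "disabled"; found = 1; exit }
        /Read:/ && /seconds/  { print $2; found = 1; exit }
        END { if (!found) print "unsupported" }
    '
}

# check_timeouts ERC KERNEL_TIMEOUT_SECONDS
# Prints "ok" when ERC ends before the kernel command timer expires,
# "mismatch" when it does not, and "unbounded" when ERC puts no limit
# on the drive's recovery time (disabled or not supported).
check_timeouts() {
    case "$1" in
        disabled|unsupported)
            echo "unbounded" ;;
        *)
            # ERC is in units of 100 ms, the kernel timer in seconds.
            if [ "$1" -lt $(( $2 * 10 )) ]; then
                echo "ok"
            else
                echo "mismatch"
            fi ;;
    esac
}

# Possible use on a real system (needs root and smartmontools):
#   dev=sda
#   erc=$(smartctl -l scterc "/dev/$dev" | erc_read_deciseconds)
#   check_timeouts "$erc" "$(cat "/sys/block/$dev/device/timeout")"
```

With scterc,70,70 and a 30 second kernel timer this reports "ok". For a drive without SCTERC it reports "unbounded", which is the case where only a longer controller timeout can help.
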
So the proposed script actually provides a case that brings a benefit for raid0 setups as well (as long as the Linux default does not adapt to the disk parameters), but increasing the controller timeout in all cases would introduce long and unreported I/O blocking into all redundant setups.

> I don't know if a udev rule can say "If the drive exclusively uses md,
> lvm, btrfs, zfs raid1, 4+ or nested of those, and if the drive does
> not support configurable SCT ERC, then change the kernel command timer
> for those devices to ~120 seconds" then that might be a plausible
> solution to use consumer drives the manufacturer rather explicitly
> proscribes from use in raid...

The script called by the udev rule could do that, but it can be kept as simple as proposed and can set SCTERC regardless, because setting SCTERC below the controller timeout makes a qualitative difference when running redundant arrays and only a marginal difference when running non-redundant filesystems. (And, nevertheless, it sets a long controller timeout for devices that don't support SCTERC.)

After all, it looks like a quite simple change is appropriate:

In udev-md-raid-assembly.rules, below LABEL="md_inc" (which only handles md-supported devices), add one rule:

    # fix timeouts for redundant raids, if possible
    TEST=="/usr/sbin/smartctl", ENV{MD_LEVEL}=="raid[1-9]*", RUN+="/usr/bin/mdadm-erc-timeout-fix"

And in a new /usr/bin/mdadm-erc-timeout-fix file implement:

    #!/bin/sh
    # HDD_DEV must hold the kernel name of the member disk, e.g. "sda"
    erc=$(/usr/sbin/smartctl -l scterc "/dev/${HDD_DEV}")
    case "$erc" in
        *Disabled*) /usr/sbin/smartctl -l scterc,70,70 "/dev/${HDD_DEV}" ;;  # ERC supported but off: limit it to 7 s
        *seconds*)  ;;                                                         # ERC already set: leave it alone
        *)          echo 180 >"/sys/block/${HDD_DEV}/device/timeout" ;;       # no SCTERC: lengthen the controller timeout
    esac

Regards,
Chris