On Thu, Jun 29, 2017 at 11:52:01AM +0200, Gandalf Corvotempesta wrote:
> disk0 has sector X (unused) failed.
> It's unused, thus, kernel knows nothing about that and is operating
> normally, no warning message or anything. If you don't access sector
> X, you won't be notified.
>
> Now, disk1 hard-fail. You have to replace that.
> During the resync, you have to resync the whole array, but disk0,
> sectorX is unreadable.
> The resync will fail and the whole array is down.
> Am I missing something?

Not really. It's just that you have to set up the monitoring yourself,
whichever way you feel comfortable with.

SMART has a selftest feature which causes the disk to read sectors.
You can test the whole disk at once (long selftest) or in segments
(selective selftest). I prefer the selective one since it allows you
to place the selftest in the time window of least activity. Instead of
spending an entire day (or two) testing the whole drive, you can put in
an hour or two of testing every night and have it cover the entire
drive over X days.

mdadm can also perform RAID checks, which read everything including
parity. The RAID layer then attempts to fix any read errors it finds,
and you can also check mismatch_cnt afterwards.

The mdadm checks can also be done region by region to distribute the
load over several days, but I think it's still not a direct option for
mdadm; the region can be set via /proc or /sys somewhere...

Both smartmontools and mdadm should be set up to run such checks
periodically, and to notify you by email immediately if any problem
occurs.

If a disk has problems, replace it, otherwise it's a gamble.
Whatever promises RAID makes regarding redundancy, it always
assumes the other drives work 100%.

It's very unlikely that you will encounter read errors during a rebuild
if you ran regular checks and didn't forcibly keep bad drives.
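
To illustrate the "an hour or two every night" idea, here is a rough
sketch of how the nightly regions could be computed. It is not taken
from any existing tool: the function names, the placeholder device
/dev/sdX and the example sector count are all made up for illustration.
It only builds the smartctl argument list (using the selective test
syntax "-t select,N-M") and does not run anything.

```python
def nightly_span(total_sectors, nights, night_index):
    """Return the inclusive (first_lba, last_lba) span for one night.

    The drive is cut into `nights` contiguous slices; night_index wraps
    around, so night 0 is tested again once the cycle has completed.
    """
    if nights <= 0:
        raise ValueError("nights must be positive")
    if total_sectors < nights:
        raise ValueError("need at least one sector per night")
    i = night_index % nights
    first = i * total_sectors // nights
    last = (i + 1) * total_sectors // nights - 1
    return first, last


def smartctl_args(device, first_lba, last_lba):
    """Build (but do not execute) a selective selftest command line."""
    return ["smartctl", "-t", f"select,{first_lba}-{last_lba}", device]


if __name__ == "__main__":
    # Hypothetical drive size in sectors; the real value has to come
    # from the drive itself.
    total = 7_814_037_168
    for night in range(7):
        first, last = nightly_span(total, 7, night)
        print(night, " ".join(smartctl_args("/dev/sdX", first, last)))
```

The same spans could in principle be reused for a region-by-region md
check. As far as I know the md driver exposes sync_min and sync_max
(in sectors) next to sync_action under /sys/block/mdX/md/, but note that
those limits refer to the array's own addressing, so the total has to be
taken from the array and not from a single member disk.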

Regards
Andreas Klauer