On 17 January 2012 12:02, Peter Grandi <pg@xxxxxxxxxxxxxxxxxxxx> wrote:
> [ ... ]
>
>>> Why is the system unresponsive, shouldn't it still be OK
>>> after a drive failure?
>
> There is a bit of a difference between a "drive failure" and
> some/several bad sectors on a drive.
>
> It is also to wonder whether the partially defective drive has
> been "failed" and "removed" from the MD set and perhaps
> "deleted" using '/sys/block/sdb/device/delete'.
>
>> Hm, I'm seeing this in dmesg, could it be related? (ioctl lock)
>
>> [425480.928740] md/raid:md0: read error corrected (8 sectors at
>> 223617240 on sdb1)
>
> Note the "read error corrected" (*corrected*), and that it is "8
> sectors" may indicate it is one of the drives with 4096B sectors
> that is configured as if it has 512B ones.
>
> [ ... ]
>
> Overall it is likely that you have just implicitly discovered
> how important short settings for Error Recovery Control are, and
> to choose drives that allow you to set them:
>
>  http://www.sabi.co.uk/blog/1103Mar.html#110331
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

$ sudo smartctl -l scterc,20,20 /dev/sdb
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.0-2-ARCH] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Warning: device does not support SCT Error Recovery Control command

:-/ And yes, it's a "4KB" drive, a WD20EARS. It failed after almost
11000 hours.

Thanks, now I know the reason for the system hang.

Regards,
Mathias
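
For drives that reject the scterc command with the warning shown above, a commonly suggested workaround (not something stated in this thread) is to raise the kernel's per-device command timeout, so that the drive's long internal retries finish before the kernel gives up on the command. The sketch below only inspects smartctl output and prints a suggestion; the helper names erc_supported and erc_fallback and the 180-second value are illustrative assumptions, and it changes nothing on the system by itself. Note that scterc values are given in tenths of a second, so "20,20" requests 2.0 seconds for reads and writes.

```shell
#!/bin/sh
# Sketch only: helper names and the 180 s value are assumptions,
# not taken from the thread.

# Reads smartctl output on stdin. Succeeds (exit 0) when the
# "does not support SCT Error Recovery Control" warning is absent.
erc_supported() {
    ! grep -q 'does not support SCT Error Recovery Control'
}

# Reads smartctl output on stdin and prints a suggestion for drive $1
# (a bare device name such as sdb). It does not modify anything.
erc_fallback() {
    dev=$1
    if erc_supported; then
        echo "$dev: SCT ERC accepted, nothing more to do"
    else
        echo "$dev: no SCT ERC; consider: echo 180 > /sys/block/$dev/device/timeout"
    fi
}
```

Possible usage, assuming the functions above have been sourced into the current shell:

    sudo smartctl -l scterc,20,20 /dev/sdb 2>&1 | erc_fallback sdb

Neither setting survives a reboot, so whichever one applies would have to be reissued at boot time.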