Peter Rabbitson wrote:
> Greetings,
>
> This is not a strictly raid question, but this is the best list I know
> of for this type of questions. Two days ago my server ground to a halt
> without apparent reasons. There were tons of processes in D state, with
> no signs of any significant work being done. I attributed it to resource
> starvation (the server is pretty loaded), rebooted and went on with my
> life.
>
> Yesterday I received the log messages included at the bottom of this
> email. Since I am running a --level=10 --raid-devices=4 --layout=f3 I am
> not that worried about losing data, and decided to investigate. I
> removed (mdadm -r) the devices in question from the arrays, power cycled
> the server, and executed a full badblocks -svw /dev/sda run. It passed
> with flying colors.
>
> So here is my question - what does the log below signify (there are no
> omissions, this is all I got) - is my controller dying? Or is there
> indeed a well masked hard drive failure? Should I change the drive, the
> controller, or both?

Looks to me like the drive hit a bad sector. Quite possibly the sector
was then re-allocated, which would explain why the later badblocks run
passed.

What does smartctl -a /dev/sda say?

Run man smartctl to ensure you're informed :)

Then run:
  smartctl -t long /dev/sda
(you may need smartctl -o on /dev/sda)

Depending on the version of smartctl you'll be given a 'poll time' or
completion time.

It's safe to run smartctl -a /dev/sda while the test is still running,
but make sure the self-test has completed before you post the output -
and note especially any differences from the earlier -a run.

David
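The before/after comparison suggested above could be scripted roughly as follows. This is only a sketch under assumptions: the function names snapshot and compare_snapshots and the file names smart-before.txt and smart-after.txt are made up for illustration, and only the smartctl -a and smartctl -t long invocations come from the message itself.

```shell
#!/bin/sh
# Sketch: record SMART state before and after a long self-test, then diff.
# Function names and snapshot file names are illustrative assumptions.

snapshot() {
    # $1 = device, $2 = file to write the full SMART report to
    smartctl -a "$1" > "$2"
}

compare_snapshots() {
    # $1 = earlier report, $2 = later report
    # Prints the lines that changed; diff exits 0 if the files are
    # identical and 1 if they differ.
    diff "$1" "$2"
}

# Only touch a real drive when a device is given, e.g.: sh smartcheck.sh /dev/sda
if [ -n "${1:-}" ]; then
    dev=$1
    snapshot "$dev" smart-before.txt
    smartctl -t long "$dev"     # starts the test and reports when it should finish
    echo "Press Enter once the self-test has completed."
    read -r dummy
    snapshot "$dev" smart-after.txt
    compare_snapshots smart-before.txt smart-after.txt
fi
```

Changed lines in the attribute table (for example a reallocated sector count) and new entries in the self-test log are the parts worth posting.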