On Sun, 5 Oct 2014 11:27:01 +0200 Matthijs Kooijman <matthijs@xxxxxxxx>
wrote:

> Hey folks,
>
> a few times now I've found my system being locked up after a read error
> from a hard disk. It looks like the MD code that handles the read error
> messes up and causes a GPF.
>
> After this happens, the system becomes completely unresponsive - it
> responds to ping and opens TCP connections, but no data comes out. The
> serial console also gives no response.
>
> After rebooting, the disk in question showed a pending sector. In the
> most recent occurrence, I found that the array was also resyncing. I'm
> not sure if this also happened in the earlier occurrences (but since the
> pending sector didn't disappear in the next day, I'd expect no resync
> happened before). The read errors always happened during a routine check
> of the array.

That last sentence is the clue I needed - thanks.

You are running 3.10.3. It contains a bug that was fixed in 3.10.32.

http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=9f2d289933e60ec726a7a9522e2dcdfdc82c58de

So if you upgrade your kernel, the problem should be gone.

NeilBrown
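
For anyone wanting to check other machines against the version mentioned above, here is a minimal sketch of the comparison. It is only an illustration: the helper names `parse_release` and `predates_fix` are made up for this example, and it assumes the release string begins with the upstream `major.minor.patch` version. Distribution kernels often append or substitute their own numbering, so the result may not reflect the actual stable patch level. It also only covers the 3.10.y series named in this message and says nothing about other series.

```python
import re

# First 3.10.y stable release containing the fix, as stated in the message above.
FIXED_IN = (3, 10, 32)


def parse_release(release):
    """Return (major, minor, patch) from a string such as '3.10.3' or
    '3.10.32-1-amd64'. A missing patch component is treated as 0."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError("unrecognised kernel release string: %r" % release)
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def predates_fix(release):
    """True if `release` is in the 3.10.y series and older than 3.10.32."""
    version = parse_release(release)
    return version[:2] == FIXED_IN[:2] and version < FIXED_IN
```

On a running system, the string to pass in would typically come from `platform.release()` in the Python standard library, which reports the same value as `uname -r` on Linux.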