RAID6, failed device, unresponsive system?

Hi list,

This is the setup:

8x 2TB HDDs, each partitioned starting 2048 sectors in, with the partition
spanning almost the whole disk (about 2GB left at the end). md0 consists of
/dev/sd[b-h]1 with a 64KB chunk size.
I got an email from mdadm saying that sdb1 had failed, so I SSHed in.
The array was in "check" state (I'm guessing my weekly cron job kicked
it off), but it was crawling along at 38KB/s, so I tried "echo idle >
/sys/block/md0/md/sync_action" to stop the check; that write hung. I
then looked at the SMART details of sdb, and the drive is indeed very
broken; an RMA is now in progress.

So this is where I'm at: I've stopped all services using the mount
point of md0 and tried to unmount it, but the umount just sits there.
Even "cat /proc/mdstat" hangs. I'll leave it alone for a few hours
until I get home. If it's still unresponsive by then (cat /proc/mdstat,
for example), I'll disconnect the bad HDD and boot to single-user mode
to check the array status.
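For reference, the checks described above can be sketched roughly as
below. This is only a hedged sketch: the device names (md0, sdb) are
taken from my setup and the sysfs paths assume a reasonably recent
kernel, so adjust as needed.

```shell
#!/bin/sh
# Sketch of the diagnostic steps above; md0/sdb are assumptions
# from this particular setup, not universal names.
MD=/sys/block/md0/md

if [ -r "$MD/sync_action" ]; then
    # Current resync/check state (e.g. "check", "idle")
    cat "$MD/sync_action"
    # Progress and speed of the running check
    grep -A 2 '^md0' /proc/mdstat
    # Ask md to stop the check -- this write can block indefinitely
    # if a member device is hung in the kernel:
    # echo idle > "$MD/sync_action"
else
    echo "md0 not assembled on this system"
fi

# SMART health summary of the suspect member (needs smartmontools):
# smartctl -H /dev/sdb
```

The actual "echo idle" and smartctl lines are commented out here since
they either modify state or require root and the failing disk present.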

Why is the whole system unresponsive? Shouldn't a RAID6 array, with its two-disk redundancy, still be usable after a single drive failure?


Best regards,
Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

