Re: RAID6, failed device, unresponsive system?

On 17 January 2012 11:25, Mathias Burén <mathias.buren@xxxxxxxxx> wrote:
> Hi list,
>
> This is the setup:
>
> 8x 2TB HDDs, each partitioned starting at sector 2048 and almost filled
> (about 2GB left free at the end), so md0 consists of /dev/s[b-h]1 with
> a 64KB chunk size.
> I got an email from mdadm saying that sdb1 had failed, so I SSHed in.
> The array is in the "checking" state (I'm guessing my weekly cron job
> kicked in), but it's running at 38KB/s, so I tried "echo idle >
> md0/sync_action", and that hangs. I checked the SMART details of sdb,
> and it is indeed very broken; an RMA is now in progress. So this is
> where I'm at: I've stopped all services using the mount point of md0
> and am trying to unmount it, but that doesn't work; it just sits there.
> I can't even cat /proc/mdstat; that hangs as well. I'll leave this
> for a few hours until I get home. If it's still unresponsive by then
> (cat /proc/mdstat, for example), I'll just disconnect the bad HDD and
> boot into single-user mode to check the array status.
>
> Why is the system unresponsive? Shouldn't it still be OK after a drive failure?
>
>
> Best regards,
> Mathias

Hm, I'm seeing this in dmesg. Could it be related? (Note the "ioctl lock" lines.)


[425480.928740] md/raid:md0: read error corrected (8 sectors at 223617240 on sdb1)
[425480.928748] md/raid:md0: read error corrected (8 sectors at 223617248 on sdb1)
[425480.928756] md/raid:md0: read error corrected (8 sectors at 223617256 on sdb1)
[425480.928764] md/raid:md0: read error corrected (8 sectors at 223617264 on sdb1)
[425480.928771] md/raid:md0: read error corrected (8 sectors at 223617272 on sdb1)
[425480.928779] md/raid:md0: read error corrected (8 sectors at 223617280 on sdb1)
[425480.928787] md/raid:md0: read error corrected (8 sectors at 223617288 on sdb1)
[468280.254190] md: ioctl lock interrupted, reason -4, cmd -2142762735
[469362.614553] nfsd: last server has exited, flushing export cache
[471485.048791] md: ioctl lock interrupted, reason -4, cmd -2142762735
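
For what it's worth, the two numbers in the "ioctl lock interrupted" lines can be decoded. The rough sketch below assumes the generic Linux _IOC bit layout (8-bit nr, 8-bit type, 14-bit size, 2-bit direction), which some architectures do not use. decode_ioctl is just a helper name I made up. As far as I can tell, the result (read direction, 72 bytes, type 9, nr 0x11) matches GET_ARRAY_INFO from linux/raid/md_u.h, and reason -4 is -EINTR. So this looks like something querying array info and being interrupted while waiting for the md lock, though what issued the ioctl is only a guess on my part.

```python
import errno

# Direction field values under the generic _IOC encoding (an assumption;
# a few architectures encode these differently).
DIR_NAMES = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(cmd):
    """Split an ioctl number (as printed by the kernel, possibly negative)
    into its direction, size, type and nr fields."""
    cmd &= 0xFFFFFFFF  # reinterpret the signed int as unsigned 32-bit
    return {
        "hex": hex(cmd),
        "nr": cmd & 0xFF,                       # bits 0-7: command number
        "type": (cmd >> 8) & 0xFF,              # bits 8-15: "magic"/type
        "size": (cmd >> 16) & 0x3FFF,           # bits 16-29: argument size
        "direction": DIR_NAMES[(cmd >> 30) & 0x3],  # bits 30-31
    }


if __name__ == "__main__":
    # Values copied from the dmesg lines above.
    reason, cmd = -4, -2142762735
    print("reason:", errno.errorcode.get(-reason, "unknown"))
    print("cmd:", decode_ioctl(cmd))
```

The type field being 9 is consistent with the md block major, which is why I read this as an md ioctl rather than something generic.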

/M