Re: RAID 6 Not Mounting (Block device is empty)

Phil Turmel <philip@xxxxxxxxxx> · Sat, 7 Nov 2015 16:08:44 -0500

On 11/07/2015 02:17 PM, Francisco Parada wrote:
> Hi Phil,
> 
> First, I want to thank you for taking the time to reply to me, I truly appreciate it.  Secondly, I must correct my statement “I added two new arrays to my system last night” … I had started writing this email a few nights ago, and shut down the system in order to prevent me from getting frustrated and doing something stupid.  So I walked away from all of it, and just forgot to amend my original email.  So the system was off for a few days, and I turned it back on a few minutes before sending my email, and dmesg only shows today’s output, and interestingly enough, no timestamps are on there.

That's ok.  I was looking at the last update times on your mdadm -E reports.

>> 1) the dmesg from the time around the event, +/- a few minutes.
> 
> Having said that, dmesg isn’t showing me anything from that day either, and I just found out that /var/log/messages doesn’t even exist in my Ubuntu Server 15.04.  It seems I have to enable that, so that’s one more thing I’m about to do now.  Are there any other ways I could possibly retrieve that?  I’m afraid the answer will be a solid “no”, but worth asking.

Hmm.  Did Ubuntu switch to systemd for that version?  If so, you'll need to use journalctl.  I'm only now learning that, so you'll have to research the options you need yourself.  Or others here on the list will chime in :-)
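
A minimal sketch of pulling kernel messages from an earlier boot with journalctl. It assumes the journal is stored persistently (that is, /var/log/journal exists); if it is only kept in memory, earlier boots are gone. The helper name kernel_log_for_boot is made up for illustration; -k, -b and --no-pager are standard journalctl options.

```shell
# Hypothetical helper: print the kernel messages recorded for one boot.
# $1 is a boot offset as understood by "journalctl -b":
#   0 = current boot, -1 = previous boot, -2 = the one before that, ...
kernel_log_for_boot() {
    local offset="${1:-0}"
    journalctl --no-pager -k -b "$offset"
}
```

"journalctl --list-boots" shows which boots the journal still holds; "kernel_log_for_boot -1" would then print the kernel log of the previous one.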

>> 2) the output of the following drive diagnostics:
>>
>> for x in /dev/sd[a-z] ; do echo $x ; smartctl -i -A -l scterc $x ; done

You cut off the "echo $x" part.  I very much wanted that to document which drives had which serial numbers.

If you want to be pedantic, combine mdadm -E and smartctl instead:

for x in /dev/sd[a-z] ; do mdadm -E $x ; smartctl -i -A -l scterc $x ; done

However, it is clear from these reports that you are in fact suffering from timeout mismatch:

> SCT Error Recovery Control:
>            Read: Disabled

> SCT Error Recovery Control command not supported

> SCT Error Recovery Control command not supported

> SCT Error Recovery Control command not supported

> SCT Error Recovery Control command not supported

> SCT Error Recovery Control command not supported

> SCT Error Recovery Control command not supported

> SCT Error Recovery Control command not supported

You will need to apply the workarounds for these drives.  The one with scterc disabled is a raid-capable drive that just powers up in desktop mode.  Add "smartctl -l scterc,70,70 /dev/sdX" to your boot scripts for that one (the values are in tenths of a second, so 7.0 seconds each for reads and writes).  For the others, which don't support scterc at all, you will need to set a long driver timeout in place of the kernel's 30-second default.  For now, before any more mdadm operations, just use the blanket work-around script:

for x in /sys/block/*/device/timeout ; do echo 180 > "$x" ; done
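
For the boot scripts, the per-drive choice between the two workarounds could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe: the function names are made up, the patterns are matched against smartctl's usual "Read: Disabled" / "Read: 70 (7.0 seconds)" report lines, and anything unrecognized is treated as unsupported so that it gets the long timeout. It needs to run as root.

```shell
# Hypothetical helper: classify a drive from the text of
# "smartctl -l scterc <dev>" read on stdin.
# Prints one of: enabled, disabled, unsupported.
scterc_state() {
    awk '
        /SCT Error Recovery Control command not supported/ { s = "unsupported" }
        /Read: +Disabled/  { if (s == "") s = "disabled" }
        /Read: +[0-9]+/    { if (s == "") s = "enabled" }
        END { print (s == "" ? "unsupported" : s) }
    '
}

# Hypothetical helper: apply the matching workaround to one drive.
# $1 is a whole-disk device node such as /dev/sdb.
apply_timeout_workaround() {
    local dev="$1"
    local name="${dev##*/}"    # /dev/sdb -> sdb
    case "$(smartctl -l scterc "$dev" | scterc_state)" in
        enabled)  ;;    # drive already gives up on errors quickly
        disabled) smartctl -l scterc,70,70 "$dev" > /dev/null ;;
        *)        echo 180 > "/sys/block/$name/device/timeout" ;;
    esac
}
```

A boot script would then call it once per member, for example: for x in /dev/sd[a-z] ; do apply_timeout_workaround "$x" ; done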

Now, I didn't see any "Current Pending Sector" counts, so I don't think you are suffering from UREs.  In fact, with four of your drives dying together, I suspect you overloaded your power supply(ies) with the extra arrays, either electrically or thermally.  The power was OK for idle and trivial operations but couldn't handle the load while copying.
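
A small sketch for re-checking the pending sector counts later, assuming the usual "smartctl -A" attribute table in which the attribute name is the second column and the raw value the tenth. The function names are hypothetical.

```shell
# Hypothetical helper: read "smartctl -A <dev>" output on stdin and print
# the raw value of Current_Pending_Sector, or nothing if the drive does
# not report that attribute.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# Hypothetical helper: print "<device> <pending count>" for each device given.
report_pending() {
    local dev
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(smartctl -A "$dev" | pending_sectors)"
    done
}
```

For example: report_pending /dev/sd[a-z]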

Back up across a gigabit LAN if you can't get all the necessary drives into the main case.

> ==========================================================================================
> 
>> Do *not* perform any --create operation on your array.
> No worries, I’m not touching that.  Thank you for your input.

At this point, I'm confident that your complete set of original drives should just be forcibly assembled:

mdadm -Afv /dev/mdX /dev/sd[b-h]

Replace the device letters if they've changed since your mdadm -E reports.
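
Before forcing, it can help to see how far apart the members' event counts are, since those are what mdadm uses to decide that a member is out of date. A minimal sketch, assuming the "Events : N" line that mdadm -E prints; the function names are hypothetical.

```shell
# Hypothetical helper: read "mdadm -E <dev>" output on stdin and print
# the event count from its "Events : N" line.
md_events() {
    awk '$1 == "Events" && $2 == ":" { print $3 }'
}

# Hypothetical helper: print "<device> <event count>" for each device given.
report_events() {
    local dev
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(mdadm -E "$dev" | md_events)"
    done
}
```

For example, "report_events /dev/sd[b-h]" before assembling, and "cat /proc/mdstat" afterwards to confirm the array is running.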

There might be minor filesystem damage if any blocks were in flight to the array when it died.  fsck, then mount.
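
A cautious variant of that last step, sketched under assumptions: the filesystem sits directly on the md device, and its checker honors -n (report problems, change nothing), as e2fsck does. It does a dry-run check first and mounts read-only only if the check comes back clean; the function name is made up. A real repair pass would be a plain fsck on the still-unmounted device.

```shell
# Hypothetical helper: dry-run check, then read-only mount.
# $1 = block device (e.g. the assembled /dev/mdX), $2 = mount point.
check_then_mount_ro() {
    local dev="$1" mnt="$2"
    # -n: report problems without modifying the filesystem.
    # A non-zero exit means fsck found something; stop and look first.
    fsck -n "$dev" || return 1
    mount -o ro "$dev" "$mnt"
}
```

Mounting read-only first lets you copy data off before letting fsck write anything to the array.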

Phil