On 06/10/2013 10:08 PM, Keith Phillips wrote:
> Hi Phil,
>
>> A big stack trace suggests other problems in your system. Not that you
>> don't have potential I/O error issues, but there might be a kernel problem.
>>
>> Please show "uname -a" and "mdadm --version".
>
> These are the versions I currently have, which the migration was
> attempted with. The array was originally constructed years ago,
> probably with older kernel/mdadm versions:
>
> Linux muncher 3.0.0-32-server #51-Ubuntu SMP Thu Mar 21 16:09:49 UTC
> 2013 x86_64 x86_64 x86_64 GNU/Linux
>
> mdadm - v3.1.4 - 31st August 2010

If the recommendations below don't help, consider using a modern liveCD
to complete the reshape. I use SystemRescueCD myself, but I'm sure
others would do fine, too.

>> The key thing to look for is a nonzero mismatch count in sysfs for that
>> array. I'm not familiar with Ubuntu's script, so you might want to look
>> by hand at some future point.
>
> I'll have a look in future. I do also have mdadm running daily via
> cron with "--monitor --oneshot" - do you know if this checks the
> "mismatch_cnt" file and reports errors?

I don't think so. (See the scrub sketch near the end of this mail.)

>>> Also, while poking yesterday I noticed I was getting warnings of the
>>> form "Device has wrong state in superblock but /dev/sde seems ok", so
>>> I tried a forced assemble:
>>> mdadm --assemble /dev/md0 --force
>>>
>>> Looks like it updated some info in the superblocks (and yes, I forgot
>>> to save the original output first!), but the array remains inactive. I
>>> have now sworn off poking around by myself, because I've no idea what
>>> to do from here.
>>
>> Please show /proc/mdstat again, along with "mdadm -D /dev/md0".
>
> ---------------------------
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md0 : inactive sde[4] sdc[1] sdb[0] sdd[3]
>       7814054240 blocks super 1.2
>
> unused devices: <none>
> ---------------------------
> /dev/md0:
>         Version : 1.2
>   Creation Time : Sun Jul 17 00:41:57 2011
>      Raid Level : raid6
>   Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
>    Raid Devices : 4
>   Total Devices : 4
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Jun  8 11:00:43 2013
>           State : active, degraded, Not Started
>  Active Devices : 3
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 1
>
>          Layout : left-symmetric-6
>      Chunk Size : 512K
>
>      New Layout : left-symmetric
>
>            Name : muncher:0  (local to host muncher)
>            UUID : 830b9ec8:ca8dac63:e31946a0:4c76ccf0
>          Events : 50599
>
>     Number   Major   Minor   RaidDevice State
>        0       8       16        0      active sync   /dev/sdb
>        1       8       32        1      active sync   /dev/sdc
>        3       8       48        2      active sync   /dev/sdd
>        4       8       64        3      spare rebuilding   /dev/sde
> ---------------------------
>
>>> for x in /sys/block/sd[bcde]/device/timeout ; do echo $x $(< $x) ; done
>>> ----------------------------
>>> /sys/block/sdb/device/timeout 30
>>> /sys/block/sdc/device/timeout 30
>>> /sys/block/sdd/device/timeout 30
>>> /sys/block/sde/device/timeout 30
>>
>> Due to your green drives, you cannot leave these timeouts at 30 seconds.
>> I recommend 180 seconds:
>>
>> for x in /sys/block/sd[bcde]/device/timeout ; do echo 180 >$x ; done
>>
>> (You should do this ASAP. On the run is fine.)
>>
>> You will need your system to do this at every boot. Most distros have
>> rc.local or a similar scripting mechanism you can use.
>>
>> Phil
>
> Done - thanks for the tip.

(An rc.local sketch for the boot-time part is at the end of this mail.)

Given the above data, I believe you should be able to just do
"mdadm /dev/md0 --run" and watch it recover.
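On your cron question above: as far as I know, --monitor only reports
events (failed devices, degraded arrays and the like); it doesn't read
mismatch_cnt for you. Checking by hand looks something like this - a
sketch, assuming the array is /dev/md0 and only once it's healthy again:

echo check > /sys/block/md0/md/sync_action   # kick off a scrub
cat /sys/block/md0/md/sync_action            # "check" while running, "idle" when done
cat /sys/block/md0/md/mismatch_cnt           # you want to see 0 here

Many distros ship a cron or script that does exactly this monthly, which
is worth having for parity raid.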
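For the boot-time half of the timeout fix: on Ubuntu the usual place is
/etc/rc.local. A sketch - the sd[bcde] names are an assumption from
your current boot and can move around, so verify them:

#!/bin/sh -e
# Raise the SCSI command timeout on the array members so the green
# drives' long internal error recovery doesn't get them ejected.
# (Device names assumed from this boot; check they still match.)
for x in /sys/block/sd[bcde]/device/timeout ; do
    echo 180 > "$x"
done
exit 0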
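Concretely, the run-and-watch sequence would be roughly:

mdadm /dev/md0 --run   # start the array despite the "Not Started" state
cat /proc/mdstat       # should now show the rebuild progress and an ETA
mdadm -D /dev/md0      # per-device detail while /dev/sde rebuilds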
If it still gives you trouble, stop the array and reassemble with "-vv"
and show what it reports. Also report any dmesg errors.

Phil
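P.S. If it does come to a reassembly, the verbose sequence would look
something like this (a sketch; member names taken from your mdstat
above, so adjust if they've moved):

mdadm --stop /dev/md0
mdadm --assemble /dev/md0 -vv /dev/sd[bcde]
dmesg | tail -n 50   # look for I/O or timeout errors from the kernel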