Re: Advice recovering from interrupted grow on RAID5 array

On Mon, Oct 21, 2013 at 12:29 PM, John Yates <jyates65@xxxxxxxxx> wrote:
> On Sun, Oct 20, 2013 at 9:09 PM, NeilBrown <neilb@xxxxxxx> wrote:
>> On Thu, 17 Oct 2013 01:36:28 -0400 John Yates <jyates65@xxxxxxxxx> wrote:
>>
>>> On Wed, Oct 16, 2013 at 8:07 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>> > On Wed, 16 Oct 2013 09:02:52 -0400 John Yates <jyates65@xxxxxxxxx> wrote:
>>> >
>>> >> On Wed, Oct 16, 2013 at 1:26 AM, NeilBrown <neilb@xxxxxxx> wrote:
>>> >> > On Mon, 14 Oct 2013 21:59:45 -0400 John Yates <jyates65@xxxxxxxxx> wrote:
>>> >> >
>>> >> >> Midway through a RAID5 grow operation from 5 to 6 USB connected
>>> >> >> drives, system logs show that the kernel lost communication with some
>>> >> >> of the drive ports which has left my array in a state that I have not
>>> >> >> been able to reassemble. After reseating the cable connections and
>>> >> >> rebooting, all of the drives appear to be functioning normally, so
>>> >> >> hopefully the data is still intact. I need advice on recovery steps
>>> >> >> for the array.
>>> >> >>
>>> >> >> It appears that each drive failed in quick succession with /dev/sdc1
>>> >> >> being the last standing and having the others marked as missing in its
>>> >> >> superblock. The superblocks of the other drives show all drives as
>>> >> >> available. (--examine output below)
>>> >> >>
>>> >> >> >mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
>>> >> >> mdadm: too-old timestamp on backup-metadata on device-5
>>> >> >> mdadm: If you think it is should be safe, try 'export MDADM_GROW_ALLOW_OLD=1'
>>> >> >> mdadm: /dev/md127 assembled from 1 drives - not enough to start the array.
>>> >> >
>>> >> > Did you try following the suggestion and run
>>> >> >
>>> >> >  export MDADM_GROW_ALLOW_OLD=1
>>> >> >
>>> >> > and then try the --assemble again?
>>> >> >
>>> >> > NeilBrown
>>> >>
>>> >> Yes I did, thanks. Not much change though. It accepts the timestamp,
>>> >> but then appears not to use it.
>>> >>
>>> >> mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
>>> >> /dev/sdf1 /dev/sdg1 --verbose
>>> >> mdadm: looking for devices for /dev/md127
>>> >> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
>>> >> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
>>> >> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
>>> >> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
>>> >> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
>>> >> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
>>> >> mdadm: :/dev/md127 has an active reshape - checking if critical
>>> >> section needs to be restored
>>> >> mdadm: accepting backup with timestamp 1381360844 for array with
>>> >> timestamp 1381729948
>>> >> mdadm: backup-metadata found on device-5 but is not needed
>>> >> mdadm: added /dev/sdf1 to /dev/md127 as 1
>>> >> mdadm: added /dev/sdd1 to /dev/md127 as 2
>>> >> mdadm: added /dev/sdc1 to /dev/md127 as 3
>>> >> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
>>> >> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
>>> >> mdadm: added /dev/sde1 to /dev/md127 as 0
>>> >> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.
>>> >
>>> >
>>> > What about with MDADM_GROW_ALLOW_OLD=1 *and* --force ??
>>> >
>>> > If that doesn't work, please add --verbose as well, and report the output.
>>> >
>>> > NeilBrown
>>>
>>> Thanks Neil. I had tried that as well (output below). I'm wondering
>>> if there is a way to fix the metadata for /dev/sdc1, since it seems
>>> to be the odd one: its --examine output marks all of the other disks
>>> as missing, which I don't believe they really are (just the result
>>> of a partial kernel or driver crash). I have read about people
>>> zeroing the superblock on a device so that it can be recreated, but
>>> I am not sure exactly how that works and am hesitant to try it since
>>> a reshape was in progress. I have also read about people having
>>> success re-running the original mdadm --create while leaving the
>>> data intact, but again I am hesitant to try that, especially because
>>> of the reshape state.
>>>
>>> Or... maybe this all has more to do with the Update Time, since the
>>> output seems to indicate 4 drives are usable. All of the drives have
>>> the same Update Time except for /dev/sdc1 which is about 5 minutes
>>> later than the rest. Since it is the fourth device, perhaps the
>>> assemble is satisfied with devices 0, 1, 2, 3, but then seeing an
>>> Update Time on devices 4 and 5 that is earlier than device 3, it
>>> marks them as "possibly out of date" and stops trying to assemble the
>>> array. Hard to tell, but I still would not have any idea how to
>>> overcome that scenario. I appreciate your help!
>>>
>>> # export MDADM_GROW_ALLOW_OLD=1
>>> # mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
>>> /dev/sdf1 /dev/sdg1 --force --verbose
>>> mdadm: looking for devices for /dev/md127
>>> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
>>> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
>>> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
>>> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
>>> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
>>> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
>>> mdadm: :/dev/md127 has an active reshape - checking if critical
>>> section needs to be restored
>>> mdadm: accepting backup with timestamp 1381360844 for array with
>>> timestamp 1381729948
>>> mdadm: backup-metadata found on device-5 but is not needed
>>> mdadm: added /dev/sdf1 to /dev/md127 as 1
>>> mdadm: added /dev/sdd1 to /dev/md127 as 2
>>> mdadm: added /dev/sdc1 to /dev/md127 as 3
>>> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
>>> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
>>> mdadm: added /dev/sde1 to /dev/md127 as 0
>>> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.
>>
>> That shouldn't happen.  With '-f' it should force the event count of either b1
>> or g1 (or maybe both) to match the others.
>>
>> What version of mdadm are you using? (mdadm -V)
>>
>
> mdadm - v3.3 - 3rd September 2013
> (Arch Linux)
>
>> Maybe try the latest
>>   git clone git://git.neil.brown.name/mdadm
>>   cd mdadm
>>   make mdadm
>>   ./mdadm .....
>>
>> NeilBrown
>
> OK, trying the latest...
>
> # ./mdadm -V
> mdadm - v3.3-27-ga4921f3 - 16th October 2013
>
> # uname -rv
> 3.11.4-1-ARCH #1 SMP PREEMPT Sat Oct 5 21:22:51 CEST 2013
>
> No change in the result and I don't see errors anywhere indicating a
> problem writing to /dev/sdb1 or /dev/sdg1. Are there any more debug
> options that I am overlooking?
>
> # ./mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1
> /dev/sde1 /dev/sdf1 /dev/sdg1 -f -v
> mdadm: looking for devices for /dev/md127
> mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 4.
> mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 3.
> mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 2.
> mdadm: /dev/sde1 is identified as a member of /dev/md127, slot 0.
> mdadm: /dev/sdf1 is identified as a member of /dev/md127, slot 1.
> mdadm: /dev/sdg1 is identified as a member of /dev/md127, slot 5.
> mdadm: :/dev/md127 has an active reshape - checking if critical
> section needs to be restored
> mdadm: accepting backup with timestamp 1381360844 for array with
> timestamp 1381729948
> mdadm: backup-metadata found on device-5 but is not needed
> mdadm: added /dev/sdf1 to /dev/md127 as 1
> mdadm: added /dev/sdd1 to /dev/md127 as 2
> mdadm: added /dev/sdc1 to /dev/md127 as 3
> mdadm: added /dev/sdb1 to /dev/md127 as 4 (possibly out of date)
> mdadm: added /dev/sdg1 to /dev/md127 as 5 (possibly out of date)
> mdadm: added /dev/sde1 to /dev/md127 as 0
> mdadm: /dev/md127 assembled from 4 drives - not enough to start the array.
>
> # ./mdadm --examine /dev/sd[bcdefg]1 | egrep '/dev/sd|Events|Update|Role|State'
> /dev/sdb1:
>           State : clean
>     Update Time : Mon Oct 14 01:52:28 2013
>          Events : 155279
>    Device Role : Active device 4
>    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sdc1:
>           State : clean
>     Update Time : Mon Oct 14 01:57:26 2013
>          Events : 155281
>    Device Role : Active device 3
>    Array State : ...A.. ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sdd1:
>           State : clean
>     Update Time : Mon Oct 14 01:52:28 2013
>          Events : 155281
>    Device Role : Active device 2
>    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sde1:
>           State : clean
>     Update Time : Mon Oct 14 01:52:28 2013
>          Events : 155281
>    Device Role : Active device 0
>    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sdf1:
>           State : clean
>     Update Time : Mon Oct 14 01:52:28 2013
>          Events : 155281
>    Device Role : Active device 1
>    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sdg1:
>           State : clean
>     Update Time : Mon Oct 14 01:52:28 2013
>          Events : 155279
>    Device Role : Active device 5
>    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
>
>
>
> Not sure if this is significant, but at boot time they are all shown
> as spares, though the indexing seems odd in that index 2 is skipped:
>
> # cat /proc/mdstat
> Personalities :
> md127 : inactive sdf1[1](S) sde1[0](S) sdg1[6](S) sdd1[3](S)
> sdb1[5](S) sdc1[4](S)
>       11717972214 blocks super 1.2
>
> unused devices: <none>
>
>
> Then I do an `mdadm --stop /dev/md127` before trying the assemble.

OK, I got the array started and it has resumed reshaping.

Line 806 of Assemble.c:
for (i = 0; i < content->array.raid_disks && i < bestcnt; i++) {

'bestcnt' appears to be the length of the 'best' list of candidate
devices, which can hold entries that are not current array members.
The 'i < content->array.raid_disks' condition therefore caps the loop
at the number of array devices rather than at the length of the list.
In my case there are some non-member entries early in the list, so
members sitting beyond the first raid_disks entries are never
considered for updating. Perhaps the 'i < content->array.raid_disks'
condition is not needed here?
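
If that is right, the minimal change would be something along these
lines (only a sketch of the idea; I haven't checked whether anything
in the loop body relies on 'i' staying below raid_disks, so this
assumes the body already skips empty or non-member slots in 'best'):

-	for (i = 0; i < content->array.raid_disks && i < bestcnt; i++) {
+	for (i = 0; i < bestcnt; i++) {

With 'i' allowed to run over the whole list, members pushed past the
first raid_disks entries by the non-member slots would at least be
examined as candidates for a forced superblock update.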