Re: Please advise, strange "not enough to start the array while not clean"

Hello,

So it worked as we expected, and it was most probably caused by that bug.

Thanks for the assistance.

Patrik

2014-09-22 12:17 GMT+02:00 NeilBrown <neilb@xxxxxxx>:
> On Mon, 22 Sep 2014 09:20:35 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>
>> Well, I browsed through the sources of the latest mdadm version at
>> night instead of sleeping :) and was trying to find how the clean flag
>> got set to 0. I was not sure exactly about that line and from which
>> device it takes the clean state. So the bug was that it can take it
>> from a non-current device? That makes sense, because the first device
>> identified by mdadm is the old md101 device.
>
> Correct.
>
>>
>> So will it work if I use 3.3 and somehow don't give it the md101
>> device, either by stopping it before the -A call or by manually
>> specifying the other drives? Or do you really recommend building the
>> latest version of mdadm?
>
>   mdadm -A /dev/mdXX list-of-devices-that-are-working
>
> should start the array for you using 3.3.
> Then
>    mdadm /dev/mdXX --re-add /dev/md101
>
> will re-add the device and hopefully do a quick bitmap-based rebuild.
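[For reference, a minimal sketch of that sequence, using the concrete
device names from the "mdadm -A -v" output quoted further down in this
thread (slots 0-4 are sdg1, sdi1, sdh1, sde1 and sdk1); double-check the
names against your own system before running anything:

    # assemble from the five up-to-date members only (mdadm 3.3)
    mdadm -A /dev/md1 /dev/sdg1 /dev/sdi1 /dev/sdh1 /dev/sde1 /dev/sdk1

    # then put the stale RAID0 member back; with the internal bitmap
    # this should only need a quick bitmap-based resync
    mdadm /dev/md1 --re-add /dev/md101
]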
>
>>
>> What is the expected behaviour with 3.3.1+? Can it be started with all
>> devices, and will it automatically start recovering md101? If so, what
>> is the best way: start it without md101 and then use --re-add to add
>> it, start it without md101 and use --add, or start it with md101 as
>> well?
>
> --add should have the same effect as --re-add.
>
> I think mdadm 3.3.1 will just assemble the array without md101 and you then
> have to --re-add that yourself.  To get it re-added automatically you needed
> to have "policy action=re-add" or similar in mdadm.conf, and then use
>   mdadm -I /dev/devicename
> on each device.  That should put the whole array together and re-add anything
> that needs it.
>
> I think.
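[A sketch of that incremental-assembly setup, assuming the POLICY syntax
below matches what mdadm.conf(5) accepts on your version; verify it
there before relying on it:

    # /etc/mdadm/mdadm.conf
    # let devices that dropped out be re-added automatically during
    # incremental assembly
    POLICY action=re-add

    # then feed each member device to incremental assembly, e.g.
    mdadm -I /dev/sdg1
    ...
    mdadm -I /dev/md101
]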
>
> NeilBrown
>
>
>>
>> Thank you very much.
>>
>> 2014-09-22 8:56 GMT+02:00 NeilBrown <neilb@xxxxxxx>:
>> > On Mon, 22 Sep 2014 08:34:21 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>> >
>> >> - Well, what is the exact meaning of --no-degraded then? I am also
>> >> using it on RAID6 arrays that are missing one drive, and mdadm
>> >> starts them. Until today I thought it was to prevent assembling, for
>> >> example, a RAID6 array missing more than two drives, or more
>> >> precisely an array with fewer drives than it had the last time it
>> >> was assembled. (I did not look at what the code does exactly. It is
>> >> mdadm 3.3 on Debian.)
>> >
>> > Sorry, I confused myself.
>> > "--no-degraded" means "Only start the array if all expected devices are
>> > present".
>> > So if the array "knows" that one device is missing, it will start if all
>> > other devices are present.  But if it "thinks" that all devices are working,
>> > then it will only start if all the devices are there.
>> >
>> >>
>> >> - Well, the array was shut down cleanly, manually, with mdadm -S.
>> >> Can't the "not clean" classification be a result of the md101 device
>> >> being among the found devices, or of the first two assemble attempts?
>> >
>> > If the state still says "Clean" (which it does, thanks), then mdadm should
>> > treat it as 'clean'.
>> >
>> > I think you are probably hitting the bug fixed by
>> >
>> >  http://git.neil.brown.name/?p=mdadm.git;a=commitdiff;h=56bbc588f7f0f3bdd3ec23f02109b427c1d3b8f1
>> >
>> > which is in 3.3.1.
>> >
>> > So a new version of mdadm should fix it.
>> >
>> > NeilBrown
>> >
>> >
>> >
>> >>
>> >> - Anyway, as I mentioned, the superblock on all five devices has a
>> >> clean state. Example:
>> >> /dev/sdk1:
>> >>           Magic : XXXXXXX
>> >>         Version : 1.2
>> >>     Feature Map : 0x1
>> >>      Array UUID : XXXXXXXXXXXXXXXXXXXXX
>> >>            Name :
>> >>   Creation Time : Thu Aug XXXXXXXX
>> >>      Raid Level : raid6
>> >>    Raid Devices : 6
>> >>
>> >>  Avail Dev Size : 5860268943 (2794.39 GiB 3000.46 GB)
>> >>      Array Size : 11720536064 (11177.57 GiB 12001.83 GB)
>> >>   Used Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
>> >>     Data Offset : 262144 sectors
>> >>    Super Offset : 8 sectors
>> >>    Unused Space : before=262056 sectors, after=911 sectors
>> >>           State : clean
>> >>     Device UUID : YYYYYYYYYYYYYYYYYYYY
>> >>
>> >> Internal Bitmap : 8 sectors from superblock
>> >>     Update Time : Mon Sep 22 02:23:45 2014
>> >>   Bad Block Log : 512 entries available at offset 72 sectors
>> >>        Checksum : ZZZZZZZZ - correct
>> >>          Events : EEEEEE
>> >>
>> >>          Layout : left-symmetric
>> >>      Chunk Size : 512K
>> >>
>> >>    Device Role : Active device 4
>> >>    Array State : AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>> >>
>> >> - md101 has an Events count lower by 16 than the other devices.
>> >>
>> >> - Please, I need a little more assurance about the exact state of the
>> >> array and an explanation of why it is behaving the way it is, so I can
>> >> be sure what steps are needed and what will happen. The data on the
>> >> array is important.
>> >> Patrik Horník
>> >> editor-in-chief, www.DSL.sk
>> >> Tel.: +421 905 385 666
>> >> Email: patrik@xxxxxx
>> >>
>> >>
>> >> 2014-09-22 5:19 GMT+02:00 NeilBrown <neilb@xxxxxxx>:
>> >> > On Mon, 22 Sep 2014 04:11:20 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>> >> >
>> >> >> Hello Neil,
>> >> >>
>> >> >> I've got this situation unfamiliar to me on RAID6 array md1 with important data.
>> >> >>
>> >> >> - It is a RAID6 with 6 devices: 5 are partitions and 1 is another
>> >> >> RAID0 array, md101, made from two smaller drives. One of the smaller
>> >> >> drives froze, so md101 got kicked out of md1 and marked as faulty in
>> >> >> md1. After a while I stopped md1 without removing md101 from it
>> >> >> first. Then I rebooted and assembled md101.
>> >> >>
>> >> >> - First I tried mdadm -A --no-degraded -u UUID /dev/md1 but got
>> >> >> "mdadm: /dev/md1 assembled from 5 drives (out of 6), but not started.",
>> >> >> so I stopped md1.
>> >> >>
>> >> >> - The second time I started it with -v and got:
>> >> >>
>> >> >> mdadm: /dev/md101 is identified as a member of /dev/md1, slot 5.
>> >> >> mdadm: /dev/sdk1 is identified as a member of /dev/md1, slot 4.
>> >> >> mdadm: /dev/sdi1 is identified as a member of /dev/md1, slot 1.
>> >> >> mdadm: /dev/sdh1 is identified as a member of /dev/md1, slot 2.
>> >> >> mdadm: /dev/sdg1 is identified as a member of /dev/md1, slot 0.
>> >> >> mdadm: /dev/sde1 is identified as a member of /dev/md1, slot 3.
>> >> >> mdadm: added /dev/sdi1 to /dev/md1 as 1
>> >> >> mdadm: added /dev/sdh1 to /dev/md1 as 2
>> >> >> mdadm: added /dev/sde1 to /dev/md1 as 3
>> >> >> mdadm: added /dev/sdk1 to /dev/md1 as 4
>> >> >> mdadm: added /dev/md101 to /dev/md1 as 5 (possibly out of date)
>> >> >> mdadm: added /dev/sdg1 to /dev/md1 as 0
>> >> >> mdadm: /dev/md1 assembled from 5 drives (out of 6), but not started.
>> >> >>
>> >> >> - The third time I tried without --no-degraded, with mdadm -A -v -u
>> >> >> UUID /dev/md1. This is what I got:
>> >> >>
>> >> >> mdadm: /dev/md101 is identified as a member of /dev/md1, slot 5.
>> >> >> mdadm: /dev/sdk1 is identified as a member of /dev/md1, slot 4.
>> >> >> mdadm: /dev/sdi1 is identified as a member of /dev/md1, slot 1.
>> >> >> mdadm: /dev/sdh1 is identified as a member of /dev/md1, slot 2.
>> >> >> mdadm: /dev/sdg1 is identified as a member of /dev/md1, slot 0.
>> >> >> mdadm: /dev/sde1 is identified as a member of /dev/md1, slot 3.
>> >> >> mdadm: added /dev/sdi1 to /dev/md1 as 1
>> >> >> mdadm: added /dev/sdh1 to /dev/md1 as 2
>> >> >> mdadm: added /dev/sde1 to /dev/md1 as 3
>> >> >> mdadm: added /dev/sdk1 to /dev/md1 as 4
>> >> >> mdadm: added /dev/md101 to /dev/md1 as 5 (possibly out of date)
>> >> >> mdadm: added /dev/sdg1 to /dev/md1 as 0
>> >> >> mdadm: /dev/md1 assembled from 5 drives - not enough to start the
>> >> >> array while not clean - consider --force.
>> >> >>
>> >> >> Array md1 has a bitmap. All the drive devices have the same Events
>> >> >> count, their state is clean and their Device Role is Active device.
>> >> >> md101 has an active state and a lower Events count.
>> >> >>
>> >> >> Is this expected behavior? My theory is that it is caused by md101 and
>> >> >> that I should start array md1 without it (for example by stopping
>> >> >> md101) and then re-add it. Is that the case, or is it something else?
>> >> >>
>> >> >> Thanks.
>> >> >>
>> >> >> Best regards,
>> >> >>
>> >> >> Patrik
>> >> >
>> >> >
>> >> > The array is clearly degraded as one of the devices failed and hasn't been
>> >> > recovered yet, so using --no-degraded is counterproductive, as you
>> >> > discovered.
>> >> >
>> >> > It appears that the array is also marked as 'dirty'.  That suggests that it
>> >> > wasn't shut down cleanly.
>> >> > What does "mdadm --examine" of some device show?
>> >> >
>> >> > You probably need to re-assemble the array with --force like it suggests,
>> >> > then add the failed device and let it recover.
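[A minimal sketch of that forced re-assembly, again with the device
names taken from the -v output above; an illustration only, and worth
checking "mdadm --examine" on every member first:

    # force-assemble from the five members that agree, despite the
    # apparent 'dirty' state
    mdadm -A --force /dev/md1 /dev/sdg1 /dev/sdi1 /dev/sdh1 /dev/sde1 /dev/sdk1

    # then add the failed member back and let it recover
    mdadm /dev/md1 --re-add /dev/md101
]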
>> >> >
>> >> > NeilBrown
>> >> >
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>



