On Mon, 22 Sep 2014 09:20:35 +0200 Patrik Horník <patrik@xxxxxx> wrote:

> Well, I browsed through the sources of the latest mdadm version at night
> instead of sleeping :) and was searching for how it got the clean flag set
> to 0. I was not sure exactly about that line there and from which device
> it gets the clean state. So the bug was that it can get it from a
> non-current device? It makes sense, because the first device identified by
> mdadm is the old md101 device.

Correct.

>
> So will it work then if I use 3.3 and somehow don't give it the md101
> device? By stopping it before the -A call, or by manually specifying the
> other drives? Or do you really recommend building the latest version of
> mdadm?

  mdadm -S /dev/mdXX
  mdadm -A /dev/mdXX list-of-devices-that-are-working

should start the array for you using 3.3.  Then

  mdadm /dev/mdXX --re-add /dev/md101

will re-add the device and hopefully do a quick bitmap-based rebuild.

>
> What is the expected behaviour with 3.3.1+? Can it be started with all
> devices, and will it automatically start to recover md101? If so, what is
> the best way: to start it without md101 and then use --re-add to add it,
> to start it without md101 and use --add, or to start it with md101 as well?

--add should have the same effect as --re-add.
I think mdadm 3.3.1 will just assemble the array without md101 and you then
have to --re-add that yourself.
To get it re-added automatically you would need to have "policy action=re-add"
or similar in mdadm.conf, and then use

  mdadm -I /dev/devicename

on each device. That should put the whole array together and re-add anything
that needs it.  I think.

NeilBrown

>
> Thank you very much.
>
> 2014-09-22 8:56 GMT+02:00 NeilBrown <neilb@xxxxxxx>:
> > On Mon, 22 Sep 2014 08:34:21 +0200 Patrik Horník <patrik@xxxxxx> wrote:
> >
> >> - Well, what is the exact meaning of --no-degraded then? Because I am
> >> using it also on RAID6 arrays that are missing one drive, and mdadm
> >> starts them. I thought until today that it protects against assembling,
> >> for example, a RAID6 array missing more than two drives, or to be more
> >> precise, an array with fewer than the number of drives it used last
> >> time. (I did not look at the code to see exactly what it does. It is
> >> mdadm 3.3 on Debian.)
> >
> > Sorry, I confused myself.
> > "--no-degraded" means "Only start the array if all expected devices are
> > present".
> > So if the array "knows" that one device is missing, it will start if all
> > other devices are present. But if it "thinks" that all devices are
> > working, then it will only start if all the devices are there.
> >
> >>
> >> - Well, the array was shut down cleanly, manually, by mdadm -S. Can't
> >> the not-clean classification be a result of the md101 device being among
> >> the found devices, or a result of the first two assemble tries?
> >
> > If the state still says "Clean" (which it does, thanks), then mdadm
> > should treat it as 'clean'.
> >
> > I think you are probably hitting the bug fixed by
> >
> > http://git.neil.brown.name/?p=mdadm.git;a=commitdiff;h=56bbc588f7f0f3bdd3ec23f02109b427c1d3b8f1
> >
> > which is in 3.3.1.
> >
> > So a new version of mdadm should fix it.
> >
> > NeilBrown
> >
> >
> >
> >>
> >> - Anyway, as I mentioned, the superblock on all five devices has the
> >> clean state.
> >> Example:
> >> /dev/sdk1:
> >> Magic : XXXXXXX
> >> Version : 1.2
> >> Feature Map : 0x1
> >> Array UUID : XXXXXXXXXXXXXXXXXXXXX
> >> Name :
> >> Creation Time : Thu Aug XXXXXXXX
> >> Raid Level : raid6
> >> Raid Devices : 6
> >>
> >> Avail Dev Size : 5860268943 (2794.39 GiB 3000.46 GB)
> >> Array Size : 11720536064 (11177.57 GiB 12001.83 GB)
> >> Used Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
> >> Data Offset : 262144 sectors
> >> Super Offset : 8 sectors
> >> Unused Space : before=262056 sectors, after=911 sectors
> >> State : clean
> >> Device UUID : YYYYYYYYYYYYYYYYYYYY
> >>
> >> Internal Bitmap : 8 sectors from superblock
> >> Update Time : Mon Sep 22 02:23:45 2014
> >> Bad Block Log : 512 entries available at offset 72 sectors
> >> Checksum : ZZZZZZZZ - correct
> >> Events : EEEEEE
> >>
> >> Layout : left-symmetric
> >> Chunk Size : 512K
> >>
> >> Device Role : Active device 4
> >> Array State : AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>
> >> - md101 has an Events count lower by 16 than the other devices.
> >>
> >> - Please, I need a little more assurance about the exact state of the
> >> array, and an explanation of why it is behaving the way it is, so that I
> >> can be sure what steps are needed and what will happen. The data on the
> >> array is important.
> >> Patrik Horník
> >> editor-in-chief, www.DSL.sk
> >> Tel.: +421 905 385 666
> >> Email: patrik@xxxxxx
> >>
> >>
> >> 2014-09-22 5:19 GMT+02:00 NeilBrown <neilb@xxxxxxx>:
> >> > On Mon, 22 Sep 2014 04:11:20 +0200 Patrik Horník <patrik@xxxxxx> wrote:
> >> >
> >> >> Hello Neil,
> >> >>
> >> >> I've got a situation unfamiliar to me on RAID6 array md1, which holds
> >> >> important data.
> >> >>
> >> >> - It is a RAID6 with 6 devices; 5 are partitions and 1 is another
> >> >> RAID0 array, md101, made from two smaller drives. One of the smaller
> >> >> drives froze, so md101 got kicked out of md1 and marked as faulty in
> >> >> md1. After a while I stopped md1 without removing md101 from it first.
> >> >> Then I rebooted and assembled md101.
> >> >>
> >> >> - First I tried mdadm -A --no-degraded -u UUID /dev/md1 but got
> >> >> "mdadm: /dev/md1 assembled from 5 drives (out of 6), but not started."
> >> >> so I stopped md1.
> >> >>
> >> >> - The second time I started it with -v and got:
> >> >>
> >> >> mdadm: /dev/md101 is identified as a member of /dev/md1, slot 5.
> >> >> mdadm: /dev/sdk1 is identified as a member of /dev/md1, slot 4.
> >> >> mdadm: /dev/sdi1 is identified as a member of /dev/md1, slot 1.
> >> >> mdadm: /dev/sdh1 is identified as a member of /dev/md1, slot 2.
> >> >> mdadm: /dev/sdg1 is identified as a member of /dev/md1, slot 0.
> >> >> mdadm: /dev/sde1 is identified as a member of /dev/md1, slot 3.
> >> >> mdadm: added /dev/sdi1 to /dev/md1 as 1
> >> >> mdadm: added /dev/sdh1 to /dev/md1 as 2
> >> >> mdadm: added /dev/sde1 to /dev/md1 as 3
> >> >> mdadm: added /dev/sdk1 to /dev/md1 as 4
> >> >> mdadm: added /dev/md101 to /dev/md1 as 5 (possibly out of date)
> >> >> mdadm: added /dev/sdg1 to /dev/md1 as 0
> >> >> mdadm: /dev/md1 assembled from 5 drives (out of 6), but not started.
> >> >>
> >> >> - The third time I tried without --no-degraded, with
> >> >> mdadm -A -v -u UUID /dev/md1. This is what I got:
> >> >>
> >> >> mdadm: /dev/md101 is identified as a member of /dev/md1, slot 5.
> >> >> mdadm: /dev/sdk1 is identified as a member of /dev/md1, slot 4.
> >> >> mdadm: /dev/sdi1 is identified as a member of /dev/md1, slot 1.
> >> >> mdadm: /dev/sdh1 is identified as a member of /dev/md1, slot 2.
> >> >> mdadm: /dev/sdg1 is identified as a member of /dev/md1, slot 0.
> >> >> mdadm: /dev/sde1 is identified as a member of /dev/md1, slot 3.
> >> >> mdadm: added /dev/sdi1 to /dev/md1 as 1
> >> >> mdadm: added /dev/sdh1 to /dev/md1 as 2
> >> >> mdadm: added /dev/sde1 to /dev/md1 as 3
> >> >> mdadm: added /dev/sdk1 to /dev/md1 as 4
> >> >> mdadm: added /dev/md101 to /dev/md1 as 5 (possibly out of date)
> >> >> mdadm: added /dev/sdg1 to /dev/md1 as 0
> >> >> mdadm: /dev/md1 assembled from 5 drives - not enough to start the
> >> >> array while not clean - consider --force.
> >> >>
> >> >> Array md1 has a bitmap. All the drive devices have the same Events
> >> >> count, their state is clean, and their Device Role is Active device.
> >> >> md101 has the active state and a lower Events count.
> >> >>
> >> >> Is this expected behavior? My theory is that it is caused by md101
> >> >> and that I should start array md1 without it (for example by stopping
> >> >> md101) and then re-add it. Is that the case, or is it something else?
> >> >>
> >> >> Thanks.
> >> >>
> >> >> Best regards,
> >> >>
> >> >> Patrik
> >> >
> >> >
> >> > The array is clearly degraded, as one of the devices failed and hasn't
> >> > been recovered yet, so using --no-degraded is counterproductive, as
> >> > you discovered.
> >> >
> >> > It appears that the array is also marked as 'dirty'. That suggests
> >> > that it wasn't shut down cleanly.
> >> > What does "mdadm --examine" of some device show?
> >> >
> >> > You probably need to re-assemble the array with --force as it
> >> > suggests, then add the failed device and let it recover.
> >> >
> >> > NeilBrown
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
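
Pulling the advice in this thread together, here is a minimal sketch of the
manual recovery under mdadm 3.3, assuming the array is /dev/md1 and the five
healthy members are the devices named in the -v output above (sdg1, sdi1,
sdh1, sde1, sdk1 -- adjust the names to whatever the system actually shows):

  # Optional sanity check: compare state and event counts across members.
  mdadm --examine /dev/sd[eghik]1 /dev/md101 | grep -E 'Events|State'

  # Stop any partially assembled instance of md1 left over from earlier tries.
  mdadm -S /dev/md1

  # Assemble from the five good members only, leaving the stale /dev/md101 out.
  mdadm -A /dev/md1 /dev/sdg1 /dev/sdi1 /dev/sdh1 /dev/sde1 /dev/sdk1

  # Re-add the stale member; with the internal write-intent bitmap this
  # should trigger a quick bitmap-based resync rather than a full rebuild.
  mdadm /dev/md1 --re-add /dev/md101

  # Watch the recovery/resync progress.
  cat /proc/mdstat

If 3.3 still refuses to start the degraded array because of the "clean"
detection bug discussed above, the fallback is Neil's earlier suggestion of
adding --force to the assemble step, or simply upgrading to 3.3.1+.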
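
For the automatic variant Neil describes for 3.3.1+, a rough sketch using
incremental assembly. The POLICY line below just echoes his "policy
action=re-add or similar" wording; in practice it may need to be scoped more
tightly (e.g. with path=..., see man mdadm.conf) rather than left as a global
default.

In /etc/mdadm/mdadm.conf (the Debian location):

  POLICY action=re-add

Then feed each member to incremental assembly; mdadm should collect them,
start the array once all expected members have been offered, and re-add the
stale md101 under the policy above:

  mdadm -I /dev/sdg1
  mdadm -I /dev/sdi1
  mdadm -I /dev/sdh1
  mdadm -I /dev/sde1
  mdadm -I /dev/sdk1
  mdadm -I /dev/md101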