On 20 February 2011 14:44, Claude Nobs <claudenobs@xxxxxxxxx> wrote:
> On Sun, Feb 20, 2011 at 06:25, NeilBrown <neilb@xxxxxxx> wrote:
>> On Sun, 20 Feb 2011 04:23:09 +0100 Claude Nobs <claudenobs@xxxxxxxxx> wrote:
>>
>>> Hi All,
>>>
>>> I was wondering if someone might be willing to share if this array is
>>> recoverable.
>>>
>>
>> Probably is.  But don't do anything yet - any further action taken before
>> you have read all of the following email will probably cause more harm
>> than good.
>>
>>> I had a clean, running RAID5 array using 4 block devices (two of which
>>> were 2-disk RAID0 md devices).  Last night I decided it was safe to grow
>>> the array by one disk.  But then a) a disk failed, b) a power loss
>>> occurred, c) I probably switched the wrong disk and forced assembly,
>>> resulting in an inconsistent state.  Here is a complete set of actions
>>> taken:
>>
>> Providing this level of information is excellent!
>>
>>> > bernstein@server:~$ sudo mdadm --grow --raid-devices=5 --backup-file=/raid.grow.backupfile /dev/md2
>>> > mdadm: Need to backup 768K of critical section..
>>> > mdadm: ... critical section passed.
>>> > bernstein@server:~$ cat /proc/mdstat
>>> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>>> > md1 : active raid0 sdg1[1] sdf1[0]
>>> >       976770944 blocks super 1.2 64k chunks
>>> >
>>> > md2 : active raid5 sda1[5] md0[4] md1[3] sdd1[1] sdc1[0]
>>> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
>>> >       [>....................]  reshape =  1.6% (16423164/976760640) finish=902.2min speed=17739K/sec
>>> >
>>> > md0 : active raid0 sdh1[0] sdb1[1]
>>> >       976770944 blocks super 1.2 64k chunks
>>> >
>>> > unused devices: <none>
>>
>> All looks good so far.
>>
>>> Now I thought /dev/sdg1 had failed.  Unfortunately I have no log for this
>>> one, just my memory of seeing the line above change to:
>>>
>>> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [UU_UU]
>>>
>>
>> Unfortunately it is not possible to know which drive is missing from the
>> above info.  The [numbers] in brackets don't correspond exactly to the
>> positions in the array that you might think they do.  The mdstat listing
>> above has numbers 0, 1, 3, 4, 5.
>>
>> They are the 'Number' column in the --detail output below.  This is
>> /dev/md1 - I can tell from the --examine outputs, but it is a bit
>> confusing.  Newer versions of mdadm make this a little less confusing.
>> If you look for patterns of U and u in the 'Array State' line, the U is
>> 'this device', the 'u' is some other device.
>
> Actually this is running a stock Ubuntu 10.10 server kernel.  But as it is
> from my memory it could very well have been:
>
>       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [U_UUU]
>
>>
>> So /dev/md1 had a failure, so it could well have been sdg1.
>>
>>> Some 10 minutes later a power loss occurred; thanks to a UPS the server
>>> shut down as with 'shutdown -h now'.  Now I exchanged /dev/sdg1, rebooted
>>> and, in a lapse of judgement, forced assembly:
>>
>> Perfect timing :-)
>>
>>> > bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
>>> > mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
>>> > mdadm: Failed to restore critical section for reshape, sorry.
>>
>> This isn't actually a 'forced assembly' as you seem to think.  There is no
>> '-f' or '--force'.  It didn't cause any harm.
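
[To make the U/u convention described above concrete, here is a minimal,
illustrative Python sketch.  The helper decode_array_state is made up for
this illustration and is not part of mdadm; the sample strings are the
'Array State' values quoted later in this thread.]

  # 'U' marks the device whose superblock is being read, 'u' marks other
  # devices believed to be working, '_' marks a missing/failed slot.
  def decode_array_state(state: str) -> dict:
      return {
          "this_device_slot": state.find("U"),
          "working_slots": [i for i, c in enumerate(state) if c in "Uu"],
          "missing_slots": [i for i, c in enumerate(state) if c == "_"],
      }

  # Array State of /dev/sdd1 below: all five slots thought to be working.
  print(decode_array_state("uUuuu"))
  # Array State of /dev/sda1 below: slots 1 and 2 missing.
  print(decode_array_state("u__uU"))
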
> Phew... at last some luck!  That "Failed to restore critical section for
> reshape, sorry" really scared the hell out of me.  But then again it got me
> paying attention and stopped me from making things worse... :-)
>
>>
>>> > bernstein@server:~$ sudo mdadm --detail /dev/md2
>>> > /dev/md2:
>>> >         Version : 01.02
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >   Used Dev Size : 976760640 (931.51 GiB 1000.20 GB)
>>> >    Raid Devices : 5
>>> >   Total Devices : 3
>>> > Preferred Minor : 3
>>> >     Persistence : Superblock is persistent
>>> >
>>> >     Update Time : Sat Feb 19 22:32:04 2011
>>> >           State : active, degraded, Not Started
>>                                        ^^^^^^^^^^^
>>
>> mdadm has put the devices together as best it can, but has not started the
>> array because it didn't have enough devices.  This is good.
>>
>>> >  Active Devices : 3
>>> > Working Devices : 3
>>> >  Failed Devices : 0
>>> >   Spare Devices : 0
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >   Delta Devices : 1, (4->5)
>>> >
>>> >            Name : master:public
>>> >            UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >          Events : 133609
>>> >
>>> >     Number   Major   Minor   RaidDevice State
>>> >        0       8       33        0      active sync   /dev/sdc1
>>> >        1       0        0        1      removed
>>> >        2       0        0        2      removed
>>> >        4       9        0        3      active sync   /dev/block/9:0
>>> >        5       8        1        4      active sync   /dev/sda1
>>
>> So you now have 2 devices missing.  As long as we can find the devices,
>>   mdadm --assemble --force
>> should be able to put them together for you.  But let's see what we have...
>>
>>> So I reattached the old disk, got /dev/md1 back, and did the investigation
>>> I should have done before:
>>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/sdd1
>>> > /dev/sdd1:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : 5e37fc7c:50ff3b50:de3755a1:6bdbebc6
>>> >
>>> >   Reshape pos'n : 489510400 (466.83 GiB 501.26 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:23:09 2011
>>> >        Checksum : fd0c1794 - correct
>>> >          Events : 133567
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >      Array Slot : 1 (0, 1, failed, 2, 3, 4)
>>> >     Array State : uUuuu 1 failed
>>
>> This device thinks all is well.  The "1 failed" is misleading.  The
>>   uUuuu
>> pattern says that all the devices are thought to be working.
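
[A quick check of the units in that --examine output.  NeilBrown's figures
later in the thread ("reshape has reached 502815488K") indicate the
'Reshape pos'n' is reported in 1 KiB blocks; this small Python sketch just
verifies that the GiB/GB values mdadm prints alongside it agree.]

  pos_kib = 489510400                 # /dev/sdd1's reshape position
  print(pos_kib / 1024**2)            # ~466.83, matches "466.83 GiB"
  print(pos_kib * 1024 / 1000**3)     # ~501.26, matches "501.26 GB"
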
>> Note for later reference:
>>          Events : 133567
>>   Reshape pos'n : 489510400
>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/sda1
>>> > /dev/sda1:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : baebd175:e4128e4c:f768b60f:4df18f77
>>> >
>>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:32:04 2011
>>> >        Checksum : 12c832c6 - correct
>>> >          Events : 133609
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >      Array Slot : 5 (0, failed, failed, failed, 3, 4)
>>> >     Array State : u__uU 3 failed
>>
>> This device thinks devices 1 and 2 have failed (the '_'s) - that is, sdd1
>> above, and md1.
>>          Events : 133609 - this has advanced a bit from sdd1
>>   Reshape pos'n : 502815488 - this has advanced quite a lot.
>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/sdc1
>>> > /dev/sdc1:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : 82f5284a:2bffb837:19d366ab:ef2e3d94
>>> >
>>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:32:04 2011
>>> >        Checksum : 8aa7d094 - correct
>>> >          Events : 133609
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >      Array Slot : 0 (0, failed, failed, failed, 3, 4)
>>> >     Array State : U__uu 3 failed
>>
>> Reshape pos'n, Events, and Array State are identical to sda1.
>> So these two are in agreement.
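
[To quantify "advanced a bit" and "advanced quite a lot" above - a small
Python sketch using only the Events and Reshape pos'n values quoted in the
two --examine outputs; the GiB figure assumes positions are in 1 KiB blocks.]

  events_sdd1, events_sda1 = 133567, 133609
  pos_sdd1_kib, pos_sda1_kib = 489510400, 502815488

  print(events_sda1 - events_sdd1)                 # 42 events further on
  print(pos_sda1_kib - pos_sdd1_kib)               # 13305088 KiB
  print((pos_sda1_kib - pos_sdd1_kib) / 1024**2)   # ~12.7 GiB of extra reshape
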
>>> > bernstein@server:~$ sudo mdadm --examine /dev/md0
>>> > /dev/md0:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : 83ecd60d:f3947a5e:a69c4353:3c4a0893
>>> >
>>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:32:04 2011
>>> >        Checksum : 1bbf913b - correct
>>> >          Events : 133609
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >      Array Slot : 4 (0, failed, failed, failed, 3, 4)
>>> >     Array State : u__Uu 3 failed
>>
>> Again, exactly the same as sda1 and sdc1.
>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/md1
>>> > /dev/md1:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : 3c7e2c3f:8b6c7c43:a0ce7e33:ad680bed
>>> >
>>> >   Reshape pos'n : 502809856 (479.52 GiB 514.88 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:30:29 2011
>>> >        Checksum : 6c591e90 - correct
>>> >          Events : 133603
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >      Array Slot : 3 (0, failed, failed, 2, 3, 4)
>>> >     Array State : u_Uuu 2 failed
>>
>> And here is md1.  It thinks device 2 - sdd1 - has failed.
>>          Events : 133603 - slightly behind the 3 good devices, but well
>>                           after sdd1.
>>   Reshape pos'n : 502809856 - just a little before the 3 good devices too.
>>
>>> So obviously it was not /dev/sdd1 that failed.  However (due to that silly
>>> forced assembly?!) the Reshape pos'n field of md0 and sd[ac]1 differs from
>>> md1 by a few bytes, resulting in an inconsistent state...
>>
>> The way I read it is:
>>
>> sdd1 failed first - shortly after Sat Feb 19 22:23:09 2011, the update time
>> on sdd1.  The reshape continued until some time between Sat Feb 19 22:30:29
>> 2011 and Sat Feb 19 22:32:04 2011, when md1 had a failure.  The reshape
>> couldn't continue now, so it stopped.
>>
>> So the data on sdd1 is old (there has been about 8 minutes of reshape since
>> then) and cannot be used.
>> The data on md1 is very close to the rest.  The data that was in the
>> process of being relocated lives in two locations on the 'good' drives,
>> both the new and the old.  It only lives in the 'old' location on md1.
>>
>> So what we need to do is re-assemble the array, but telling it that the
>> reshape has only gone as far as md1 thinks it has.  This will make sure it
>> repeats that last part of the reshape.
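
[The reasoning above can be summarised mechanically.  This is only an
illustrative Python sketch of the selection NeilBrown describes - mdadm's
real --force logic lives in Assemble.c and differs in detail; the device
figures are the ones quoted in the --examine outputs in this thread, and the
closeness threshold is an arbitrary value chosen purely for the illustration.]

  # (events, reshape position in KiB) for each member superblock.
  superblocks = {
      "sdd1": (133567, 489510400),   # failed first - stale, cannot be used
      "md1":  (133603, 502809856),   # failed second - only slightly behind
      "sdc1": (133609, 502815488),
      "md0":  (133609, 502815488),
      "sda1": (133609, 502815488),
  }

  # Drop the clearly stale member, keep everything close to the newest event
  # count, and resume the reshape from the *oldest* position among the kept
  # members so the last few stripes are redone rather than skipped.
  newest = max(ev for ev, _ in superblocks.values())
  usable = {d: v for d, v in superblocks.items() if newest - v[0] <= 10}
  resume_pos = min(pos for _, pos in usable.values())
  print(sorted(usable), resume_pos)   # ['md0', 'md1', 'sda1', 'sdc1'] 502809856
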
>>
>> mdadm -Af should do that BUT IT DOESN'T.  Assuming I have thought this
>> through properly (and I should go through it again with more care), mdadm
>> won't do the right thing for you.  I need to get it to handle 'reshape'
>> specially when doing a --force assemble.
>
> Exactly what I was thinking of doing; glad I waited and asked.
>
>>
>>> > bernstein@server:~$ sudo mdadm --assemble /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdd1 /dev/sdc1
>>> >
>>> > mdadm: /dev/md2 assembled from 3 drives - not enough to start the array.
>>> > bernstein@server:~$ cat /proc/mdstat
>>> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>>> > md2 : inactive sdc1[0](S) sda1[5](S) md0[4](S) md1[3](S) sdd1[1](S)
>>> >       4883823704 blocks super 1.2
>>> >
>>> > md1 : active raid0 sdf1[0] sdg1[1]
>>> >       976770944 blocks super 1.2 64k chunks
>>> >
>>> > md0 : active raid0 sdb1[1] sdh1[0]
>>> >       976770944 blocks super 1.2 64k chunks
>>> >
>>> > unused devices: <none>
>>>
>>> I do have a backup, but since recovery from it takes a few days I'd like
>>> to know if there is a way to recover the array or if it's completely lost.
>>>
>>> Any suggestions gratefully received,
>>
>> The fact that you have a backup is excellent.  You might need it, but I
>> hope not.
>>
>> I would like to provide you with a modified version of mdadm which you can
>> then use to --force assemble the array.  It should be able to get you
>> access to all your data.
>> The array will be degraded and will finish the reshape in that state.
>> Then you will need to add sdd1 back in (assuming you are confident that it
>> works) and it will be rebuilt.
>>
>> Just to go through some of the numbers...
>>
>> Chunk size is 64K.  Reshape was 4->5, so 3 -> 4 data disks.
>> So old stripes have 192K, new stripes have 256K.
>>
>> The 'good' disks think the reshape has reached 502815488K, which is
>> 1964123 new stripes (2618830.66 old stripes).
>> md1 thinks the reshape has only reached 489510400K, which is 1912150
>> new stripes (2549533.33 old stripes).
>
> I think you mixed up sdd1 with md1 here?  (The numbers above for md1 are for
> sdd1.  md1 would be: reshape has reached 502809856K, which would be 1964101
> new stripes, so the difference between the good disks and md1 would be 22
> stripes.)
>
>>
>> So of the 51973 stripes that have been reshaped since the last metadata
>> update on sdd1, some will have been done on sdd1, but some not, and we
>> don't really know how many.  But it is perfectly safe to repeat those
>> stripes, as all writes to that region will have been suspended (and you
>> probably weren't writing anyway).
>
> Yep, there was nothing writing to the array.  So now I am a little confused:
> if you meant sdd1 (which failed first and is 51973 stripes behind), this
> would imply that at least that many stripes of data are kept in the old
> (3-data-disk) configuration as well as the new one?  If continuing from
> there were possible, then the array would no longer be degraded, right?
> So I think you meant md1 (22 stripes behind), as keeping 5.5M of data from
> the old and new config seems more reasonable.  However this is just a
> guess :-)
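
[The stripe arithmetic in this exchange checks out; a small Python sketch
using only the numbers quoted above, with reshape positions in 1 KiB blocks.]

  chunk_kib = 64
  old_stripe_kib = 3 * chunk_kib      # 192K of data per stripe before the grow
  new_stripe_kib = 4 * chunk_kib      # 256K of data per stripe after the grow

  good_pos, md1_pos, sdd1_pos = 502815488, 502809856, 489510400   # KiB

  print(good_pos // new_stripe_kib)   # 1964123 new stripes
  print(md1_pos // new_stripe_kib)    # 1964101 -> 22 stripes behind the good disks
  print(sdd1_pos // new_stripe_kib)   # 1912150 -> 51973 stripes behind the good disks
  print((good_pos - md1_pos) / 1024)  # 5.5 MiB of reshape to repeat for md1
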
>>
>> So I need to change the loop in Assemble.c which calls ->update_super with
>> "force-one" to also make sure the reshape_position in the 'chosen'
>> superblock matches the oldest 'forced' superblock.
>
> Uh... ah... probably; I have zero knowledge of kernel code :-)
> I guess it should take into account that the oldest superblock (sdd1 in
> this case) may already be out of the section where the data (in the old
> config) still exists?  But I guess you already thought of that...
>
>>
>> So if you are able to wait a day, I'll try to write a patch first thing
>> tomorrow and send it to you.
>
> Sure, that would be awesome!  That boils down to compiling the patched
> kernel, doesn't it?  This will probably take a few days, as the system is
> quite slow and I'd have to get up to speed with kernel compiling, but it
> shouldn't be a problem.  Would I have to patch the Ubuntu kernel (based on
> 2.6.35.4) or the latest 2.6.38-rc from kernel.org?
>
>>
>> Thanks for the excellent problem report.
>>
>> NeilBrown
>
> Well, I thank you for providing such an elaborate and friendly answer!
> This is actually my first mailing list post, and considering how many
> questions get ignored (I don't know about this list, though) I just hoped
> someone would at least answer with a one-liner... I never expected this.
> So thanks again.
>
> Claude

Just a quick FYI, you can find (new, and unreleased) Ubuntu kernels here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/

// Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html