The saga continues...

By stracing mdadm -E I determined the sds1 superblock is at
300066340864, which was the first location tried. Similarly for sdq1,
the first location tried is 300089737216.

So I read the s1 superblock:

  dd if=/dev/sds1 of=sb skip=300066340864 bs=1 count=4096

write the q1 superblock:

  dd if=sb of=/dev/sdq1 seek=300089737216 bs=1 count=4096

and now mdadm -E thinks q1 has a superblock, though some of the data
is incorrect; most importantly the superblock identifies the slot 1
device and I want it to be slot 0.

I changed the byte at offset 3981 from 1 to zero and the RaidDevice
changed from 1 to 0; ditto for byte 3969, which changed the Number
from 1 to 0. Then I changed the checksum to the expected value.
(I used vim and the xxd program to edit the binary file.)

Now mdadm -E shows:

      Number   Major   Minor   RaidDevice State
this     0      65       33        0      active sync   /dev/sds1

The device /dev/sds1 is still wrong (this is sdq1) but I thought I
would try assembling since the indices were both 0, which is what I
wanted.

root@athlon:~ # mdadm -A /dev/md1 -v /dev/sdq1 /dev/sds1 /dev/sdab1 missing /dev/sdaa3 /dev/sdo1 /dev/sdu1 missing
mdadm: looking for devices for /dev/md1
mdadm: /dev/sdq1 is identified as a member of /dev/md1, slot 0.
mdadm: /dev/sds1 is identified as a member of /dev/md1, slot 1.
mdadm: /dev/sdab1 is identified as a member of /dev/md1, slot 2.
mdadm: cannot open device missing: No such file or directory
mdadm: missing has no superblock - assembly aborted

Oops, missing is the wrong syntax. Apparently mdadm uses only the
superblock and not the command line to determine the device slot.

root@athlon:~ # mdadm -A /dev/md1 -v /dev/sdq1 /dev/sds1 /dev/sdab1 /dev/sdaa3 /dev/sdo1 /dev/sdu1
mdadm: looking for devices for /dev/md1
mdadm: /dev/sdq1 is identified as a member of /dev/md1, slot 0.
mdadm: /dev/sds1 is identified as a member of /dev/md1, slot 1.
mdadm: /dev/sdab1 is identified as a member of /dev/md1, slot 2.
mdadm: /dev/sdaa3 is identified as a member of /dev/md1, slot 4.
mdadm: /dev/sdo1 is identified as a member of /dev/md1, slot 5.
mdadm: /dev/sdu1 is identified as a member of /dev/md1, slot 6.
mdadm: added /dev/sds1 to /dev/md1 as 1
mdadm: added /dev/sdab1 to /dev/md1 as 2
mdadm: no uptodate device for slot 3 of /dev/md1
mdadm: added /dev/sdaa3 to /dev/md1 as 4
mdadm: added /dev/sdo1 to /dev/md1 as 5
mdadm: added /dev/sdu1 to /dev/md1 as 6
mdadm: no uptodate device for slot 7 of /dev/md1
mdadm: added /dev/sdq1 to /dev/md1 as 0
mdadm: /dev/md1 has been started with 6 drives (out of 8).

It worked!

The superblock for sdq1 still looks funny.

root@athlon:~ # mdadm -E /dev/sdq1
..
      Number   Major   Minor   RaidDevice State
this     0      65       33        0      active sync   /dev/sds1   <<<<<<<<<<<<<<<<<<< s.b. sdq1, minor 1

   0     0      65        1        0      active sync   /dev/sdq1
   1     1      65       33        1      active sync   /dev/sds1
...

So I changed the byte at 0xf8a from 0x21 (33 decimal) to 01 and fixed
the checksum and now it looks like:

      Number   Major   Minor   RaidDevice State
this     0      65        1        0      active sync   /dev/sdq1

   0     0      65        1        0      active sync   /dev/sdq1

OK! Now I can fsck -n and see how bad things are.

A feature request would be for a way to force mdadm to use a device
in a certain slot regardless of what the superblock says.

On 2005-12-02 14:07:04, Andrew Burgess aab@xxxxxxxxxxx said:

> I tried:
>
> root # mdadm -A /dev/md1 -v --force /dev/sdq1 /dev/sds1 /dev/sdab1 missing /dev/sdaa3 /dev/sdo1 /dev/sdu1 missing
> mdadm: looking for devices for /dev/md1
> mdadm: no recogniseable superblock
> mdadm: /dev/sdq1 has no superblock - assembly aborted
>
> and:
>
> root # mdadm -A /dev/md1 -v --update=summaries --force /dev/sdq1 /dev/sds1 /dev/sdab1 missing /dev/sdaa3 /dev/sdo1 /dev/sdu1 missing
> mdadm: looking for devices for /dev/md1
> mdadm: no recogniseable superblock
> mdadm: /dev/sdq1 has no superblock - assembly aborted
>
> My next idea is to use dd to copy the superblock from a working device to sdq1
> and edit it for the correct index.
>
> Any thoughts?
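
For anyone repeating the dd copy, here is a rough Python sketch of where
a version 0.90 superblock is expected to sit, so the offset can be
computed rather than found with strace. It assumes the usual 0.90 rule
(round the device size down to a 64 KiB boundary, then step back one
64 KiB block). The function name sb090_offset and the example size are
made up for illustration; only the two offsets come from this thread.

```python
MD_RESERVED_BYTES = 64 * 1024  # v0.90 keeps its superblock in the last aligned 64 KiB block


def sb090_offset(device_bytes: int) -> int:
    """Return the expected byte offset of a v0.90 md superblock.

    Assumption: the device size is rounded down to a 64 KiB boundary
    and the superblock occupies the 64 KiB block just before that point.
    """
    if device_bytes < MD_RESERVED_BYTES:
        raise ValueError("device too small to hold a v0.90 superblock")
    aligned_end = device_bytes & ~(MD_RESERVED_BYTES - 1)
    return aligned_end - MD_RESERVED_BYTES


if __name__ == "__main__":
    # Hypothetical partition size, chosen only so that the result lands on
    # the sds1 offset quoted above; the real size was not posted.
    example_size = 300066340864 + MD_RESERVED_BYTES + 12345
    print(sb090_offset(example_size))
```

Both offsets reported above (300066340864 and 300089737216) are exact
multiples of 65536, which is consistent with this rule. The real device
size in bytes can be read with blockdev --getsize64.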
>
>
> On 2005-12-01 0:05:54, Andrew Burgess aab@xxxxxxxxxxx said:
>
>> I have an 8 device raid6 array with 3 bad devices. Two
>> of the bad devices are recognized as spares belonging to
>> the array; the third device, the one that was most recently
>> an active sync part of the array, somehow lost its superblock.
>>
>> I'd like to try running the array with each of the bad devices
>> and see which makes an array with the least damaged filesystem.
>> One problem is how to add the device without the superblock.
>> I want to make sure it goes into position[0] in the array and
>> I'm not sure how to specify that with mdadm.
>>
>> sdq1 is the device without the superblock, sdn1 and sde1 are
>> marked as spares but they were in sync recently.
>>
>> To add sdq1 as device [0] even though it has no superblock would it be enough
>> to specify all the devices in the right order and leave the two that I'm not
>> experimenting with as missing?
>>
>> mdadm -A /dev/md1 --force /dev/sdq1 /dev/sds1 /dev/sdab1 missing /dev/sdaa3 /dev/sdo1 /dev/sdu1 missing
>>
>> And to try each spare in positions [3] and [7] a similar command, even though
>> the superblocks on the spares say [8] and [9]?
>>
>> I want to avoid md doing any resyncing or recovery until I find the best
>> 'bad' device to use.
>>
>> Thanks for any help!
>> Andrew
>>
>> PS This all happened when I upgraded the motherboard and the kernel version
>> at the same time; the resulting combination worked badly with my disk controllers,
>> causing md to think drives were bad when they really weren't. Though how the
>> superblock vanished on the one drive is a mystery...
>>
>> =======================================================
>>
>> root # cat /proc/mdstat
>> md1 : inactive sds1[1] sde1[9] sdn1[8] sdu1[6] sdo1[5] sdaa3[4] sdab1[2]
>>       2051009792 blocks
>>
>> root # mdadm -A /dev/md1
>> mdadm: /dev/md1 assembled from 5 drives and 2 spares - not enough to start the array.
>>
>> root # mdadm -A -v /dev/md1 2>&1 | grep added
>> mdadm: added /dev/sdab1 to /dev/md1 as 2
>> mdadm: added /dev/sdaa3 to /dev/md1 as 4
>> mdadm: added /dev/sdo1 to /dev/md1 as 5
>> mdadm: added /dev/sdu1 to /dev/md1 as 6
>> mdadm: added /dev/sdn1 to /dev/md1 as 8
>> mdadm: added /dev/sde1 to /dev/md1 as 9
>> mdadm: added /dev/sds1 to /dev/md1 as 1
>>
>> root # mdadm -E /dev/sde1
>> /dev/sde1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : 7fdb1d16:24896504:7df4ea3b:c7f0bf96
>>   Creation Time : Sat Nov 12 12:43:57 2005
>>      Raid Level : raid6
>>     Device Size : 292969216 (279.40 GiB 300.00 GB)
>>    Raid Devices : 8
>>   Total Devices : 8
>> Preferred Minor : 1
>>
>>     Update Time : Wed Nov 30 08:12:57 2005
>>           State : clean
>>  Active Devices : 6
>> Working Devices : 8
>>  Failed Devices : 2
>>   Spare Devices : 2
>>        Checksum : 2c0e61a6 - correct
>>          Events : 0.930007
>>
>>
>>       Number   Major   Minor   RaidDevice State
>> this     9       8       65        9      spare   /dev/sde1
>>
>>    0     0      65        1        0      active sync   /dev/sdq1
>>    1     1      65       33        1      active sync   /dev/sds1
>>    2     2      65      177        2      active sync   /dev/sdab1
>>    3     3       0        0        3      faulty removed
>>    4     4      65      163        4      active sync   /dev/sdaa3
>>    5     5       8      225        5      active sync   /dev/sdo1
>>    6     6      65       65        6      active sync   /dev/sdu1
>>    7     7       0        0        7      faulty removed
>>    8     8       8      209        8      spare   /dev/sdn1
>>    9     9       8       65        9      spare   /dev/sde1
>>
>> root # mdadm -E /dev/sdn1
>> /dev/sdn1:
>>           Magic : a92b4efc
>>         Version : 00.90.00
>>            UUID : 7fdb1d16:24896504:7df4ea3b:c7f0bf96
>>   Creation Time : Sat Nov 12 12:43:57 2005
>>      Raid Level : raid6
>>     Device Size : 292969216 (279.40 GiB 300.00 GB)
>>    Raid Devices : 8
>>   Total Devices : 8
>> Preferred Minor : 1
>>
>>     Update Time : Wed Nov 30 08:12:57 2005
>>           State : clean
>>  Active Devices : 6
>> Working Devices : 8
>>  Failed Devices : 2
>>   Spare Devices : 2
>>        Checksum : 2c0e6234 - correct
>>          Events : 0.930007
>>
>>
>>       Number   Major   Minor   RaidDevice State
>> this     8       8      209        8      spare   /dev/sdn1
>>
>>    0     0      65        1        0      active sync   /dev/sdq1
>>    1     1      65       33        1      active sync   /dev/sds1
>>    2     2      65      177        2      active sync   /dev/sdab1
>>    3     3       0        0        3      faulty removed
>>    4     4      65      163        4      active sync   /dev/sdaa3
>>    5     5       8      225        5      active sync   /dev/sdo1
>>    6     6      65       65        6      active sync   /dev/sdu1
>>    7     7       0        0        7      faulty removed
>>    8     8       8      209        8      spare   /dev/sdn1
>>    9     9       8       65        9      spare   /dev/sde1
>>
>>
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
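
The Checksum line in the mdadm -E output has to be brought back in
agreement after any hand edit of the 4096-byte superblock. A rough
Python sketch of recomputing it follows, under several assumptions that
are mine and not from the messages above: the checksum field is 32-bit
word 38 of the 0.90 layout (byte 152), words are little-endian as on
x86, and the checksum is the sum of all 1024 words with that field
zeroed and the carry folded back in. The function names are
hypothetical. If the result differs from the expected value mdadm -E
prints, trust mdadm.

```python
import struct

MD_SB_BYTES = 4096           # size of a v0.90 superblock
SB_CSUM_OFFSET = 38 * 4      # assumption: sb_csum is 32-bit word 38 (byte 152)


def sb090_checksum(sb: bytes) -> int:
    """Sum all 32-bit words with the checksum field zeroed, then fold the carry.

    Assumes little-endian words (x86 host) and the sum-and-fold scheme.
    """
    if len(sb) != MD_SB_BYTES:
        raise ValueError("expected exactly 4096 bytes")
    buf = bytearray(sb)
    buf[SB_CSUM_OFFSET:SB_CSUM_OFFSET + 4] = b"\x00\x00\x00\x00"
    total = sum(struct.unpack("<1024I", buf))
    return ((total & 0xFFFFFFFF) + (total >> 32)) & 0xFFFFFFFF


def with_fixed_checksum(sb: bytes) -> bytes:
    """Return a copy of the superblock with the checksum field rewritten."""
    buf = bytearray(sb)
    struct.pack_into("<I", buf, SB_CSUM_OFFSET, sb090_checksum(sb))
    return bytes(buf)


if __name__ == "__main__":
    # "sb" is the file written by the dd command earlier in the thread.
    with open("sb", "rb") as f:
        original = f.read(MD_SB_BYTES)
    print("%08x" % sb090_checksum(original))
```

This only replaces the manual "fix the checksum" step; the edited file
still has to be written back with dd at the right offset.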