On 10/1/2012 10:15 PM, NeilBrown wrote:
On Sun, 30 Sep 2012 19:23:16 -0400 EJ Vincent <ej@xxxxxxxxx> wrote:
On 9/30/2012 4:28 PM, Phil Turmel wrote:
On 09/30/2012 03:25 PM, EJ Vincent wrote:
On 9/30/2012 3:22 PM, Mathias Burén wrote:
Can't you just boot off an older Ubuntu USB, install mdadm and scan /
assemble, see the device order?
Hi Mathias,
I'm under the impression that damage to the metadata has already been
done by 12.04, making a recovery from an older version of Ubuntu
(10.04) impossible. Is this line of thinking flawed?
Your impression is correct. Permanent damage to the metadata was done.
You *must* re-create your array.
However, you *cannot* use your new version of mdadm, as it will get the
data offset wrong. Your first report showed a data offset of 272.
Newer versions of mdadm default to 2048. You *must* perform all of your
"mdadm --create --assume-clean" permutations with 10.04.
Do you have *any* dmesg output from the old system? Or dmesg from the
very first boot under 12.04? That might have enough information to
shorten your search.
In the future, you should record your setup by saving the output of
"mdadm -D" on each array, "mdadm -E" on each member device, and the
output of "ls -l /dev/disk/by-id/"
Or try my documentation script "lsdrv". [1]
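For anyone wanting to automate the record suggested above, here is a minimal sketch. It only wraps the three commands already named ("mdadm -D", "mdadm -E" and the by-id listing); the function name save_md_layout, the output file names, and the MDADM override variable are illustrative assumptions, not part of mdadm or lsdrv.

```shell
#!/bin/sh
# Sketch only: save the "mdadm -D", "mdadm -E" and by-id listings to files.
# save_md_layout, the file names and $MDADM are hypothetical names.
MDADM=${MDADM:-mdadm}

# usage: save_md_layout OUTDIR ARRAY MEMBER...
save_md_layout() {
    outdir=$1
    array=$2
    shift 2
    mkdir -p "$outdir" || return 1

    # Array-level view (device order, level, chunk size).
    $MDADM -D "$array" > "$outdir/detail.txt" 2>&1 ||
        echo "warning: $MDADM -D $array failed" >&2

    # Per-member superblock, one file per device.
    for dev in "$@"; do
        $MDADM -E "$dev" > "$outdir/examine-$(basename "$dev").txt" 2>&1 ||
            echo "warning: $MDADM -E $dev failed" >&2
    done

    # Stable names, so members can be identified after sdX letters change.
    ls -l /dev/disk/by-id/ > "$outdir/by-id.txt" 2>&1 || true
}
```

Run as root, something like "save_md_layout /root/md0-layout /dev/md0 /dev/sd[a-j]1" would do it; keep the output somewhere that does not live on the array itself.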
HTH,
Phil
[1] http://github.com/pturmel/lsdrv
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Phil,
Unfortunately I don't have any dmesg log from the old system or the
first boot under 12.04.
Getting my system to boot at all under 12.04 was chaotic enough, with
the overly-aggressive /usr/share/initramfs-tools/scripts/mdadm-functions
ravaging my array and then dropping me to a busybox shell over and over
again. I didn't think to record the very first error.
Here's an observation of mine: disks /dev/sdb1, /dev/sdi1, and
/dev/sdj1 don't have the Raid level "-unknown-", nor are they
labeled as spares. They are in fact labeled clean and appear
*different* from the others.
Could these disks still contain my metadata from 10.04? I recall during
my installation of 12.04 I had anywhere from 1 to 3 disks unpowered, so
that I could drop in a SATA CD/DVDRW into the slot.
I am downloading 10.04.4 LTS and will be ready to use it soon. I fear
having to do permutations: 9! (factorial) would mean 362,880
combinations. *gasp*
You might be able to avoid the 9! combinations, which could take a while ...
4 days if you could test one per second.
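The arithmetic behind those two figures, as a quick sketch. The one-attempt-per-second rate is the assumption stated above, and the function name factorial is only illustrative.

```shell
# factorial N -> prints N! using shell integer arithmetic
factorial() {
    n=$1
    result=1
    while [ "$n" -gt 1 ]; do
        result=$((result * n))
        n=$((n - 1))
    done
    echo "$result"
}

orders=$(factorial 9)              # possible orderings of 9 devices
days=$((orders / (60 * 60 * 24)))  # whole days at one attempt per second
echo "$orders orderings, about $days days"
# prints: 362880 orderings, about 4 days
```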
Try this:
for i in /dev/sd?1; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 \
skip=4256 | od -D | head -n1; done
This reads the 'dev_number' field out of the metadata on each device.
This should not have been corrupted by the bug.
You might want some other pattern in place of "/dev/sd?1" - it needs to match
all the devices in your array.
Then on one of the devices which doesn't have corrupted metadata, run
dd 2> /dev/null if=/dev/sdXXX1 bs=2 count=$COUNT skip=2176 | od -d
where $COUNT is one more than the largest number that was reported in the
"dev_number" values reported above.
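For reference, the skip values in both commands follow from the version-1.2 superblock layout: the superblock starts 4096 bytes into the member device, 'dev_number' is a 32-bit field at byte 160 of the superblock (4096 + 160 = 4256), and the table of 16-bit 'dev_roles' entries starts at byte 256 (4096 + 256 = 4352 = 2 x 2176). Here is a sketch that wraps the same two reads. The function names are illustrative, and it assumes 1.2 metadata and a little-endian host (od decodes in host byte order, while the metadata is stored little-endian).

```shell
SB_OFFSET=4096   # v1.2 superblock: 4 KiB from the start of the member device

# read_dev_number DEV -> prints the 32-bit dev_number (superblock byte 160)
read_dev_number() {
    dd if="$1" bs=1 skip=$((SB_OFFSET + 160)) count=4 2>/dev/null |
        od -An -tu4 | xargs
}

# read_dev_roles DEV COUNT -> prints the first COUNT 16-bit dev_roles entries
# (superblock byte 256 onward) on one line
read_dev_roles() {
    dd if="$1" bs=2 skip=$(( (SB_OFFSET + 256) / 2 )) count="$2" 2>/dev/null |
        od -An -v -tu2 | xargs
}
```

Both functions only read from the device, e.g. "read_dev_number /dev/sdc1" and "read_dev_roles /dev/sdc1 12".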
Now for each device, take the dev_number that was reported, use that as an
index into the list of numbers produced by the second command, and that
number is the role of the device in the array, i.e. its position in the
list.
So after making an array of 5 'loop' devices in a non-obvious order, and
failing a device and re-adding it:
# for i in /dev/loop[01234]; do echo -n $i '' ; dd 2> /dev/null if=$i bs=1 count=4 skip=4256 | od -D | head -n1; done
/dev/loop0 0000000 3
/dev/loop1 0000000 4
/dev/loop2 0000000 1
/dev/loop3 0000000 0
/dev/loop4 0000000 5
and
# dd 2> /dev/null if=/dev/loop0 bs=2 count=6 skip=2176 | od -d
0000000 0 1 65534 3 4 2
0000014
So /dev/loop0 has dev_number '3'. Look for entry '3' in the list and get '3'
/dev/loop1 has 'dev_number' 4, so is device 4
/dev/loop4 has dev_number '5', so is device 2
etc
So we can reconstruct the order of devices:
/dev/loop3 /dev/loop2 /dev/loop4 /dev/loop0 /dev/loop1
Note the '65534' in the list means that there is no device with that
dev_number. i.e. no device is number '2', and looking at the list confirms
that.
You should be able to perform the same steps to recover the correct order to
try creating the array.
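The lookup described above can also be scripted. This is a minimal sketch, fed with the loop-device numbers from the example. The name order_devices and its DEVICE:DEV_NUMBER argument format are made up for illustration. Entries of 65534 are treated as "no device", following the note above; skipping anything larger as well is an assumption.

```shell
# order_devices "ROLE_LIST" DEVICE:DEV_NUMBER...
# Prints the devices sorted by role, i.e. in array order.
order_devices() {
    roles=$1
    shift
    for pair in "$@"; do
        dev=${pair%:*}       # text before the last colon
        num=${pair##*:}      # dev_number after the last colon
        # Walk the role list to entry number $num (counting from 0).
        i=0
        role=
        for r in $roles; do
            if [ "$i" -eq "$num" ]; then role=$r; break; fi
            i=$((i + 1))
        done
        # Skip devices with no entry, or an entry marking "no device".
        if [ -n "$role" ] && [ "$role" -lt 65534 ]; then
            echo "$role $dev"
        fi
    done | sort -n | awk '{ print $2 }' | xargs
}

order_devices "0 1 65534 3 4 2" \
    /dev/loop0:3 /dev/loop1:4 /dev/loop2:1 /dev/loop3:0 /dev/loop4:5
# prints: /dev/loop3 /dev/loop2 /dev/loop4 /dev/loop0 /dev/loop1
```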
NeilBrown
Hi Neil,
Thank you so much for taking the time to help me through this.
Here's what I've come up with, per your instructions:
/dev/sda1 0000000 4
/dev/sdb1 0000000 11
/dev/sdc1 0000000 7
/dev/sde1 0000000 8
/dev/sdf1 0000000 1
/dev/sdg1 0000000 0
/dev/sdh1 0000000 6
/dev/sdi1 0000000 10
/dev/sdj1 0000000 9
dd 2> /dev/null if=/dev/sdc1 bs=2 count=12 skip=2176 | od -d
0000000 0 1 65534 65534 2 65534 4 5
0000020 6 7 8 3
0000030
Mind doing a sanity check for me?
Based on the above information, one such possible device order is:
/dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1* /dev/sda1 /dev/sdj1* /dev/sdh1
/dev/sdc1 /dev/sde1
where * represents the three unknown devices marked by 65534?
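As a cross-check, the sketch below prints the role each device gets when its dev_number is used as an index (counting from 0) into the dev_roles list shown above, which is how the lookup was described earlier in the thread. The name role_table and the DEVICE:DEV_NUMBER argument format are illustrative only. Read this way, the 65534 entries sit at dev_numbers 2, 3 and 5, which none of the nine devices reported, so every device resolves to a role.

```shell
# role_table "ROLE_LIST" DEVICE:DEV_NUMBER...
# Prints "role device" lines, sorted by role.
role_table() {
    roles=$1
    shift
    for pair in "$@"; do
        # awk picks field (dev_number + 1), since awk fields count from 1.
        echo "$roles" | awk -v n="${pair##*:}" -v d="${pair%:*}" \
            '{ print $(n + 1), d }'
    done | sort -n
}

role_table "0 1 65534 65534 2 65534 4 5 6 7 8 3" \
    /dev/sda1:4 /dev/sdb1:11 /dev/sdc1:7 /dev/sde1:8 /dev/sdf1:1 \
    /dev/sdg1:0 /dev/sdh1:6 /dev/sdi1:10 /dev/sdj1:9
# prints:
# 0 /dev/sdg1
# 1 /dev/sdf1
# 2 /dev/sda1
# 3 /dev/sdb1
# 4 /dev/sdh1
# 5 /dev/sdc1
# 6 /dev/sde1
# 7 /dev/sdj1
# 8 /dev/sdi1
```

Under that reading the candidate order would be /dev/sdg1 /dev/sdf1 /dev/sda1 /dev/sdb1 /dev/sdh1 /dev/sdc1 /dev/sde1 /dev/sdj1 /dev/sdi1, which is worth comparing against the order proposed above before running any --create.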
Once I have your blessing, would I then proceed to:
mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=9
--metadata=1.2 --chunk=512 /dev/sdg1 /dev/sdf1 /dev/sdb1* /dev/sdi1*
/dev/sda1 /dev/sdj1* /dev/sdh1 /dev/sdc1 /dev/sde1
and this is non-destructive, so I can attempt different orders?
Again, thank you for the help.
Best wishes,
-EJ