I have a fairly complex LVM2/mdadm setup that I'm in the middle of
turning into a simpler setup. I made a mistake along the way, though,
and have landed in a confusing place.
This is kind of long, and I apologize for that -- I'm trying to describe
completely how I got here. The complex setup I started with:
/dev/md5 is a RAID5 of /dev/sd{b,d,e,f}5
/dev/md6 is a RAID5 of /dev/sd{b,d,e,f}6
etc on up to /dev/md14
/dev/md99 is a RAID1 of /dev/sdg and /dev/sdh
/dev/md{5-14} plus /dev/md99 are all assembled into a volume group
(creatively called vglinux), which has three logical volumes. Only one,
lvstore, is relevant: the other two are getting destroyed as part of the
simplification.
The goal is to end with a RAID6 of /dev/sd{b,d,e,f,g,h}, and no
multiple-partition madness (it's there from the days of old, when mdadm
couldn't reshape arrays). The next step was to free up /dev/sdf,
starting with
pvmove /dev/md5
reshape md5 as a RAID5 of /dev/sd{b,d,e}5 (freeing /dev/sdf5)
lather, rinse, and repeat for the other mds.
The VG has plenty of free space for this; it's slow, but that's OK.
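(For concreteness, the intended per-array sequence was roughly the
following -- N is the partition number, and the size and backup-file
name are just placeholders here:

  pvmove /dev/md$N                                 # empty the PV first
  mdadm --grow /dev/md$N --array-size <new-size>   # shrink the array
  pvresize /dev/md$N                               # shrink the PV to match
  mdadm --grow /dev/md$N --raid-devices 3 --backup-file ~/backup

so the pvmove is supposed to happen before anything touches the array.)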
The problem: while md{5,6,7} went fine, I botched the pvmove for md8 and
ended up starting to reshape the array _before the pvmove happened._
Specifically, I did all of these:
mdadm --grow /dev/md8 --array-size 292730880 # it was 439489920
pvresize /dev/md8
mdadm --grow /dev/md8 --raid-devices 3 --backup-file ~/backup
_without_ having moved data off. Once I figured out what was going on,
I did
umount (all the filesystems in the VG)
vgchange -a n vglinux
mdadm --stop /dev/md8
which halted the reshape about 5% of the way done. Then (with some help
from NeilBrown and a buncha experiments with loopback devices) I used
the most recent mdadm snapshot to revert the reshape.
mdadm --assemble --update=revert-reshape /dev/md8 /dev/sd{b,d,e,f}8
NOTE WELL: I KNOW THAT THIS HAS DESTROYED SOME DATA. That's not the
question. [ :) ] There will be damage, yes, I know that, and I should
be able to detect that and correct it.
At this point /dev/md8 is back to 4 devices, array-size 439489920, and
can be started. Next step is to fsck lvstore to get a handle on the
damage before proceeding -- but vgchange -a y vglinux doesn't start lvstore:
# vgchange -a y vglinux
Incorrect metadata area header checksum
Refusing activation of partial LV lvstore. Use --partial to override.
2 logical volume(s) in volume group "vglinux" now active
(The two LVs that it did start are the irrelevant ones.)
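(For the record, the plan once lvstore does activate is a read-only pass
first to gauge the damage -- something like

  vgchange -a y --partial vglinux   # or a clean activation once MISSING is sorted
  fsck -n /dev/vglinux/lvstore      # report-only, no writes

-- before attempting any actual repair.)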
So things are confusing:
First, it'd be awesome to know exactly where that "Incorrect metadata area
header checksum" is coming from. Maybe, y'know, a device to look at, or
some further hint of where to start tracking things down? [ :) ]
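(My guess is that something like pvck on each PV, or an LVM command run
with -vvvv, would name the offending device, e.g.

  for pv in /dev/md{5,6,7,8,9,10,11,13,14,99,125}; do pvck $pv; done
  pvs -vvvv 2>&1 | grep -i checksum

but I'd rather hear that that's the right approach before flailing.)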
Second, if I look in /etc/lvm/archive for vglinux's latest, I find this
bit buried in there:
pv2 {
        id = "4F3rcV-sS8p-E6t2-hjGm-gLVB-C6wl-4McUhc"
        device = "/dev/md8"     # Hint only
        status = ["ALLOCATABLE"]
        flags = ["MISSING"]
        dev_size = 878979840    # 419.13 Gigabytes
        pe_start = 384
        pe_count = 107297       # 419.129 Gigabytes
}
which seems to be why it's complaining about 'partial LV lvstore'. But,
uh, 4F3rcV-sS8p-E6t2-hjGm-gLVB-C6wl-4McUhc _is_ the UUID of /dev/md8:
# pvs -o +uuid --unit=4m
  Incorrect metadata area header checksum
  Unable to find "/dev/sdb5" in volume group "vglinux"
  PV         VG      Fmt  Attr PSize       PFree       PV UUID
  /dev/md10  vglinux lvm2 a-   107297.00U       0U     LO5KoK-1AjU-iXb0-fkLo-lUKR-Yo9P-wDZQPP
  /dev/md11  vglinux lvm2 a-   107297.00U       0U     gBGcjz-DmIb-pAj9-CWnb-jopW-Wd19-iIs1ur
  /dev/md125 vglinux lvm2 a-   107297.00U    8607.00U  5JlNTx-yT14-271r-NMAm-a17W-FKe4-pXoOW4
  /dev/md13  vglinux lvm2 a-   107297.00U       0U     MJlTQO-lCyE-bP80-FlvE-m1nM-DD2x-qhlIQK
  /dev/md14  vglinux lvm2 a-   107297.00U       0U     XDpA1D-kxbq-SEck-ozTl-rP4Y-bMws-MBwNNf
  /dev/md5           lvm2 a-    71467.50U   71467.50U  39oFQs-9tlf-ywT4-YgtX-nfcm-rAEq-pAPsdR
  /dev/md6   vglinux lvm2 a-    71531.00U   35856.00U  ufKOpM-02YG-12rJ-mt1r-DbEm-xoJu-onzEtr
  /dev/md7   vglinux lvm2 a-    71531.00U   71531.00U  NpAKLQ-4Irn-wDA4-0ZDI-ydW6-eY9n-rDp50e
  /dev/md8   vglinux lvm2 a-   107297.00U       0U     4F3rcV-sS8p-E6t2-hjGm-gLVB-C6wl-4McUhc
  /dev/md9   vglinux lvm2 a-   107297.00U       0U     hRmTMN-Mx17-uUEX-rF1Z-hQ1J-8iDd-S7S2t7
  /dev/md99  vglinux lvm2 a-   357667.00U  178748.00U  jUgxoF-mvwR-6C8A-wzjP-K0Xu-MPf8-XewqUE
Finally, note that "Unable to find /dev/sdb5 in vglinux" complaint, and
note that /dev/md5 is _not_ listed as part of vglinux. md5 shouldn't be
part of vglinux right now, and sdb5 has never been a PV on its own (it's
only ever been a part of the md5 PV). WTFO? As it happens, I didn't
actually reshape /dev/md5: after the pvmove, I shredded the md and
recreated it instead. I suppose it's possible that I forgot to vgreduce
before doing that?
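(One thought: I could look for a stale LVM label directly on the
partition with something like

  blkid -p /dev/sdb5
  dd if=/dev/sdb5 bs=512 count=4 2>/dev/null | strings | grep LABELONE

though I don't know whether a leftover label there would explain the
"Unable to find" complaint or is a red herring.)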
Googling and reading indicate that I need to clear that MISSING flag,
and that vgcfgrestore is the only tool for the job -- but editing the
archive file to remove the MISSING flag and feeding the result to
vgcfgrestore doesn't work:
# vgcfgrestore --debug --verbose --test --file wtfvglinux vglinux
Test mode: Metadata will NOT be updated.
Incorrect metadata area header checksum
Incorrect metadata area header checksum
Restore failed.
Test mode: Wiping internal cache
Wiping internal VG cache
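(The other approach I've seen mentioned is re-stamping the PV and then
restoring, along the lines of

  pvcreate --restorefile wtfvglinux --uuid 4F3rcV-sS8p-E6t2-hjGm-gLVB-C6wl-4McUhc /dev/md8
  vgcfgrestore --file wtfvglinux vglinux

but I'm wary of doing that while I don't understand where the checksum
complaint is coming from, so I haven't tried it.)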
So, at this point, some guidance would be most welcome.
(Also note that before I did the revert-reshape, I dd'd
/dev/sd{b,d,e,f}8 to spare partitions as a backup. It may be relevant
that there are now two on-disk copies of the metadata for md8's member
devices?)
Thanks very much,
Flynn
--
The trick is to keep breathing. (Garbage, from _Version 2.0_)