Re: lvm metadata sequence number reverts

Yes, I have lots of data to share; I just wanted to open at a high level first. This is all happening inside a single VM. The archive is available, and I will post the files shortly. No lvmetad. No errors that I can tell (at least not on the console or in syslog).

root@VA1CTLT-SRN2-03:/etc/lvm/archive# grep seqno test_dvol-13-vg_00*
test_dvol-13-vg_00261-1410850844.vg: seqno = 0   <-- before vgcreate
test_dvol-13-vg_00262-1188507802.vg: seqno = 1   <-- before lvcreate 1
test_dvol-13-vg_00263-1818746321.vg: seqno = 2   <-- before lvcreate 2
test_dvol-13-vg_00264-1122545952.vg: seqno = 3   <-- before lvcreate 3
test_dvol-13-vg_00265-1497145254.vg: seqno = 4   <-- before lvcreate 4
test_dvol-13-vg_00266-1300493675.vg: seqno = 5   <-- before lvs
test_dvol-13-vg_00267-490193445.vg:  seqno = 4   <-- disabled device cache, lvs
test_dvol-13-vg_00268-2051497792.vg: seqno = 4   <-- disabled device cache, lvs
test_dvol-13-vg_00269-370016695.vg:  seqno = 5   <-- enabled device cache, lvs
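
The listing above can also be checked mechanically. The following is a minimal Python sketch, not part of LVM; the function name `find_seqno_regressions` is made up for illustration, and it assumes the input lines look like the grep output above and arrive in archive order. It reports every step where seqno goes down.

```python
import re

# Matches lines shaped like the grep output: "<file>: seqno = <n>".
# Anything after the number (such as the "<--" annotations) is ignored.
LINE_RE = re.compile(r"^(?P<file>\S+):\s*seqno\s*=\s*(?P<seqno>\d+)")

def find_seqno_regressions(grep_output):
    """Return (prev_file, prev_seqno, file, seqno) for each step where seqno decreases."""
    entries = []
    for line in grep_output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            entries.append((m.group("file"), int(m.group("seqno"))))
    regressions = []
    # Compare each archive entry with the one immediately before it.
    for (prev_file, prev_seq), (cur_file, cur_seq) in zip(entries, entries[1:]):
        if cur_seq < prev_seq:
            regressions.append((prev_file, prev_seq, cur_file, cur_seq))
    return regressions

# The last four archive entries from the listing above.
SAMPLE = """\
test_dvol-13-vg_00266-1300493675.vg: seqno = 5
test_dvol-13-vg_00267-490193445.vg: seqno = 4
test_dvol-13-vg_00268-2051497792.vg: seqno = 4
test_dvol-13-vg_00269-370016695.vg: seqno = 5
"""

for prev_file, prev_seq, cur_file, cur_seq in find_seqno_regressions(SAMPLE):
    print(f"{prev_file} (seqno {prev_seq}) -> {cur_file} (seqno {cur_seq})")
```

On this sample it flags only the step from archive 00266 to 00267, where seqno drops from 5 to 4.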

The contents of the metadata area seem to be the same in both cases (both contain seqno 5):

dd if=/dev/sbd13 bs=1M count=1 skip=1 of=sbd13.nocache
dd if=/dev/sbd13 bs=1M count=1 skip=1 of=sbd13.cache

cmp sbd13.nocache sbd13.cache
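
Since the metadata area is a circular buffer of text, a dump like the ones above may still contain older copies of the metadata next to the newest one. A minimal Python sketch for listing every `seqno = N` string in a dump follows; the helper name `seqnos_in_dump` is hypothetical, and it assumes the metadata text in the dump is plain ASCII.

```python
import re

# The LVM2 text metadata writes the sequence number as "seqno = N".
SEQNO_RE = re.compile(rb"seqno = (\d+)")

def seqnos_in_dump(data):
    """Return (byte_offset, seqno) for every 'seqno = N' string found in the raw bytes."""
    return [(m.start(), int(m.group(1))) for m in SEQNO_RE.finditer(data)]
```

For example, `seqnos_in_dump(open("sbd13.nocache", "rb").read())` would show whether seqno 4 and seqno 5 copies both sit in the same dump, and at which byte offsets.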

I tracked down these sectors by running strace on pvcreate/vgcreate/lvcreate. As far as I can tell, all the sectors involved are being written correctly.

Random facts:
1. Devicemapper still correctly lists the logical volume that is missing from lvs
2. 3.13.0-44-generic, Ubuntu 14.04
3. LVM version: 2.02.98(2) (2012-10-15) Library version: 1.02.77 (2012-10-15) Driver version: 4.27.0

Random suspicious snippet generated by lvscan -vvv:

/dev/mapper/sbd13p1: lvm2 label detected at sector 1
lvmcache: /dev/mapper/sbd13p1: now in VG #orphans_lvm2 (#orphans_lvm2) with 1 mdas
/dev/mapper/sbd13p1: Found metadata at 8704 size 1749 (in area at 4096 size 1044480) for test_dvol-13-vg (DFvQDG-nYVS-QQlT-Uv35-aPr4-2pY0-zMQ0dr)
lvmcache: /dev/mapper/sbd13p1: now in VG test_dvol-13-vg with 1 mdas
lvmcache: /dev/mapper/sbd13p1: setting test_dvol-13-vg VGID to DFvQDGnYVSQQlTUv35aPr42pY0zMQ0dr
lvmcache: /dev/mapper/sbd13p1: VG test_dvol-13-vg: Set creation host to VA1CTLT-SRN2-03.
Allocated VG test_dvol-13-vg at 0x257bc00.
Using cached label for /dev/mapper/sbd13p1
Read test_dvol-13-vg metadata (4) from /dev/mapper/sbd13p1 at 8704 size 1749
/dev/mapper/sbd13p1 0: 0 19: VM-test_dvol-13-0-hard-drive-0(0:0)
/dev/mapper/sbd13p1 1: 19 19: VM-test_dvol-13-0-hard-drive-1(0:0)
/dev/mapper/sbd13p1 2: 38 19: VM-test_dvol-13-1-hard-drive-0(0:0)
/dev/mapper/sbd13p1 3: 57 42: NULL(0:0) *<----missing logical volume*

I don't understand how this is possible if the metadata at that offset (8704) is identical in both cases.
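
One thing that can be checked in the dumps is the metadata area header itself. As far as I understand the LVM2 text format, the header at the start of the area holds a pointer (offset and size) to the current metadata text, and that pointer, rather than a search for the highest seqno, decides which copy is read. The Python sketch below decodes that pointer. The field layout and magic string are written from memory of lvm2's format_text layout and are assumptions to verify against the source of the version in use; the function name `current_metadata` is made up, and wrap-around of the text at the end of the circular buffer is ignored.

```python
import struct

# ASSUMPTION: magic string of the metadata area header in lvm2's text format.
MDA_MAGIC = b" LVM2 x[5A%r0N*>"
# ASSUMPTION: little-endian header: checksum, magic[16], version, area start, area size.
HEADER = struct.Struct("<I16sIQQ")
# ASSUMPTION: first raw location: offset (relative to area start), size, checksum, flags.
RAW_LOCN = struct.Struct("<QQII")

def current_metadata(dump):
    """Locate the metadata area header in a raw dump and follow its first pointer."""
    magic_at = dump.find(MDA_MAGIC)
    if magic_at < 4:  # also covers "not found" (-1)
        return None
    header_at = magic_at - 4  # the checksum field precedes the magic
    _crc, _magic, version, area_start, area_size = HEADER.unpack_from(dump, header_at)
    offset, size, _text_crc, _flags = RAW_LOCN.unpack_from(dump, header_at + HEADER.size)
    # The header sits at the start of the area, so the text is `offset` bytes after it.
    text = dump[header_at + offset:header_at + offset + size]
    return {
        "header_at": header_at,
        "version": version,
        "area_start": area_start,
        "area_size": area_size,
        "offset_in_area": offset,
        "size": size,
        "text": text,
    }
```

With the values from the lvscan output (area at 4096, metadata found at 8704, size 1749), the pointer would be expected to hold offset 8704 - 4096 = 4608 and size 1749. Running this on both dumps and looking at the seqno inside `text` would show which version the header points to.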

Attached are two verbose straces of vgdisplay, one of which discovers 3 logical volumes and one of which discovers 4.
I am looking for insight into the disk contents that are necessary for this discovery. Thank you very much.

Aaron



On Wed, 16 Sep 2015 at 03:05 Zdenek Kabelac <zkabelac@redhat.com> wrote:
On 15.9.2015 at 23:18, Aaron Young wrote:
> Hello, I'm deep into debugging an issue we have with a disk driver of ours and
> LVM.
>
> Long story short:
>
> create vg -> seqno 1
> create lv1 -> seqno 2
> create lv2 -> seqno 3
> create lv3 -> seqno 4
> create lv4 -> seqno 5
> <clear our device cache> (note, this generates no IO)
> vgdisplay: seqno = 4, lv4 is missing
>
> * This happens only after dozens to hundreds of iterations. Most of the time
> it is fine.
>
> I dd all the metadata blocks off of the pv, yep, seqno5 is on disk metadata
> area perfectly fine. But the system believes 4 is the current version.
> Shouldn't the system be using the highest value? Or is it stored somewhere?
> What mechanism is responsible for changing the seqno? And where does it change
> it? (Not the metadata contents, just the number)


Hi

Your email is quite 'mystic' - I'd need lots of crystal balls to see your
surrounding conditions.


1.) Is this a 'clustered' environment or a 'single' host setup?

2.) Do you have 'archive' backup enabled? Can you check what the last
operations in the history were before the problem happens?

3.) Are you using 'lvmetad'? (If so, try use_lvmetad=0.)

4.) Kernel version, lvm2 version?

5.) Was there any lvm2 command error?
(as vgdisplay may just do a backup of the most recent metadata in case they are
missing after some command failure)

Zdenek

Attachment: vgdisplay.strace.right
Description: Binary data

Attachment: vgdisplay.strace.wrong
Description: Binary data

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
