Actually, I made a mistake: I failed to drop the system cache between the two dd runs when generating the comparison. There is one difference, in sector 2056 of the device! This must be the key.
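[Editorial note] A minimal sketch of how such a dump comparison localizes the differing sector, using a scratch file as a stand-in for /dev/sbd13 (the dump file names and the sector-2056 offset mirror the thread; the images and the injected byte are illustrative):

```shell
# Stand-in "device": 2 MiB of zeros, plus a copy differing at byte
# offset 1052672 (512-byte sector 2056, as in the report).
dd if=/dev/zero of=dev.img bs=1M count=2 2>/dev/null
cp dev.img dev2.img
printf 'X' | dd of=dev2.img bs=1 seek=1052672 conv=notrunc 2>/dev/null

# Dump the second MiB of each image, as in the thread's dd commands.
# (Against a real device, drop the page cache between dumps first --
# sync; echo 3 > /proc/sys/vm/drop_caches -- or read with iflag=direct.)
dd if=dev.img bs=1M count=1 skip=1 of=sbd13.cache 2>/dev/null
dd if=dev2.img bs=1M count=1 skip=1 of=sbd13.nocache 2>/dev/null

# cmp prints the first differing byte, 1-based: 4097 into the dump here.
# cmp exits non-zero when the files differ, hence the || true.
cmp sbd13.nocache sbd13.cache || true
```

cmp's 1-based offset plus the 1 MiB skip recovers the absolute sector: (1048576 + 4097 - 1) / 512 = 2056.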
On Wed, 16 Sep 2015 at 16:31 Aaron Young <aaron.young@ctl.io> wrote:
Yes, I have lots of data to share; I thought first to open at a high level. This is all happening inside a single VM. Archives are available; I will post them shortly. No lvmetad. No errors that I can tell (at least not on console or syslog).

root@VA1CTLT-SRN2-03:/etc/lvm/archive# grep seqno test_dvol-13-vg_00*
test_dvol-13-vg_00261-1410850844.vg: seqno = 0 <---- before vgcreate
test_dvol-13-vg_00262-1188507802.vg: seqno = 1 <---- before lvcreate 1
test_dvol-13-vg_00263-1818746321.vg: seqno = 2 <---- before lvcreate 2
test_dvol-13-vg_00264-1122545952.vg: seqno = 3 <---- before lvcreate 3
test_dvol-13-vg_00265-1497145254.vg: seqno = 4 <---- before lvcreate 4
test_dvol-13-vg_00266-1300493675.vg: seqno = 5 <---- before lvs
test_dvol-13-vg_00267-490193445.vg: seqno = 4 <---- disabled device cache, lvs
test_dvol-13-vg_00268-2051497792.vg: seqno = 4 <---- disabled device cache, lvs
test_dvol-13-vg_00269-370016695.vg: seqno = 5 <---- enabled device cache, lvs

The contents of the metadata area seem to be the same (both contain seqno 5):

dd if=/dev/sbd13 bs=1M count=1 skip=1 of=sbd13.cache
dd if=/dev/sbd13 bs=1M count=1 skip=1 of=sbd13.nocache
cmp sbd13.nocache sbd13.cache

I tracked down these sectors by running strace on pvcreate/vgcreate/lvcreate. As far as I can tell, all the sectors involved are being written correctly.
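[Editorial note] Since LVM stores VG metadata as plain text in the metadata area, the seqno copies present in such a dump can be listed directly; against the real dump the command would simply be `strings sbd13.cache | grep seqno`. A small self-contained sketch using a synthetic stand-in for the dump:

```shell
# Synthetic stand-in for a metadata-area dump: a plain-text metadata
# fragment (here just a seqno line) surrounded by binary padding,
# roughly how LVM keeps text copies in its metadata ring buffer.
printf 'seqno = 5\n' > meta.txt
dd if=/dev/zero bs=512 count=1 of=pad.bin 2>/dev/null
cat pad.bin meta.txt pad.bin > dump.bin

# strings extracts the printable runs, so the seqno values in the
# dump show up directly.
strings dump.bin | grep seqno
# -> seqno = 5
```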
Random facts:
1. Device-mapper still correctly lists the logical volume that is missing from lvs.
2. Kernel 3.13.0-44-generic, Ubuntu 14.04.
3. LVM version: 2.02.98(2) (2012-10-15); Library version: 1.02.77 (2012-10-15); Driver version: 4.27.0.

A random suspicious snippet generated by lvscan -vvv:
/dev/mapper/sbd13p1: lvm2 label detected at sector 1
lvmcache: /dev/mapper/sbd13p1: now in VG #orphans_lvm2 (#orphans_lvm2) with 1 mdas
/dev/mapper/sbd13p1: Found metadata at 8704 size 1749 (in area at 4096 size 1044480) for test_dvol-13-vg (DFvQDG-nYVS-QQlT-Uv35-aPr4-2pY0-zMQ0dr)
lvmcache: /dev/mapper/sbd13p1: now in VG test_dvol-13-vg with 1 mdas
lvmcache: /dev/mapper/sbd13p1: setting test_dvol-13-vg VGID to DFvQDGnYVSQQlTUv35aPr42pY0zMQ0dr
lvmcache: /dev/mapper/sbd13p1: VG test_dvol-13-vg: Set creation host to VA1CTLT-SRN2-03. Allocated VG test_dvol-13-vg at 0x257bc00.
Using cached label for /dev/mapper/sbd13p1
Read test_dvol-13-vg metadata (4) from /dev/mapper/sbd13p1 at 8704 size 1749
/dev/mapper/sbd13p1 0: 0 19: VM-test_dvol-13-0-hard-drive-0(0:0)
/dev/mapper/sbd13p1 1: 19 19: VM-test_dvol-13-0-hard-drive-1(0:0)
/dev/mapper/sbd13p1 2: 38 19: VM-test_dvol-13-1-hard-drive-0(0:0)
/dev/mapper/sbd13p1 3: 57 42: NULL(0:0) *<---- missing logical volume*

I don't understand how this is possible if that sector (8704) is identical in both cases.

Attached are two verbose straces of vgdisplay, one of which discovered 3 logical volumes and one of which discovered 4.

I am looking for insight into the disk contents that are necessary for this discovery. Thank you very much.

Aaron

On Wed, 16 Sep 2015 at 03:05 Zdenek Kabelac <zkabelac@redhat.com> wrote:
On 15 Sep 2015 at 23:18, Aaron Young wrote:
> Hello, I'm deep into debugging an issue we have with a disk driver of ours and
> LVM.
>
> Long story short:
>
> create vg -> seqno 1
> create lv1 -> seqno 2
> create lv2 -> seqno 3
> create lv3 -> seqno 4
> create lv4 -> seqno 5
> <clear our device cache> (note, this generates no IO)
> vgdisplay: seqno = 4, lv4 is missing
>
> * This happens only after dozens to hundreds of iterations. Most of the time
> it is fine.
>
> I dd'd all the metadata blocks off of the PV, and yep, seqno 5 is in the
> on-disk metadata area perfectly fine. But the system believes 4 is the
> current version.
> Shouldn't the system be using the highest value? Or is it stored somewhere?
> What mechanism is responsible for changing the seqno? And where does it change
> it? (Not the metadata contents, just the number)
Hi
Your email is quite 'mystic' - I'd need lots of crystal balls to see your
surrounding conditions.
1.) Is this a 'clustered' environment or a 'single' host setup?
2.) Do you have 'archive' backup enabled - can you check what the last
operations in the history are before the problem happens?
3.) Are you using 'lvmetad' ? (if so, try use_lvmetad=0 )
4.) Kernel version, lvm2 version ?
5.) Was there any lvm2 command error ?
(as vgdisplay may just do a backup of the most recent metadata in case they
are missing after some command failure)
Zdenek
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/