So it turns out simply running lvconvert --repair fixed the issue and lvs is now reporting the correct utilization.
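For anyone who hits this later, a minimal sketch of what that looks like, assuming the volume names from the lvs output quoted below (lvconvert refuses to repair an active thin pool, so deactivate it first):

# lvchange -an myvg/my-pool
# lvconvert --repair myvg/my-pool
# lvchange -ay myvg/my-pool
# lvs -a myvg          # Data%/Meta% should be sane again

The repair keeps the pre-repair metadata around as my-pool_meta0, so you can still inspect it with thin_dump later and lvremove it once everything checks out.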
On Fri, May 11, 2018 at 12:09 PM John Hamilton <john.l.hamilton@gmail.com> wrote:
Thanks for the response.

> Is this everything?

Yes, that is everything in the metadata xml dump. I just removed all of
the *_mapping entries for brevity. For the lvs output I removed other
logical volumes that aren't related to this pool.

> Is this a pool used by docker, which does not (did not) use LVM to
> manage thin-volumes?

It's not docker, but it is an application called serviced that uses
docker's library for managing the volumes.

> LVM just queries DM, and displays whatever that provides

Yeah, it looks like dmsetup status output matches lvs:

myvg-my--pool: 0 5242880000 thin-pool 70 207941/4145152 29018611/40960000 - rw discard_passdown queue_if_no_space -
myvg-my--pool_tdata: 0 4194304000 linear
myvg-my--pool_tdata: 4194304000 1048576000 linear
myvg-my--pool_tmeta: 0 33161216 linear
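Decoding the two '/' pairs in the thin-pool line, per your note below (my arithmetic, so worth double-checking):

  metadata: 207941 / 4145152 blocks    ~  5.0%  (lvs Meta%: 4.90)
  data:     29018611 / 40960000 chunks ~ 70.8%  (lvs Data%: 69.04)

The small drift is presumably just because the two captures are a day apart.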
> What is kernel/lvm version?

# uname -r
3.10.0-693.21.1.el7.x86_64
# lvm version
  LVM version:     2.02.171(2)-RHEL7 (2017-05-03)
  Library version: 1.02.140-RHEL7 (2017-05-03)
  Driver version:  4.35.0
  Configuration:   ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64 --enable-lvm1_fallback --enable-fsadm --with-pool=internal --enable-write_install --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-applib --enable-cmdlib --enable-dmeventd --enable-blkid_wiping --enable-python2-bindings --with-cluster=internal --with-clvmd=corosync --enable-cmirrord --with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync --with-thin=internal --enable-lvmetad --with-cache=internal --enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-dmfilemapd
> Is thin_check_executable configured in lvm.conf?

Yes.

I also just found out that they apparently ran thin_check recently and got
a message about a corrupt superblock, but didn't repair it. They were
still able to re-activate the pool, though. We'll run a repair as soon as
we get a chance and see if that fixes it.

Thanks,
John

On Fri, May 11, 2018 at 3:54 AM Marian Csontos <mcsontos@redhat.com> wrote:

On 05/11/2018 10:21 AM, Joe Thornber wrote:
> On Thu, May 10, 2018 at 07:30:09PM +0000, John Hamilton wrote:
>> I saw something today that I don't understand and I'm hoping somebody can
>> help. We had a ~2.5TB thin pool that was showing 69% data utilization in
>> lvs:
>>
>> # lvs -a
>>   LV               VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>>   my-pool          myvg twi-aotz-- 2.44t              69.04  4.90
>>   [my-pool_tdata]  myvg Twi-ao---- 2.44t
>>   [my-pool_tmeta]  myvg ewi-ao---- 15.81g
Is this everything? Is this a pool used by docker, which does not (did
not) use LVM to manage thin-volumes?
>> However, when I dump the thin pool metadata and look at the mapped_blocks
>> for the 2 devices in the pool, I can only account for about 950GB. Here is
>> the superblock and device entries from the metadata xml. There are no
>> other devices listed in the metadata:
>>
>> <superblock uuid="" time="34" transaction="68" flags="0" version="2"
>> data_block_size="128" nr_data_blocks="0">
>> <device dev_id="1" mapped_blocks="258767" transaction="0"
>> creation_time="0" snap_time="14">
>> <device dev_id="8" mapped_blocks="15616093" transaction="27"
>> creation_time="15" snap_time="34">
>>
>> That first device looks like it has about 16GB allocated to it, and the
>> second device about 950GB. So I would expect lvs to show somewhere
>> between 950G and 966G. Is something wrong, or am I misunderstanding how
>> to read the metadata dump? Where is the other 700 or so GB that lvs is
>> showing used?
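For reference, the block arithmetic here checks out: data_block_size="128"
is in 512-byte sectors, i.e. 64KiB per block, so 258767 * 64KiB ~= 15.8G
and 15616093 * 64KiB ~= 953G, about 969G mapped in total.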
>
> The non-zero snap_time suggests that you're using snapshots, in which
> case it could just be that there is common data shared between volumes
> that is getting counted more than once.
>
> You can confirm this using the thin_ls tool and specifying a format line
> that includes EXCLUSIVE_BLOCKS or SHARED_BLOCKS. LVM doesn't take shared
> blocks into account because it would have to scan all the metadata to
> calculate what's shared.
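For completeness, that would look something like the following -- a sketch
only, not run here (on a live pool you have to reserve a metadata snapshot
first; the _tmeta device name follows from the LV names above):

# dmsetup message myvg-my--pool 0 reserve_metadata_snap
# thin_ls --metadata-snap \
      --format "DEV,MAPPED_BLOCKS,EXCLUSIVE_BLOCKS,SHARED_BLOCKS" \
      /dev/mapper/myvg-my--pool_tmeta
# dmsetup message myvg-my--pool 0 release_metadata_snap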
LVM just queries DM, and displays whatever that provides. You can see
that in `dmsetup status` output: there are two pairs of '/'-separated
entries -- the first is metadata usage (USED_BLOCKS/ALL_BLOCKS), the
second is data usage (USED_CHUNKS/ALL_CHUNKS).
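In other words, for a thin-pool target the status line starts like this
(later fields omitted; see the kernel's thin-provisioning.txt):

  <start> <length> thin-pool <transaction id> <used meta>/<total meta> <used data>/<total data> ...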
So the error lies somewhere between dmsetup and the kernel.
What is kernel/lvm version?
Is thin_check_executable configured in lvm.conf?
-- Martian
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/