Re: thin: pool target too small

On Wed, Sep 30, 2020, 1:00 PM Duncan Townsend <duncancmt@xxxxxxxxx> wrote:
On Tue, Sep 29, 2020, 10:54 AM Zdenek Kabelac <zkabelac@xxxxxxxxxx> wrote:
On 29. 09. 20 at 16:33, Duncan Townsend wrote:
> On Sat, Sep 26, 2020, 8:30 AM Duncan Townsend <duncancmt@xxxxxxxxx> wrote:
>
>      > > There were further error messages as further snapshots were attempted,
>      > > but I was unable to capture them as my system went down. Upon reboot,
>      > > the "transaction_id" message that I referred to in my previous message
>      > > was repeated (but with increased transaction IDs).
>      >
>      > For a better fix, it would need to be better understood what happened
>      > in parallel while 'lvm' inside dmeventd was resizing the pool data.
>

So lvm2 has been fixed upstream to report more informative messages to
the user - although it still requires some experience in managing
thin-pool kernel metadata and lvm2 metadata.

That's good news! However, I believe I lack the requisite experience. Is there some documentation that I ought to read as a starting point? Or is it best to just read the source?
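
As a sketch of a starting point - assuming the standard lvm2 and thin-provisioning-tools commands, with placeholder VG/pool names rather than anything from this thread - the two metadata layers can be inspected roughly like this:

    # lvm2 (userspace) metadata: a plain-text backup of the VG layout
    vgcfgbackup -f /tmp/myvg.backup myvg

    # thin-pool (kernel) metadata: dump the pool's metadata device to XML.
    # On an active pool, a metadata snapshot (thin_dump --metadata-snap)
    # is the safer way to get a consistent view.
    thin_dump /dev/mapper/myvg-mypool_tmeta > /tmp/mypool.xml

    # consistency check of the same metadata device
    thin_check /dev/mapper/myvg-mypool_tmeta

As far as I understand it, the transaction_id in the backup's thin-pool segment and the transaction recorded in the XML dump's superblock are the two numbers that have to agree.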

>     To the best of my knowledge, no other LVM operations were in flight at
>     the time. The script that I use issues LVM commands strictly

In your case, dmeventd did an 'unlocked' resize while another command
was taking a snapshot, and the sequence with the 'snapshot' happened to
win - so until the thin-pool was reloaded, lvm2 did not spot the difference.
(This is simply a bad race caused by the badly working locking on your system.)

After reading more about lvm locking, it looks like the original issue might have been that the locking directory lives on an LV instead of on a non-LVM-managed block device. (Although the locking directory is on a different VG, on a different PV, from the one that had the error.)

Is there a way to make dmeventd (or any other lvm program) abort if this locking fails? Should I switch to using a clustered locking daemon (even though I have only a single, non-virtualized host)?
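
For reference, the locking settings that are actually in effect on a host can at least be inspected with lvmconfig - the option names differ somewhat between lvm2 versions, so treat these as examples rather than as a fix:

    # the directory used for lvm2's file-based locks (the one that should
    # not live on LVM-managed storage)
    lvmconfig global/locking_dir

    # whether commands wait for a lock instead of giving up when it cannot
    # be acquired
    lvmconfig global/wait_for_locks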

>     Would it be reasonable to use vgcfgrestore again on the
>     manually-repaired metadata I used before? I'm not entirely sure what

You will need to use vgcfgrestore - but I think you've misused the recovered
piece I passed along, where I specifically asked you to replace only the
specific segments of the resized thin-pool within your latest VG metadata -
since those likely have all the proper mappings to the thin LVs.

All I did was use vgcfgrestore to apply the metadata file attached to your previous private email. I had to edit the transaction number, as I noted previously - that was a single-line change. Was that the wrong thing to do? I lack experience with lvm/thin metadata, so I am flying a bit blind here. I apologize if I've made things worse.
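
For clarity, the kind of edit I mean looks roughly like the following in the VG metadata text format, with the restore afterwards (names, paths, and numbers are illustrative, not the real values):

    # inside the thin-pool LV's segment in the metadata backup file:
    #
    #     segment1 {
    #         type = "thin-pool"
    #         ...
    #         transaction_id = 42    <- the transaction number referred to above
    #     }

    # then restore the edited file into the VG
    # (some lvm2 versions additionally require --force when thin volumes
    # are present)
    vgcfgrestore -f /path/to/edited-metadata.vg myvg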

Since you have taken the metadata from the 'resize' moment, you've lost all
the thinLV lvm2 metadata for the ones created later.

I'll try to make one for you.

Thank you very much. I am extremely grateful that you've helped me so much in repairing my system.

>     to look for while editing the XML from thin_dump, and I would very
>     much like to avoid causing further damage to my system. (Also, FWIW,
>     thin_dump appears to segfault when run with musl-libc instead of

Well - lvm2 is a glibc-oriented project - so users of those 'esoteric'
distributions need to be experts on their own.

If you can provide a coredump, or even better a patch for the crash, we might
replace the code with something more usable - but there is zero testing
with anything other than glibc...

Noted. I believe I'll be switching to glibc because there are a number of other packages that are broken for this distro.

If you have an interest, this is the issue I've opened with my distro about the crash: https://github.com/void-linux/void-packages/issues/25125 . I despair of this receiving much attention, given that not even gdb works properly.
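
In case it helps that report along, a minimal way to capture the crash - assuming core files are enabled on the machine and using a placeholder metadata device path - would be something like:

    # allow core dumps in this shell
    ulimit -c unlimited

    # reproduce the segfault
    thin_dump /dev/mapper/myvg-mypool_tmeta > /dev/null

    # the resulting core file (or "coredumpctl info thin_dump" on systemd
    # machines), plus the exact thin-provisioning-tools version, is what
    # upstream asked for above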

Hello! Could somebody advise whether restoring the VG metadata is likely to cause this system's condition to worsen? At this point, all I want to do is get the data off this drive and then start over with something more stable.

Thanks for the help!
--Duncan Townsend

P.S. This was written on mobile. Please forgive my typos.
_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
