On 28.2.2018 at 20:07, Gionatan Danti wrote:
Hi all,
On 28-02-2018 10:26 Zdenek Kabelac wrote:
Overprovisioning at the DEVICE level simply IS NOT equivalent to a full
filesystem, as you would like to see it treated all the time here, and it
has already been explained to you many times that filesystems are simply
not ready for this yet - fixes are ongoing, but they will take their time,
and it's really pointless to exercise this on 2-3 year old kernels...
this was really beaten to death in the past months/threads. I generally agree
with Zdenek.
To recap (Zdenek, correct me if I am wrong): the main problem is that, on a
full pool, async writes will more-or-less silently fail (with errors shown in
dmesg, but nothing more). Another possible source of problems is that, even on
a full pool, *some* writes will complete correctly (the ones to already
allocated chunks).
By default, a full pool starts to 'error' all 'writes' after 60 seconds.
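That 60 seconds is - as far as I know - the dm-thin-pool module parameter
'no_space_timeout', and the pool can also be switched to error out immediately
instead of queueing; exact paths/options should be verified on your version:

  # queue-if-no-space timeout in seconds (0 = queue forever)
  cat /sys/module/dm_thin_pool/parameters/no_space_timeout

  # make the pool error immediately when full, instead of queueing writes
  lvchange --errorwhenfull y vg/pool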
In the past it was argued that putting the entire pool in read-only mode (where
*all* writes fail, but reads are permitted to complete) would be a better
fail-safe mechanism; however, it was stated that no current dm target permits that.
Yep - I'd probably like to see a slightly different mechanism - where all
ongoing writes would be failing - as it is now, some 'writes' will pass
(those to already provisioned areas) and some will fail (those to unprovisioned ones).
The main problem is that - after a reboot - this 'missing/unprovisioned' space may
present some old data...
Two (good) solutions were given, both relying on scripting (see the "thin_command"
option in lvm.conf):
- fsfreeze on a nearly full pool (i.e. >=98%);
- replace the dm-thin target with the error target (using dmsetup).
Yep - this can all happen via 'monitoring'.
The key is to do it early, before disaster happens.
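For illustration only, a minimal thin_command hook could look roughly like this
(an untested sketch - the dmeventd/thin_command setting and the
DMEVENTD_THIN_POOL_DATA variable are the ones described in lvmthin(7), but the
threshold, mount point and device names are just placeholders to adapt):

  # lvm.conf:  dmeventd { thin_command = "/usr/local/sbin/thin-guard" }

  #!/bin/sh
  # dmeventd runs this with the thin-pool LV name as $1 at each 5% step
  # above 50%; data usage in percent is exported as DMEVENTD_THIN_POOL_DATA.
  if [ "${DMEVENTD_THIN_POOL_DATA:-0}" -ge 98 ]; then
      # option 1: freeze the filesystem sitting on the thin LV
      fsfreeze --freeze /mnt/data

      # option 2 (harsher): swap the thin LV's table for the error target
      # SIZE=$(blockdev --getsz /dev/vg/thinvol)
      # dmsetup suspend vg-thinvol
      # dmsetup reload vg-thinvol --table "0 $SIZE error"
      # dmsetup resume vg-thinvol
  fi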
I really think that, with the good scripting infrastructure currently built into
lvm, this is a more-or-less solved problem.
It still depends - there is always some sort of 'race' - unless you are
willing to 'give up' very early just to always be on the safe side, considering
there are technologies that may write many GB/s...
Do NOT take a thin snapshot of your root filesystem, so you will avoid the
thin-pool overprovisioning problem.
But is someone *really* pushing thinp for the root filesystem? I always used it
You can use the rootfs with thinp - it's very fast for testing e.g. upgrades
and quickly reverting back - there just needs to be enough free space.
In stress testing, I never saw a system crash on a full thin pool, but I was
not using it on the root filesystem. Are there any ill effects on system
stability which I need to know about?
Depends on the version of the kernel and the filesystem in use.
Note that the RHEL/CentOS kernel has lots of backports even though it looks quite old.
The solution is to use scripting/thin_command with lvm tags. For example:
- tag all snapshots with a "snap" tag;
- when usage is dangerously high, drop all volumes with the "snap" tag.
Yep - every user has different plans in mind - scripting gives the user
freedom to adapt this logic to local needs...
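A minimal sketch of that tag-based policy (the VG/LV names and the threshold
are made up, and whether '@snap' is accepted as an LV selector by your lvremove
version should be double-checked):

  # tag snapshots as they are created
  lvcreate -s vg/data -n data-snap1 --addtag snap

  # later, in the thin_command hook, once the pool gets too full:
  if [ "${DMEVENTD_THIN_POOL_DATA:-0}" -ge 90 ]; then
      lvremove -y @snap      # drop every LV carrying the "snap" tag
  fi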
However, I don't have the space for a full copy of every filesystem, so if
I snapshot, I will automatically overprovision.
As long as the responsible admin controls the space in the thin-pool and takes
action long before the thin-pool runs out of space, all is fine.
If the admin hopes for some kind of magic to happen - we have a problem....
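Even just watching the pool regularly from cron/monitoring covers most of it
(the VG name below is a placeholder):

  lvs -o lv_name,data_percent,metadata_percent vg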
Back to rule #1 - thin-p is about 'delaying' the delivery of real space.
If you already plan to never deliver the promised space - you need to
live with the consequences....
I am not sure I 100% agree with that. Thinp is not only about "delaying" space
provisioning; it clearly is also (mostly?) about fast, modern, usable
snapshots. Docker, snapper, stratis, etc. all use thinp mainly for its fast,
efficient snapshot capability. Denying that is not so useful and leads to
"overwarning" (i.e. when snapshotting a volume on a virtually-fillable thin pool).
Snapshots are using space - with the hope that if you 'really' need that space
you either add this space to your system - or you drop the snapshots.
Still the same logic applies....
!SNAPSHOTS ARE NOT BACKUPS!
This is the key problem with your thinking here (unfortunately, you are
not 'alone' in this thinking).
Snapshots are not backups, as they do not protect from hardware problems (and
denying that would be lame); however, they are an invaluable *part* of a
successful backup strategy. Having multiple rollback targets, even on the
same machine, is a very useful tool.
Backups primarily sit on completely different storage.
If you keep backups of data in the same pool:
1.)
an error in a single chunk shared by all your backups + the origin means
total data loss - especially in the case where filesystems use 'BTrees' and
some 'root node' is lost - it can easily render your origin + all backups
completely useless.
2.)
problems in the thin-pool metadata can turn all your origin+backups into just
an unordered mess of chunks.
Again, I don't understand why we are speaking about system crashes. On a root
*not* using thinp, I never saw a system crash due to a full data pool.
Oh, and I use thinp on RHEL/CentOS only (Debian/Ubuntu backports are way too
limited).
Yep - this case is known to be pretty stable.
But as said - with today's 'rush' of development and load of updates - users do
want to try a 'new distro upgrade' - if it works, all is fine - if it doesn't,
let's have a quick road back - so using a thin volume for the rootfs is a pretty
wanted use case.
Trouble is, there are quite a lot of issues that are non-trivial to solve.
There are also some ongoing ideas/projects - one of them was to have thinLVs
with a priority to always be fully provisioned - so such a thinLV could never be
the one to have unprovisioned chunks....
Another was a better integration of filesystems with 'provisioned' volumes.
Zdenek
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/