Re: Snapshot behavior on classic LVM vs ThinLVM

Gionatan Danti wrote on 28-02-2018 20:07:

To recap (Zdenek, correct me if I am wrong): the main problem is
that, on a full pool, async writes will more-or-less silently fail
(with errors shown in dmesg, but nothing more).

Yes I know you were writing about that in the later emails.

Another possible cause
of problems is that, even on a full pool, *some* writes will complete
correctly (the ones to already-allocated chunks).

Idem.

In the past it was argued that putting the entire pool in read-only mode
(where *all* writes fail, but reads are permitted to complete) would be
a better fail-safe mechanism; however, it was stated that no current
dm target permits that.

Right. Don't forget my main problem was system hangs due to older kernels, not the stuff you write about now.

Two (good) solutions were given, both relying on scripting (see the
"thin_command" option in lvm.conf):
- fsfreeze on a nearly full pool (i.e. >=98%);
- replace the dmthinp target with the error target (using dmsetup).

I really think that with the good scripting infrastructure currently
built into lvm this is a more-or-less solved problem.

I agree in practical terms. Doesn't make for good target design, but it's good enough, I guess.
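
For the record, here is a rough sketch of what such a thin_command hook
could look like. It is untested and only a starting point; the 98%
threshold, the vg/mount names and the DMEVENTD_THIN_POOL_DATA variable
are how I remember lvm.conf(5)/lvmthin(7), so check against your version
before relying on it:

#!/bin/sh
# Hypothetical script wired up in lvm.conf as something like:
#   dmeventd { thin_command = "/usr/local/sbin/thin-guard.sh" }
# dmeventd passes the pool data usage (percent) in DMEVENTD_THIN_POOL_DATA.

MNT="/srv/data"          # filesystem sitting on the thin LV
DM_NAME="vg0-data"       # dm name of that thin LV (vg-lv)

USED="${DMEVENTD_THIN_POOL_DATA:-0}"
USED="${USED%%.*}"       # keep the integer part only

if [ "$USED" -ge 98 ]; then
    # Option 1: freeze the filesystem so no new writes reach the pool.
    fsfreeze -f "$MNT"

    # Option 2 (harsher): swap the thin LV's table for the error target,
    # so every I/O fails loudly instead of half-succeeding.
    # LEN=$(dmsetup table "$DM_NAME" | awk '{print $2}')
    # dmsetup suspend "$DM_NAME"
    # dmsetup reload  "$DM_NAME" --table "0 $LEN error"
    # dmsetup resume  "$DM_NAME"
fi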

Do NOT take a thin snapshot of your root filesystem, so you will avoid
the thin-pool overprovisioning problem.

But is someone *really* pushing thinp for the root filesystem? I always
used it for data partitions only... Sure, rollback capability on root
is nice, but it is the data that is *really* important.

No. Zdenek thought my system hangs resulted from something else, and in order to defend against that being the fault of the current DM design he raised the ante by claiming that root-on-thin would cause system failure anyway on a full pool.

I never suggested root on thin.

In stress testing, I never saw a system crash on a full thin pool

That's good to know, I was just using Jessie and Xenial.

We discussed that in the past also, but as snapshot volumes really are
*regular*, writable volumes (with a 'k' flag to skip activation by
default), the LVM team takes the "safe" stance of not automatically
dropping any volume.

Sure, I guess any application logic would have to be programmed outside of any device-mapper module anyway.

The solution is to use scripting/thin_command with lvm tags. For example:
- tag all snapshots with a "snap" tag;
- when usage is dangerously high, drop all volumes with the "snap" tag.

Yes, now I remember.
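
For completeness, that tag-based approach boils down to something like
this (sketch only, names made up; as far as I remember most LVM commands
accept @tag selection):

# At snapshot creation time, mark the snapshot:
lvcreate -s -n data_snap1 --addtag snap vg0/data

# In the thin_command script, once the pool is dangerously full,
# drop everything carrying the tag:
lvremove -y @snap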

I was envisioning some other tag that would allow a quota to be set for every volume (for example as a %); the script would then drop the volumes with the larger quotas first (thus the larger snapshots), so as to protect the smaller volumes, which are probably more important and of which you can save more. I am ashamed to admit I had forgotten about that completely ;-).
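
Just to make that concrete, I was thinking of something along these
lines (pure sketch; the prio_* tag convention is invented here and all
names are made up):

# Tag each snapshot with an importance hint at creation time:
lvcreate -s -n data_snap1 --addtag snap --addtag prio_low  vg0/data
lvcreate -s -n conf_snap1 --addtag snap --addtag prio_high vg0/conf

# When the pool crosses the threshold, sacrifice the low-priority
# (typically larger) snapshots first, and only then the rest:
for tag in prio_low prio_high; do
    lvremove -y "@$tag" || true
    used=$(lvs --noheadings -o data_percent vg0/thinpool | tr -d ' ')
    [ "${used%%.*}" -lt 95 ] && break
done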

Back to rule #1 - thin-p is about 'delaying' delivery of real space.
If you already plan to never deliver the promised space - you need to
live with the consequences....

I am not sure I 100% agree with that.

When Zdenek says "thin-p" he might mean "thin-pool" but not generally "thin-provisioning".

I mean to say that an always auto-expanding pool is just one special use case of thin provisioning in general.

And I would agree, of course, that the other uses are also legit.

Thinp is not only about
"delaying" space provisioning; it clearly is also (mostly?) about
fast, modern, usable snapshots. Docker, snapper, stratis, etc. all use
thinp mainly for its fast, efficient snapshot capability.

Thank you for bringing that in.

Denying that
is not so useful and has led to "overwarning" (i.e. warning when
snapshotting a volume on a virtually-fillable thin pool).

Aye.

!SNAPSHOTS ARE NOT BACKUPS!

Snapshots are not backups, as they do not protect from hardware
problems (and denying that would be lame)

I was really saying that I was using them to run backups off of.

however, they are an
invaluable *part* of a successful backup strategy. Having multiple
rollback targets, even on the same machine, is a very useful tool.

Even more, you can back up running systems, but I thought that would be obvious.
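
(The usual pattern, with made-up names: snapshot the live volume,
activate it, copy it off, drop it.)

lvcreate -s -n data_bkp vg0/data         # thin snapshot of the running volume
lvchange -ay -K vg0/data_bkp             # -K overrides the 'k' activation-skip flag
mount -o ro /dev/vg0/data_bkp /mnt/bkp
rsync -a /mnt/bkp/ backup-host:/backups/data/
umount /mnt/bkp
lvremove -y vg0/data_bkp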

Again, I don't understand why we are speaking about system crashes. On
root *not* using thinp, I never saw a system crash due to a full data
pool.

I had it on 3.18 and 4.4, that's all.

Oh, and I use thinp on RHEL/CentOS only (Debian/Ubuntu backports are
way too limited).

That could be it too.

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


