On 13.9.2017 at 00:41, Gionatan Danti wrote:
On 13-09-2017 00:16, Zdenek Kabelac wrote:
On 12.9.2017 at 23:36, Gionatan Danti wrote:
On 12-09-2017 21:44, matthew patton wrote:
Again, please don't speak about things you don't know.
I am *not* interested in thin provisioning itself at all; on the other
hand, I find CoW and fast snapshots very useful.
I'm not going to comment on the KVM storage architecture - but with this
statement you have a VERY simple use case:
Just minimize the chance of overprovisioning -
let's go by example:
you have 10 volumes of 10 GiB each and you have 20 snapshots...
To never overprovision, you need 10 GiB * 30 LVs = a 300 GiB thin-pool.
If that sounds like too much,
you can go with 150 GiB - enough to always cover 100% of all 'base' volumes
and still have some room for snapshots.
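
To make the arithmetic explicit, a tiny sketch (just an illustration - the
volume count, sizes and snapshot count are the numbers from this example):

# Worst-case sizing so the pool can never be overcommitted: every base
# volume and every snapshot may diverge completely, so each one needs its
# own full allocation in the thin-pool.
volumes = 10
volume_size_gib = 10
snapshots = 20

worst_case = (volumes + snapshots) * volume_size_gib
print(worst_case)      # 300 GiB -> overprovisioning is impossible

# Compromise: always cover the base volumes in full and keep some
# headroom for snapshot CoW divergence (150 GiB total in the example).
base_only = volumes * volume_size_gib
print(base_only)       # 100 GiB is the bare minimum for the base volumes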
Now the fun begins - while monitoring is running,
you get a callback at 50%, 55%, ... 95%, 100% pool fullness,
and at each of those moments you can take whatever action you need.
So assume 100 GiB is the bare minimum for the base volumes - you ignore any
state below 66% thin-pool occupancy, and you start solving problems
at 85% (~128 GiB) - at that point you know some snapshot had better be dropped.
You may try 'harder' actions at higher percentages.
(You also need to consider how many dirty pages you leave floating around your
system, and other variables.)
Also, you pick with some logic the snapshot you want to drop -
maybe the oldest?
(see the airplane :) URL link)...
Anyway - at this moment you still have plenty of time to solve it,
without any danger of losing a write operation...
All you can lose is some 'snapshot' which might otherwise have stayed around a
bit longer... but that is supposedly fine with your workflow model...
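
As a rough illustration of such a user-space policy (purely a sketch: the
VG/pool names and the 85% threshold are invented, and you would hook it into
whatever delivers your monitoring callbacks - dmeventd, cron, your own
daemon):

#!/usr/bin/env python3
# Sketch: ignore the pool below the threshold; once data usage crosses ~85%,
# drop the oldest thin snapshot.
import subprocess

VG, POOL = "vg", "thinpool"      # hypothetical names
THRESHOLD = 85.0                 # ~128 GiB of the 150 GiB example pool

def lvs_fields(fields):
    # Query LVM for the requested report columns, one LV per line.
    out = subprocess.run(
        ["lvs", "--noheadings", "--separator", "|", "-o", fields, VG],
        check=True, capture_output=True, text=True).stdout
    return [line.strip().split("|") for line in out.splitlines() if line.strip()]

def pool_data_percent():
    for name, pct in lvs_fields("lv_name,data_percent"):
        if name == POOL and pct:
            return float(pct)
    return 0.0

def oldest_snapshot():
    # Thin snapshots report their origin; pick the one created first.
    snaps = [(ctime, name)
             for name, origin, ctime in lvs_fields("lv_name,origin,lv_time")
             if origin]
    return min(snaps)[1] if snaps else None

if __name__ == "__main__":
    if pool_data_percent() >= THRESHOLD:
        victim = oldest_snapshot()
        if victim:
            subprocess.run(["lvremove", "-f", f"{VG}/{victim}"], check=True)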
Of course you get into serious problems if you try to keep all
these demo volumes within 50 GiB with massive overprovisioning ;)
There you have a much harder time deciding what should happen, what should be
removed, and when it is perhaps better to STOP everything and let the admin
decide what the ideal next step is...
Hi Zdenek,
I fully agree with what you said above, and I sincerely thank you for taking
the time to reply.
However, I am not sure I understand *why* reserving space for a thin volume
seems such a bad idea to you.
Let's take a 100 GB thin pool, and say I want to *never* run out of space in
spite of taking multiple snapshots.
To achieve that, I need to a) carefully size the original volume, b) ask the
thin pool to reserve the needed space, and c) count the "live" data (REFER
in ZFS terms) allocated inside the thin volume.
Step-by-step example:
- create a 40 GB thin volume and subtract its size from the thin pool (USED 40
GB, FREE 60 GB, REFER 0 GB);
- overwrite the entire volume (USED 40 GB, FREE 60 GB, REFER 40 GB);
- snapshot the volume (USED 40 GB, FREE 60 GB, REFER 40 GB);
- completely overwrite the original volume (USED 80 GB, FREE 20 GB, REFER 40 GB);
- a new snapshot creation will fail (REFER is higher than FREE).
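
To make the bookkeeping explicit, a toy model of this accounting (just an
illustration of the proposal - not an existing thinp or ZFS interface; the
class and attribute names are invented):

# Toy model: the pool tracks FREE space, a volume reserves its full size up
# front (USED) and tracks its live data (REFER); a snapshot is refused
# whenever REFER exceeds the pool's FREE space.
class Pool:
    def __init__(self, size_gb):
        self.free = size_gb

class Volume:
    def __init__(self, pool, size_gb):
        if size_gb > pool.free:
            raise RuntimeError("cannot reserve the full volume size")
        pool.free -= size_gb              # reserve the whole size up front
        self.pool, self.size, self.refer = pool, size_gb, 0
        self.snapshots = 0

    def overwrite_fully(self):
        if self.snapshots:                # a snapshot keeps the old blocks,
            self.pool.free -= self.refer  # so divergence consumes new space
        self.refer = self.size

    def snapshot(self):
        # Refuse the snapshot unless the origin could still diverge completely.
        if self.refer > self.pool.free:
            raise RuntimeError("snapshot refused: REFER > FREE")
        self.snapshots += 1

pool = Pool(100)
vol = Volume(pool, 40)       # USED 40, FREE 60, REFER 0
vol.overwrite_fully()        # USED 40, FREE 60, REFER 40
vol.snapshot()               # USED 40, FREE 60, REFER 40
vol.overwrite_fully()        # USED 80, FREE 20, REFER 40
vol.snapshot()               # raises: REFER (40) > FREE (20)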
Result: the thin pool is *never allowed* to fill. You need to keep track of
per-volume USED and REFER space, but thinp performance should not be impacted
in any manner. This is not theoretical: it already works in this manner
with ZVOLs and refreservation, *without* involving/requiring any advanced
coupling/integration between the block and filesystem layers.
Don't get me wrong: I am sure that, if you chose not to implement this
scheme, you had a very good reason for it. Moreover, I understand that
patches are welcome :)
But I would like to understand *why* this possibility is ruled out with such
firmness.
There is a simple answer and a complex one :)
I'd start with the simple one - already presented here:
when you write to an INDIVIDUAL thin volume, the respective dm thin target
manipulates a single btree set - it does NOT care that there are some other
snapshots, and it never influences them.
You are asking here to heavily 'change' the thin-pool logic, so that writing to
THIN volume A can remove/influence volume B - and this is very problematic for
many reasons.
We can go into the details of the btree updates (that should really be
discussed with its authors on the dm channel ;)) - but I think the key element
is capturing the idea that the usage of thinLV A does not change thinLV B.
----
Now to your free 'reserved' space fiction :)
There is NO way to decide WHO deserves to use the reserve :)
Every thin volume is equal - (the fact that we call some thin LV a snapshot is
a user-land fiction - in the kernel all thinLVs are just equal - every thinLV
references a set of thin-pool chunks).
(For late-night thinking - what would a snapshot of a snapshot which is fully
overwritten be? ;))
So when you now see that every thinLV just maps a set of chunks,
and that all thinLVs can be active and running concurrently - how do you want
to use reserves in the thin-pool :) ?
When do you decide to use them? (You need to see this is total race-land.)
How do you actually orchestrate locking around this single point of failure ;) ?
You will surely come up with the idea of having a separate reserve for every
thinLV?
How big should it actually be?
Are you going to 'refill' those reserves when the thin-pool gets emptier?
How do you decide which thinLV deserves bigger reserves ;) ??
I assume you can start to SEE the whole misery of this point...
So instead - you can start with a normal thin-pool - keep it simple in the
kernel, and solve the complexity in user-space.
There you can decide whether you want to extend the thin-pool...
You may drop some snapshot...
You may fstrim mounted thinLVs...
You can kill volumes way before the situation becomes unmaintainable...
All you need to accept is that you will kill them at 95% -
in your world with reserves, that same state would already be reported as 100%
full, with a totally unknown amount of space hidden in reserves :)
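
Putting those user-space actions together, a rough escalation sketch (the
VG/pool names, mount points and thresholds are invented for illustration, and
oldest_snapshot would be a helper like the one sketched earlier in this
thread):

import subprocess

VG, POOL = "vg", "thinpool"              # hypothetical names
MOUNTPOINTS = ["/srv/vm1", "/srv/vm2"]   # hypothetical mounted thinLVs

def act(data_percent, oldest_snapshot):
    if data_percent < 66:
        return                           # nothing to do yet
    if data_percent < 85:
        # try to grow the pool according to the configured autoextend policy
        subprocess.run(["lvextend", "--use-policies", f"{VG}/{POOL}"])
        return
    if data_percent < 95:
        # reclaim space: discard unused blocks, then drop the oldest snapshot
        for mnt in MOUNTPOINTS:
            subprocess.run(["fstrim", mnt])
        if oldest_snapshot:
            subprocess.run(["lvremove", "-f", f"{VG}/{oldest_snapshot}"])
        return
    # last resort at 95%: stop/kill volumes before the pool fills completely -
    # what exactly that means is a site-specific decision
    print("thin-pool nearly full - manual intervention needed")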
Regards
Zdenek
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/