Re: Reserve space for specific thin logical volumes

On 13. 9. 2017 at 00:41, Gionatan Danti wrote:
On 13-09-2017 00:16, Zdenek Kabelac wrote:
On 12. 9. 2017 at 23:36, Gionatan Danti wrote:
On 12-09-2017 21:44, matthew patton wrote:

Again, please don't speak about things you don't know.
I am *not* interested in thin provisioning itself at all; on the other hand, I find CoW and fast snapshots very useful.


Not going to comment on KVM storage architecture - but with this statement
you have a VERY simple use case:


Just minimize the chance of overprovisioning -

let's go by example:

you have 10 10GiB volumes and 20 snapshots...


To not overprovision, you need 10 GiB * 30 LVs = a 300 GiB thin-pool.

If that sounds like too much,

you can go with 150 GiB - to always 100% cover all 'base' volumes
and have some room for snapshots.
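For illustration, the same arithmetic as a tiny Python sketch - the counts and sizes are just the numbers from the example above:

# Back-of-the-envelope sizing for the example above:
# 10 base volumes of 10 GiB, each with 2 snapshots.
base_volumes = 10
volume_size_gib = 10
snapshots_per_volume = 2

# Worst case: every base volume and every snapshot fully diverges,
# so overprovisioning is impossible.
no_overprovision = base_volumes * (1 + snapshots_per_volume) * volume_size_gib
print(no_overprovision)                    # 300 GiB thin-pool

# Compromise: always cover 100% of the base volumes and leave some
# room for snapshot divergence.
base_only = base_volumes * volume_size_gib # 100 GiB
print(base_only + 50)                      # 150 GiB thin-pool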


Now the fun begins - while monitoring is running -
you get a callback at 50%, 55%, ... 95%, 100% -
and at each moment you can take whatever action you need.


So assume 100 GiB is the bare minimum for the base volumes - you ignore any
state with less than 66% occupancy of the thin-pool, and you start solving
problems at 85% (~128 GiB) - you know some snapshot had better be
dropped.
You may try 'harder' actions at higher percentages.
(You need to consider how many dirty pages you leave floating in your system,
and other variables.)

Also, you pick with some logic the snapshot you want to drop -
maybe the oldest?
(see the airplane :) URL link)...
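To make that concrete, here is a rough user-space sketch in Python of such a policy. The thresholds are the ones from the example above, the VG/pool names are whatever you pass in, and the "has an origin" test for recognizing snapshots is an assumption you would adapt to your own naming scheme - this is just a script shelling out to the usual lvm2 commands, not any lvm2 API:

import subprocess

IGNORE_BELOW = 66       # do nothing while thin-pool occupancy is below 66%
DROP_SNAPSHOT_AT = 85   # above this, start dropping snapshots

def oldest_snapshot(vg, pool):
    # List thin LVs in the pool with their creation time and origin.
    # lv_name, lv_time, origin and pool_lv are standard lvs report fields;
    # treating "has an origin" as "is a snapshot" is this sketch's assumption.
    out = subprocess.run(
        ["lvs", "--noheadings", "--separator", "|",
         "-o", "lv_name,lv_time,origin",
         "--select", "pool_lv=" + pool, vg],
        capture_output=True, text=True, check=True).stdout
    snaps = []
    for line in out.splitlines():
        name, created, origin = [f.strip() for f in line.split("|")]
        if origin:                       # thin LV created as a snapshot
            snaps.append((created, name))
    # lv_time strings sort lexicographically within one timezone -
    # good enough for a sketch.
    return min(snaps)[1] if snaps else None

def on_pool_threshold(vg, pool, data_percent):
    # Called from your monitoring hook at 50%, 55%, ... 95%, 100%.
    if data_percent < IGNORE_BELOW:
        return                           # nothing to do yet
    if data_percent >= DROP_SNAPSHOT_AT:
        victim = oldest_snapshot(vg, pool)
        if victim:
            subprocess.run(["lvremove", "-y", vg + "/" + victim], check=True)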

Anyway - you still have plenty of time to solve it at this moment
without any danger of losing a write operation...
All you can lose is some 'snapshot' which might have stayed around a
bit longer... but that is supposedly fine with your workflow model...

Of course you are getting into serious problems if you try to keep all
these demo volumes within 50 GiB with massive overprovisioning ;)

There you have a much harder time deciding what should happen, what should be
removed, and when it is possibly better to STOP everything and let the admin
decide what the ideal next step is....


Hi Zdenek,
I fully agree with what you said above, and I sincerely thank you for taking the time to reply. However, I am not sure I understand *why* reserving space for a thin volume seems like a bad idea to you.

Let's take a 100 GB thin pool, and suppose we want to *never* run out of space in spite of taking multiple snapshots. To achieve that, I need to a) carefully size the original volume, b) ask the thin pool to reserve the needed space, and c) count the "live" data (REFER in ZFS terms) allocated inside the thin volume.

Step-by-step example:
- create a 40 GB thin volume and subtract its size from the thin pool (USED 40 GB, FREE 60 GB, REFER 0 GB);
- overwrite the entire volume (USED 40 GB, FREE 60 GB, REFER 40 GB);
- snapshot the volume (USED 40 GB, FREE 60 GB, REFER 40 GB);
- completely overwrite the original volume (USED 80 GB, FREE 20 GB, REFER 40 GB);
- a new snapshot creation will fail (REFER is higher than FREE).

Result: the thin pool is *never allowed* to fill. You need to keep track of per-volume USED and REFER space, but thinp performance should not be impacted in any manner. This is not theoretical: it already works in this manner with ZVOLs and refreservation, *without* involving/requiring any advanced coupling/integration between the block and filesystem layers.
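To make the accounting explicit, here is a toy Python model of this bookkeeping - purely illustrative, and not how ZFS or lvm2 implement it internally:

class ReservingPool:
    # Toy model of the USED/FREE/REFER accounting described above.
    def __init__(self, size_gb):
        self.size = size_gb
        self.used = 0          # space reserved or pinned in the pool
        self.refer = 0         # "live" data referenced by the origin volume
        self.has_snapshot = False

    @property
    def free(self):
        return self.size - self.used

    def create_volume(self, size_gb):
        # Reserve the volume's full size up front (like refreservation).
        if size_gb > self.free:
            raise RuntimeError("not enough space to reserve the volume")
        self.used += size_gb

    def overwrite(self, gb):
        # With a snapshot pinning the old chunks, a full overwrite
        # consumes additional pool space beyond the reservation.
        if self.has_snapshot:
            self.used += gb
        self.refer = gb

    def snapshot(self):
        # Refuse the snapshot unless FREE can absorb a full divergence
        # of the currently referenced data.
        if self.refer > self.free:
            raise RuntimeError("snapshot refused: REFER > FREE")
        self.has_snapshot = True

pool = ReservingPool(100)
pool.create_volume(40)   # USED 40, FREE 60, REFER 0
pool.overwrite(40)       # USED 40, FREE 60, REFER 40
pool.snapshot()          # USED 40, FREE 60, REFER 40
pool.overwrite(40)       # USED 80, FREE 20, REFER 40
try:
    pool.snapshot()      # refused: REFER (40) > FREE (20)
except RuntimeError as err:
    print(err)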

Don't get me wrong: I am sure that, if you chose not to implement this scheme, you have a very good reason for it. Moreover, I understand that patches are welcome :)

But I would like to understand *why* this possibility is ruled out with such firmness.


There could be a simple answer and a complex one :)

I'd start with the simple one - already presented here -

When you write to an INDIVIDUAL thin volume target - the respective dm thin target DOES manipulate a single btree set - it does NOT care that there are some other snapshots, and it never influences them -

You ask here to heavily 'change' the thin-pool logic - so that writing to THIN volume A can remove/influence volume B - and this is very problematic for many reasons.

We can go into the details of btree updates (that should really be discussed with its authors on the dm channel ;)) - but I think the key element is capturing the idea that the usage of thinLV A does not change thinLV B.


----


Now to your free 'reserved' space fiction :)
There is NO way to decide WHO deserves to use the reserve :)

Every thin volume is equal - (the fact that we call some thin LV a snapshot is user-land fiction - in the kernel all thinLVs are just equal - every thinLV references a set of thin-pool chunks) -

(food for late-night thinking - what is a snapshot of a snapshot which has been fully overwritten? ;))

So when you now see that all thinLVs just map sets of chunks,
and all thinLVs can be active and running concurrently - how do you want to use reserves in the thin-pool :)?
When do you decide it? (You need to see this is total race-land.)
How do you actually orchestrate locking around this single point of failure ;)?
You will surely come up with the idea of having a separate reserve for every thinLV?
How big should it actually be?
Are you going to 'refill' those reserves when the thin-pool gets emptier?
How do you decide which thinLV deserves bigger reserves ;)??

I assume you can start to SEE the whole point of this misery....

So instead - you can start with a normal thin-pool - keep it simple in the kernel,
and solve the complexity in user-space.

There you can decide - if you want to extend the thin-pool...
You may drop some snapshot...
You may fstrim mounted thinLVs...
You can kill volumes way before the situation becomes unmanageable....

All you need to accept is - you will kill them at 95% -
in your world with reserves it would already be reported as 100% full,
with a totally unknown size of reserves :)
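As a rough illustration of those user-space remedies - the VG/pool names, mount points, and thresholds below are placeholders, and the point is only that the policy lives in a script rather than in the kernel:

import subprocess

VG, POOL = "vg0", "pool0"                  # placeholder names
THIN_MOUNTS = ["/srv/vm1", "/srv/vm2"]     # placeholder mount points of thinLVs

def remediate(data_percent):
    # Invoked from monitoring with the thin-pool data occupancy in percent.
    if data_percent >= 70:
        # First try to give space back: discard unused filesystem blocks.
        for mnt in THIN_MOUNTS:
            subprocess.run(["fstrim", mnt], check=False)
    if data_percent >= 80:
        # Grow the pool if the VG still has free extents.
        subprocess.run(["lvextend", "-L+10G", VG + "/" + POOL], check=False)
    if data_percent >= 95:
        # Last resort: drop snapshots / stop consumers before 100%;
        # which volumes to sacrifice is entirely an admin decision.
        pass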

Regards

Zdenek







_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



