Re: Reserve space for specific thin logical volumes

Zdenek Kabelac wrote on 13-09-2017 21:35:

We are moving here in the right direction.

Yes - current thin-provisioning does not let you limit the maximum number of
blocks an individual thinLV can address (and a snapshot is an ordinary thinLV)

Every thinLV can address  exactly   LVsize/ChunkSize  blocks at most.

So basically the only options are: an allocation check based on asynchronously gathered intel that might be a few seconds stale, used to execute some standard, general "prioritizing" policy; and an interventionist policy that will (fs)freeze certain volumes depending on the admin's knowledge of what needs to happen in his/her particular setup.

This is part of the problem: you cannot calculate in advance what can happen. By design, mayhem should not ensue, but what if your predictions are off?

Great - 'prediction' - we are getting on the same page - prediction is a
big problem....

Yes. On my own 'system' I generally know, of course, how much data is on it, and there is no automatic data generation.

Matthew Patton referenced quotas in some email; I didn't know how to set those up as quickly when I needed it, so I created a loopback mount from a fixed-size container to 'solve' that issue when I did have an unpredictable data source... :p
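
For completeness, that 'solution' was nothing more than something like the following rough sketch (the image path, size and mount point are purely examples):

# create a fixed-size backing file, put a filesystem on it and mount it via loop;
# writes under /srv/unpredictable can then never grow past 10G
truncate -s 10G /srv/unpredictable.img     # sparse; use fallocate -l 10G to preallocate instead
mkfs.ext4 -F /srv/unpredictable.img
mkdir -p /srv/unpredictable
mount -o loop /srv/unpredictable.img /srv/unpredictable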

But I do create snapshots (every day). When the root and boot snapshots fill up (they are on regular LVM) they get dropped, which is nice. For the big data volume in particular, though, if I really were to move a lot of data around I might need to get rid of the snapshots first, or else I don't know what will happen or when.

Also, my system (yes, I am an "outdated moron") does not have the thin_ls tool yet, so when I was last active here and you mentioned that tool (thank you for that, again) I created this little script that also gives me info:

$ sudo ./thin_size_report.sh
[sudo] password for xen:
Executing self on linux/thin
Individual invocation for linux/thin

    name               pct       size
    ---------------------------------
    data            54.34%     21.69g
    sites            4.60%      1.83g
    home             6.05%      2.41g
    --------------------------------- +
    volumes         64.99%     25.95g
    snapshots        0.09%     24.00m
    --------------------------------- +
    used            65.08%     25.97g
    available       34.92%     13.94g
    --------------------------------- +
    pool size      100.00%     39.91g

The above "sizes" are not volume sizes but usage amounts.

And the percentages are percentages of total pool size.

So you can see I have 1/3 available on this 'overprovisioned' thin pool ;-).


But anyway.


Being able to set a maximum snapshot size before it gets dropped could be very nice.

You can't do that IN KERNEL.

The only tool which is able to calculate real occupancy is the
user-space thin_ls tool.

Yes, my tool just aggregated data from "lvs" invocations to calculate the numbers.
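
The gist of it is roughly the following (a sketch only; 'linux' and 'thin' are my VG and pool names, and shared snapshot blocks are simply counted per volume, so it is not byte-accurate):

#!/bin/bash
# Sum per-volume usage of a thin pool from plain "lvs" output.
# Note: pct printed here is each LV's data_percent of its own size, not of the pool.
VG=linux POOL=thin

lvs --noheadings --nosuffix --units g --separator ' ' \
    -o lv_name,lv_size,data_percent,origin --select "pool_lv=$POOL" "$VG" |
while read name size pct origin; do
    used=$(echo "$size * $pct / 100" | bc -l)
    printf '    %-15s %6.2f%% %8.2fg %s\n' "$name" "$pct" "$used" \
        "${origin:+(snapshot of $origin)}"
done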

If you say that any additional allocation check would be infeasible because it would take too much time per request (which still seems odd, because the checks wouldn't be that computationally intensive, and even for 100 gigabytes you'd only have about 25,000 checks at the default extent size) - then of course you collect the data asynchronously.
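
(For concreteness: at the default 4 MiB extent size, 100 GiB is 100 * 1024 / 4 = 25,600 extents to account for; at a 64 kB thin chunk size the same 100 GiB is 1,638,400 chunks, which is presumably where the per-request cost argument comes from.)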

So I don't know if it would be *that* slow, provided you collect the data in the background and not while allocating.

I am also pretty confident that if you did build such a policy, it would turn out pretty well.

I mean I generally like the designs of the LVM team.

I think they are some of the most pleasant command line tools anyway...

But anyway.

On the other hand, if all you can do is intervene in userland, then all the LVM team can do is provide a basic skeleton for executing some standard scripts.

So all you need to do is to use the tool in user-space for this task.

So maybe we could have an assortment of some five interventionist policies, like:

a) Govern a maximum snapshot size and drop snapshots when they exceed it
b) Freeze non-critical volumes when free thin space drops below the aggregate value appropriate for the critical volumes
c) Drop snapshots when free thin space falls below 5%, starting with the biggest one
d) Also freeze the relevant snapshots in case (b)
e) Drop snapshots that exceed a configured maximum size, but only once a threshold is reached.

So for example you configure a maximum size for a snapshot. When the snapshot exceeds that size it gets flagged for removal, but removal only happens when the other condition is met (the threshold is reached).

So you would have five different interventions that could be considered somewhat standard, and the admin can just pick and choose or customize.
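
A rough sketch of (e), assuming the script is pointed at a vg/pool pair and that the names, size limit and threshold below are only example values:

#!/bin/bash
# Drop thin snapshots that exceed a configured maximum usage, biggest first,
# but only act once the pool itself has crossed a threshold.
VG=vg POOL=pool MAX_SNAPSHOT_GIB=5 POOL_THRESHOLD_PCT=95

pool_pct=$(lvs --noheadings --nosuffix -o data_percent "$VG/$POOL")
(( $(echo "$pool_pct < $POOL_THRESHOLD_PCT" | bc -l) )) && exit 0

lvs --noheadings --nosuffix --units g --separator ' ' \
    -o lv_name,lv_size,data_percent,origin --select "pool_lv=$POOL" "$VG" |
while read name size pct origin; do
    [ -n "$origin" ] || continue                       # only snapshots
    used=$(echo "$size * $pct / 100" | bc -l)
    (( $(echo "$used > $MAX_SNAPSHOT_GIB" | bc -l) )) && echo "$used $VG/$name"
done | sort -rn | while read used lv; do
    echo "dropping snapshot $lv (${used}g used)"
    lvremove -y "$lv"                                  # or just flag/report, as preferred
done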


This is the main issue - these 'data' are pretty expensive to 'mine'
out of data structures.

But how expensive is it to do that, say, every 5 seconds?


It's the user-space utility which is able to 'parse' all the structures
and take a 'global' picture. But of course it takes CPU and TIME, and
it's not 'byte accurate' - that's why you need to start acting early, on
some threshold.

I get that, but I wonder how expensive it would be to do it automatically, all the time, in the background.

It seems to already happen?

Otherwise you wouldn't be reporting threshold messages.

In any case, the only policies you could have in-kernel would be either what Gionatan proposed (fixed reserved space for certain volumes - an easy calculation, right?) or potentially an allocation freeze at a threshold for non-critical volumes.


I say you should only implement per-volume space reservation, but anyway.

I just still don't see how one check per 4 MB allocation would be that expensive, provided you do the data collection in the background.

You say the chunk size can be as low as 64kB... well... in that case...

You might have issues.



But in any case,

a) For intervention, the choice is between customization by code and customization by values.
b) Ready-made scripts could take values but could also be easy to customize.
c) Scripts could take values from the LVM config or the volume config, but the values must be easy to discover and change.

d) Scripts could document where to set the values.

e) Personally I would do the following:

a) Stop snapshots from working, in a rapid fashion, when a threshold is reached (95%)

   or

   a) Just let everything fill up as long as system doesn't crash

   b) Intervene to drop/freeze using scripts, where

1) I would drop snapshots, starting with the biggest one, when a threshold is reached (general)

2) I would freeze non-critical volumes (I do not write to snapshots, so that is no issue) when the critical volumes reached a safety threshold of free space (I would do this in-kernel if I could, but freezing in user-space is almost the same).

3) I would shrink existing volumes to better align with this "critical" behaviour, because right now they are all large to make moving data around easier

4) I would probably immediately implement these strategies if the scripts were already provided

5) Currently I already have reporting in place (by email), so I have no urgent need myself, apart from still having an LVM version that crashes

f) For a critical-volume script, it is worth considering that small volumes are more likely to be critical than big ones. This could also prompt people to organize their volumes that way: a standard mechanism first protects the free space of the smaller volumes against all of the bigger ones, then the next one up is only protected against ITS bigger ones, and so on.

Basically when you have Big, Medium and Small, Medium is protected against Big, and Small is protected against both others.

So the Medium protection is triggered sooner, because Medium has a higher space need than the Small volume: when space runs out, first Big is frozen, and when that doesn't help, in time Medium is frozen as well.

Seems pretty legit I must say.

And this could be completely unconfigured - just a standard recipe, using for configuration only the percentage you want to reserve.

I.e. you can say: I want 5% free for all volumes, from the top down; only the biggest one isn't protected, but all the smaller ones are.

If several are the same size you lump them together.

Now you have a cascading system: if you choose this script, you get "small ones protected against big ones" protection without having to set anything up yourself.

You don't even have to flag them as critical...

Sounds like fun to make in any case.
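
A rough sketch of how that could look, assuming every thin volume gets the same reserve percentage, that "freeze" means fsfreeze on the mounted filesystem, and that the vg/pool names and the 5% are placeholders:

#!/bin/bash
# Cascading protection: walk the thin volumes from smallest to largest, keeping a
# running sum of the reserves of the volumes already seen (RESERVE_PCT of each
# volume's size). As soon as the pool's free space no longer covers that sum,
# the current volume and everything bigger than it gets frozen.
VG=vg POOL=pool RESERVE_PCT=5

read pool_size pool_pct < <(lvs --noheadings --nosuffix --units g \
    --separator ' ' -o lv_size,data_percent "$VG/$POOL")
free=$(echo "$pool_size * (100 - $pool_pct) / 100" | bc -l)

reserve=0
lvs --noheadings --nosuffix --units g --separator ' ' \
    -o lv_name,lv_size --select "pool_lv=$POOL" "$VG" | sort -k2,2n |
while read name size; do
    if (( $(echo "$free < $reserve" | bc -l) )); then
        # free space no longer covers the reserves of all smaller volumes:
        # this volume is "big" relative to the ones being protected
        mnt=$(findmnt -n -o TARGET "/dev/$VG/$name" | head -n1)
        [ -n "$mnt" ] || continue          # skip volumes that are not mounted
        echo "freezing $VG/$name (mounted on $mnt)"
        fsfreeze --freeze "$mnt"
    fi
    reserve=$(echo "$reserve + $size * $RESERVE_PCT / 100" | bc -l)
done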


g) There is a little program called "pam_shield" that uses "shield_triggers" to select which kind of behaviour the user wants to use in blocking external IPs. It provides several alternatives such as IP routing block (blackhole) and iptables block.

You can choose which intervention you want. The scripts are already provided. You just have to select the one you want.


And to ensure that this is default behaviour?

Why do you think this should be the default?

The default is to auto-extend thin-data & thin-metadata when needed, if you
set the threshold below 100%.
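
(For reference, that default mechanism is the one driven by the dmeventd monitoring settings in lvm.conf - the values below are only illustrative:)

# /etc/lvm/lvm.conf, activation section
activation {
    # start auto-extending when the pool is 80% full; 100 disables this
    thin_pool_autoextend_threshold = 80
    # grow the pool by 20% of its current size each time
    thin_pool_autoextend_percent = 20
}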

Q: In a 100% filled up pool, are snapshots still going to be valid?

Could it be useful to have a default policy of dropping snapshots at high consumption (i.e. 99%)? But it doesn't have to be the default if you can easily configure it and the scripts are available.

So no: if the scripts are available and the system doesn't crash (as you say it no longer does), there does not need to be a default.

Just documented.

I've been condensing this email.

You could have a script like:

#!/bin/bash

# Assuming $1 is the thin pool (vg/pool) I am getting executed on, that $2 is the
# threshold that has been reached, and $3 is the free space available in the pool (in GiB)

MIN_FREE_SPACE_CRITICAL_VOLUMES_PCT=5
CRITICAL_VOLUMES="vg/home vg/data"    # hypothetical list of critical volumes

needed=0
# 1. iterate critical volumes
for lv in $CRITICAL_VOLUMES; do
    size=$(lvs --noheadings --nosuffix --units g -o lv_size "$lv")
    # 2. calculate needed free space for those volumes based on the value above
    needed=$(echo "$needed + $size * $MIN_FREE_SPACE_CRITICAL_VOLUMES_PCT / 100" | bc -l)
done

# 3. check against the free space in $3
if (( $(echo "$3 < $needed" | bc -l) )); then
    # 4. perform action (just log here; dropping or freezing could go in its place)
    logger "thin pool $1: free ${3}GiB below the ${needed}GiB reserved for critical volumes"
fi

Well I am not saying anything new here compared to Brassow Jonathan.

But it could be that simple to have a script you don't even need to configure.

More sophisticated, then, would be a big-vs-small script in which you don't even need to configure the critical volumes.

So to sum up my position is still:

a) Personally I would still prefer in-kernel protection based on quotas
b) Personally I would not want anything else from in-kernel protection
c) No other policies than that in the kernel
d) Just an allocation block based on quotas, based on lazy data collection

e) If people really use 64kB chunk sizes and want maximum performance, then it's not for them
f) The analogy of the aeroplane that runs out of fuel, where you have to choose which passengers to eject, does not apply if you use quotas.

g) I would want more advanced policy or protection mechanisms (intervention) in userland using above ideas.

h) I would want inclusion of those basic default scripts in LVM upstream

i) The "shield_trigger" model of "pam_shield" is a choice between several default interventions


We can discuss whether it's a good idea to enable auto-extending by default -
as we don't know if the free space in the VG is meant to be used for the
thin-pool, or whether there is some other plan the admin might have...

I don't think you should. Any admin who uses thin and intends to auto-extend will be able to configure that anyway.

When I said I wanted default, it is more like "available by default" than "configured by default".

Using thin is a pretty conscious choice.

As long as it is easy to activate protection measures, that is not an issue and it does not need to be the default, imo.

Priorities for me:

1) Monitoring and reporting
2) System could block allocation for critical volumes
3) I can drop snapshots, starting with the biggest one, when less than 5% of the pool is free
4) I can freeze volumes when space for critical volumes runs out

Okay sending this now. I tried to summarize.

See ya.



