Re: Reserve space for specific thin logical volumes

On 15.9.2017 at 09:34, Xen wrote:
Zdenek Kabelac wrote on 14-09-2017 21:05:


But if I do create snapshots (which I do every day), when the root and boot snapshots fill up (they are on regular LVM) they get dropped, which is nice.

Old snapshots are a different technology for a different purpose.

Again, what I was saying was to support the notion that having snapshots that may grow a lot can be a problem.


lvm2 makes them look the same - but underneath it's very different (and it's not just about age - they also target different purposes).

- old-snaps are good for short-lived, small snapshots - when a low number of changes is expected and it's not a big issue if the snapshot is 'lost'.

- thin-snaps are ideal for long-lived objects, with the possibility of taking snaps of snaps of snaps, and you are guaranteed the snapshot will not 'just disappear' while you modify your origin volume...

Both have very different resource requirements and performance...

I am not sure the purpose of non-thin vs. thin snapshots is all that different though.

They are both copy-on-write in a certain sense.

I think it is the same tool with different characteristics.

There are cases where it's quite a valid option to take an old-snap of a thinLV, and it will pay off...

Exactly in the case where you use thin and you want to make sure your temporary snapshot will not 'eat' all your thin-pool space, and you want to let the snapshot die.
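
A minimal illustration of the difference (the VG/LV names here are just made-up examples): as far as I know, giving lvcreate -s an explicit -L size on a thin origin creates an old-style COW snapshot with its own bounded space, while omitting the size creates a thin snapshot inside the pool:

  # thin snapshot - allocates from the same thin-pool, no fixed cap
  lvcreate -s -n snap_thin vg/lvol_thin

  # old-style (COW) snapshot of the same thin LV - capped at 1G,
  # it gets invalidated/dropped once more than 1G of changes accumulate
  lvcreate -s -L 1G -n snap_old vg/lvol_thin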

Thin-pool still does not support shrinking - so if the thin-pool auto-grows to a big size, there is no way for lvm2 to reduce the thin-pool size...



That's just the sort of thing that in the past I have been keeping track of continuously (in unrelated stuff) such that every mutation also updated the metadata without having to recalculate it...

Would you prefer to spend all your RAM keeping all the mapping information for all the volumes, and put very complex code into the kernel to parse information which is technically already out-of-date the moment you get the result??

In 99.9% of runtime you simply don't need this info.

But the point of what you're saying is that the number of blocks uniquely owned by any snapshot is not known at any one point in time.

As long as a 'thinLV' (i.e. your snapshot thinLV) is NOT active, there is nothing in the kernel maintaining its dataset. You can have lots of thinLVs active and lots of others inactive.


Well pardon me for digging this deeply. It just seemed so alien that this thing wouldn't be possible.

I'd say it's very smart ;)

You only need a very small subset of the 'metadata' information for individual volumes.

It becomes a rather big enterprise to install thinp for anyone!!!

It's enterprise level software ;)

Because getting it running takes no time at all!!! But getting it running well implies a huge investment.

In the most common scenarios, the user knows when he runs out of space - it will not be a 'pleasant' experience, but the user's data should be safe.

And then it depends how much energy/time/money the user wants to put into the monitoring effort to minimize downtime.

As has been said, disk-space is quite cheap.
So if you monitor and insert your new disk-space in time (enterprise...), you have a smaller set of problems than if you constantly try to fight with a 100% full thin-pool...

You still have problems even when you have 'enough' disk-space ;)
e.g. you select a small chunk-size and then want to extend the thin-pool data volume beyond its addressable capacity - each chunk-size has its final maximum data size...
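
As a rough, back-of-the-envelope illustration (following the sizing guideline from the kernel's thin-provisioning documentation, which suggests on the order of 48 bytes of metadata per mapped chunk; the exact on-disk limits differ):

  metadata needed  ~=  48 bytes * data_size / chunk_size
  metadata tops out around 16GiB, so with 64KiB chunks:
  max data  ~=  16GiB * 64KiB / 48  ~=  on the order of 20TiB

so a pool created with a small chunk-size simply cannot be grown past that ceiling without recreating it with a larger chunk-size.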

That means that for me, and for others who may not be doing this professionally or in a larger organisation, the benefit of spending all that time may not outweigh the cost, and the result is that you stay stuck with a deeply suboptimal situation with little or no reporting or fixing, all because the initial investment is too high.

You can always use a normal device - it's really about choice and purpose...



While personally I also like the bigger versus smaller idea because you don't have to configure it.

I'm still proposing to use different pools for different purposes...

Sometimes spreading the solution across existing logic is way easier
than trying to achieve some super-intelligent universal one...

The script is called at 50% fullness, then when it crosses 55%, 60%, ...
95%, 100%. When it drops below a threshold, you are called again once
the boundary is crossed...

How do you know when it is at 50% fullness?

If you are a proud sponsor of your electricity provider and you like the
extra heating in your house, you can run this in a loop of course...

Thresholds are based on the mapped size of the whole thin-pool.

The thin-pool surely knows at all times how many blocks are allocated and free for
its data and metadata devices.

But didn't you just say you needed to process up to 16GiB to know this information?

Of course the thin-pool has to be aware of how much free space it has.
You can somewhat imagine this as a 'hidden' volume holding the FREE space...

So to give you this 'info' about free blocks in the pool, only a very small metadata subset needs to be maintained - you don't need to know about all the other volumes...

If another volume is releasing or allocating chunks, your 'FREE space' gets updated...

It's complex underneath and locking is very performance sensitive - but for easy understanding you can possibly get the picture out of this...
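
(For what it's worth, that per-pool summary is exactly what the reporting tools expose cheaply - the vg/pool names below are just placeholders, and the exact dm device name may differ:

  lvs -o lv_name,data_percent,metadata_percent vg/pool
  dmsetup status vg-pool-tpool

The lvs data_percent/metadata_percent columns and the dmsetup status line for the thin-pool target both come from those pool-level counters, not from walking every thin volume's mappings.)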


You may not know the size and attribution of each device but you do know the overall size and availability?

The kernel supports one threshold setting - user-space (dmeventd) is woken up when usage has passed it.

That value maps to the autoextend threshold in lvm.conf.

As a 'secondary' source, dmeventd checks pool fullness every 10 seconds with a single ioctl() call, compares how the fullness has changed, and provides you with callbacks for those 50, 55, ... jumps
(as can be found in 'man dmeventd').

So for passing the autoextend threshold you get an instant call.
For all the others there is an up-to-10-second delay before discovery.
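
(For reference, the relevant lvm.conf knobs look roughly like this - the percentages are just an example and the handler script is purely hypothetical; check lvm.conf(5) and dmeventd(8) for the exact names and sections:

  activation {
      # dmeventd extends the pool once data usage crosses 70%,
      # growing it by 20% of its current size
      thin_pool_autoextend_threshold = 70
      thin_pool_autoextend_percent = 20
  }
  dmeventd {
      # optional external command run on the 50%, 55%, ... callbacks
      thin_command = "/usr/local/sbin/my-thin-handler.sh"
  }
)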

Within a single thin-pool, all thins ARE equal.

But you could make them unequal ;-).

I cannot ;)  - I'm an lvm2 coder -   dm thin-pool is Joe's/Mike's toy :)

In general, you could come up with many different kernel modules which take a different approach to the problem.

Worth noting - RH now has Permabit in its portfolio - so there can be more than one type of thin-provisioning supported in lvm2...

The Permabit solution has deduplication, compression, and 4K blocks - but no snapshots...



The goal was more to protect the other volumes: supposing that log writing happened on a separate volume, the point was for that log volume not to impact the other, main volumes.

IMHO the best protection is a different pool for different thins...
You can more easily decide which pool can 'grow up'
and which one should rather be taken offline.

So your 'less important' data volumes may simply hit the wall hard,
while your 'strategically important' one avoids overprovisioning as much as possible to keep running.
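
For example, something along these lines (all sizes and names invented, just to show the shape of it):

  # 'critical' pool - barely overprovisioned, intended to never fill up
  lvcreate --type thin-pool -L 100G -n pool_critical vg
  lvcreate --type thin -V 80G -n root --thinpool pool_critical vg

  # 'best effort' pool - logs, scratch, snapshots; allowed to hit the wall
  lvcreate --type thin-pool -L 20G -n pool_scratch vg
  lvcreate --type thin -V 50G -n logs --thinpool pool_scratch vg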

Motto: keep it simple ;)

So you have thin global reservation of say 10GB.

Your log volume is overprovisioned and starts eating up the 20GB you have available and then runs into the condition that only 10GB remains.

The 10GB is a reservation maybe for your root volume. The system (scripts) (or whatever) recognises that less than 10GB remains, that you have claimed it for the root volume, and that the log volume is intruding upon that.

It then decides to freeze the log volume.

Of course you can play with 'fsfreeze' and other things - but all these things are very specific to individual users and their individual preferences.

Effectively, if you freeze your 'data' LV, you may paralyze the rest of your system as a side effect - unless you know the 'extra' information about the user's usage pattern.

But do not take this as something to discourage you from trying it - you may come up with a perfect solution for your particular system, and some other user may find it useful in a similar pattern...

It's just something that lvm2 can't support globally.

But lvm2 will give you enough bricks for writing 'smart' scripts...
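
A very rough sketch of such a 'brick', just to make it concrete - the VG/pool names, the /var/log mount point and the 90% policy are all invented, and the freezing caveats above still apply:

  #!/bin/sh
  # hypothetical handler invoked from dmeventd's thin_command (or cron):
  # if the pool's data usage is at or above 90%, freeze the filesystem
  # holding the 'unimportant' logs so it stops eating shared pool space.
  used=$(lvs --noheadings -o data_percent vg/pool | tr -d ' ' | cut -d. -f1)
  if [ "${used:-0}" -ge 90 ]; then
      fsfreeze --freeze /var/log || true
  fi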

Okay... I understand. I guess I was misled a bit by non-thin snapshot behaviour (it filled up really fast without me understanding why, and I concluded that it was doing 4MB copies).

Fast disks are now easily able to write gigabytes per second... :)


But attribution of an extent to a snapshot will still be done in extent-sized units, right?

The allocation unit in a VG is the 'extent' - it ranges from 1 sector to 4GiB,
and the default is 4MiB - yes...


So I don't think the problems of freezing are bigger than the problems of rebooting.

With 'reboot' you know where you are - IMHO it's a fair condition for this.

With a frozen FS the system is paralyzed, and your 'fsfreeze' operation on unimportant volumes has actually even eaten space from the thin-pool which might have been better used to store data for important volumes... and there is even a big danger that you will 'freeze' yourself already during the call of fsfreeze (unless, of course, you put BIG margins around it).



"System is still running but some applications may have crashed. You will need to unfreeze and restart in order to solve it, or reboot if necessary. But you can still log into SSH, so maybe you can do it remotely without a console ;-)".

Compare with an email:

Your system has run out of space, all actions to gain some more space have failed - going to reboot into some 'recovery' mode.


So there is no issue with snapshots behaving differently. It's all the same and all committed data will be safe prior to the fillup and not change afterward.

Yes - 'snapshot' is user-land language - in the kernel, all thins just map chunks...

If you can't map a new chunk, things are going to stop - and start erroring out shortly...

Regards

Zdenek

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



