Re: Reserve space for specific thin logical volumes


 



On 11. 9. 2017 at 16:00, Xen wrote:
Just responding to second part of your email.

So this one is manual intervention only... and a last resort just to prevent a crash, so not really useful in the general situation?

Let's simplify it to this case:

You have a 1G thin-pool.
You use a 10G thinLV on top of that 1G thin-pool.

And you ask for 'sane' behavior??

Why not? Really.

Because every filesystem put on top of a thinLV believes that all blocks on the device actually exist...
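For concreteness, the 1G-pool/10G-thinLV situation above is just two commands away (a sketch, assuming a volume group named `vg` with free space; these need root, so only try it on a scratch machine):

```shell
# create a 1G thin pool inside the (hypothetical) volume group "vg"
lvcreate -L 1G --thinpool pool vg

# create a 10G thin volume backed by that 1G pool - 10x overprovisioned
lvcreate -V 10G --thin -n thinlv vg/pool

# mkfs happily formats all 10G - it believes every block is real
mkfs.ext4 /dev/vg/thinlv
```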

Any idea of having 'reserved' space for 'prioritized' applications, and other such crazy ideas, leads nowhere.

That has existed in Linux filesystems for a long time (the root user's reserved blocks).

Did I say you can't compare a filesystem problem with a block-level problem?
If not ;) let's repeat - being out of space in a single filesystem
is a completely different fairy-tale from an out-of-space thin-pool.


Actually, there is a very good link to read about this:

https://lwn.net/Articles/104185/

That was cute.

But we're not asking the aeroplane to keep flying.
IMHO you just don't see the parallel yet...


And we believe it's fine to solve an exceptional case by reboot.

Well, it's hard to disagree with that, but for me it might take weeks before I discover the system is offline.

IMHO it's a problem of proper monitoring.

Still the same song here - you should be actively trying to avoid the car collision, since trying to resurrect an often seriously injured or even dead passenger from a demolished car is usually a very complex job with unpredictable results...

We do put in a number of 'car-protection' safety mechanisms - so the newer the tools
and kernel, the better - but still, when you hit the wall at top speed
you can't expect to just 'walk out' easily... and it's far cheaper to solve the problem in a way where you will NOT crash at all...


Otherwise most services would probably continue.

So now I need to install remote monitoring that checks the system is still up and running, etc.

Of course you do.

thin-pool needs attention/care :)
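The minimal 'care' being an occasional look at how full the pool is - for instance (a sketch; needs root and an actual thin pool on the machine):

```shell
# data_percent / metadata_percent show pool fullness;
# -S restricts the report to thin pools only
lvs -o lv_name,data_percent,metadata_percent -S 'segtype=thin-pool'
```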

If all solutions require more and more and more and more monitoring, that's not good.

It's the best we can provide....


So don't expect the lvm2 team to be solving this - there is higher-priority work...

Sure, whatever.

Safety is never a prio, right ;-).

We are safe enough (IMHO) to NOT lose committed data.
We cannot guarantee a stable system though - it's too complex.
lvm2/dm can't be fixing extX/Btrfs/XFS and other kernel-related issues...
Bold men can step in - and fix them...


If the system volume IS that important - don't use it with over-provisioning!

System-volume is not overprovisioned.

If you have enough blocks in the thin-pool to cover all the needed blocks for all the thinLVs attached to it - you are not overprovisioning.


Just something else running in the system....


Use different pools ;)
(i.e. a 10G system + 3 snaps needs 40G of data size & an appropriate metadata size to be safe from overprovisioning)
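The arithmetic behind that 40G figure - in the worst case every snapshot diverges completely from its origin, so each one may come to need a full copy of the data:

```shell
# worst-case pool size: the origin plus every snapshot fully rewritten
origin_gb=10   # size of the system thinLV
snaps=3        # number of snapshots of it

pool_gb=$(( origin_gb * (1 + snaps) ))
echo "${pool_gb}G"    # prints 40G
```

In practice snapshots share most blocks with the origin, so this is an upper bound - but it is the only bound that makes the pool impossible to fill.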

That will crash the ENTIRE SYSTEM when it fills up.

Even if it was not used by ANY APPLICATION WHATSOEVER!!!

A full thin-pool on a recent kernel is certainly NOT randomly crashing the entire system :)

If you think that's the case - provide a full trace of the crashed kernel and open a BZ - just be sure you use an upstream Linux kernel...

My system LV is not even ON a thin pool.

Again - if you can reproduce it on kernel 4.13 - open a BZ and provide a reproducer.
If you use an older kernel - take a recent one and reproduce it there.

If you can't reproduce it - the problem has already been fixed.
It's then for your kernel provider to either back-port the fix
or give you a fixed newer kernel - nothing really for lvm2...


It's a way more practical solution than trying to fix the OOM problem :)

Aye, but in that case no one can tell you to ensure you have auto-expandable memory ;-) :p.

I'd probably recommend reading some books about how memory is mapped onto a block device and what all the constraints and related problems are...

Yes, email monitoring would be most important, I think, for most people.

Put mail messaging into a plugin script then.
Or use any monitoring software that watches for messages in syslog - this worked
pretty well 20 years back - and hopefully still works well :)
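A minimal sketch of such a syslog watcher - the mail address and log path below are placeholders, and the exact wording dmeventd logs may differ between versions, so treat the pattern as an illustration, not a spec:

```shell
#!/bin/sh
# Hypothetical watcher: mail any thin-pool usage warnings found in syslog.
MAILTO="admin@example.com"       # placeholder address
LOG=/var/log/messages            # journalctl is an alternative source

# dmeventd logs lines like:
#   lvm[...]: WARNING: Thin pool vg-pool-tpool data is now 80.04% full.
find_pool_warnings() {
    grep -Ei 'thin pool.*(is now|usage)' "$1" 2>/dev/null
}

warnings=$(find_pool_warnings "$LOG")
if [ -n "$warnings" ]; then
    printf '%s\n' "$warnings" | mail -s "thin-pool warning on $(hostname)" "$MAILTO"
fi
```

Run it from cron every few minutes (with some state-keeping so you don't re-mail old lines), or hang the same logic off dmeventd's own plugin script instead of polling the log.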

Yeah I guess but I do not have all this knowledge myself about all these different kinds of softwares and how they work, I hoped that thin LVM would work for me without excessive need for knowledge of many different kinds.

We do provide some 'generic' scripts - unfortunately, every use-case is basically a pretty different set of rules and constraints.

So the best we have is 'auto-extension'.
We used to try to umount - but this has possibly added more problems than it actually solved...

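For reference, auto-extension is driven by two settings in the activation section of lvm.conf (the values below are illustrative, not recommendations):

```
activation {
    # dmeventd extends the pool once usage crosses this percentage;
    # 100 (the default) disables automatic extension
    thin_pool_autoextend_threshold = 70

    # grow the pool by this percentage of its current size each time
    thin_pool_autoextend_percent = 20
}
```

This obviously only helps while the volume group still has free extents to grow into.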
I am just asking whether or not there is a clear design limitation that would ever prevent safety in operation when 100% full (by accident).

Don't use over-provisioning in case you don't want to see failures.

That's no answer to that question.

There is a lot of technical complexity behind it.....

I'd say the main part is - the 'fs' would need to be able to understand
that it's living on a provisioned device (something we actually do not want,
as you can change the 'state' at runtime - so the 'fs' would need to be aware & unaware
at the same time ;). Checking with every request that thin-provisioning
is in place would impact performance, and doing it only at mount time makes it
bad as well.

Then you need to deal with the fact that writes to a filesystem are 'process'-aware, while writes to a block device are just anonymous page writes from your page cache. Have I said yet that the level of problems for a single filesystem is a totally different story?

So in a simple statement - thin-p has its limits - if you are unhappy with them, then you probably need to look for some other solution - or start
sending patches and improving things around it...


It's the same as with RAM: you should not overcommit it if you do not
want to see the OOM killer...

But with RAM I'm sure you can typically see how much you have and can thus take account of that; a filesystem will report the wrong figure ;-).

Unfortunately you cannot...

The amount of free RAM you see is a very fictional number ;) and you run into much bigger problems if you start overcommitting memory in the kernel...

You can't compare a failing malloc() in user-space with the OOM killer crashing Firefox...

A block device runs in-kernel - and as root...
There are no reserves; all you know is that you need to write block XY,
and you have no idea what the block is about...
(That's where ZFS/Btrfs were supposed to excel - they KNOW... :)

Regards,

Zdenek

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



