Re: Reserve space for specific thin logical volumes


 



Zdenek Kabelac wrote on 12-09-2017 16:37:

> On the block layer, many things are black & white....
> 
> If you don't know which process created a written page, nor whether you
> are writing filesystem data, filesystem metadata, or any other sort of
> 'metadata', you can hardly do any 'smart' logic on the thin block-level side.

You can always find an example where something is black and white somewhere, but I was making a general point there, nothing specific.

> The philosophy with DM devices is that you can replace them online with
> something else - i.e. you could have a linear LV which is turned into
> 'RAID', which could then be turned into 'cache RAID' and then even into
> a thinLV - all in a row, on a live, running system.

I know.

> So what should the filesystem be doing in this case?

I believe in most of these systems you cite the default extent size is still 4MB, or am I mistaken?

> Should it be asking complex questions of the block layer underneath -
> checking current device properties - and waiting until each IO operation
> is processed before the next IO enters the pipeline - repeating the same
> in very synchronous, slow logic??  Can you imagine how slow this would become?

You mean a synchronous way of checking available space in the thin pool by the thin-pool manager?

> We are targeting 'generic' usage, not a specialized case which fits 1
> user out of 1000000 - where every other user needs something 'slightly'
> different....

That is a complete exaggeration.

I think you will find this issue comes up often enough that it is not one in 1000000. And besides, unless performance considerations are at the heart of your ...reluctance ;-), no one stands to lose anything.

So the only question is one of design limitations or architectural considerations (performance), not whether it is a wanted feature (it is).


> I don't think there is anything related...
> Thin chunk-size ranges from 64KiB to 1GiB....

Doesn't thin allocation happen in extent-sized units by default?
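(That is easy enough to check, by the way. A quick sketch - "vg" and "pool" are placeholder names, and the lvs/vgs report field names are as I remember them, so verify with `lvs -o help`:)

    # Compare a thin pool's chunk size with its VG's extent size.
    import subprocess

    def lvm_field(cmd, obj, field):
        # --noheadings/--nosuffix/--units b yield a single bare value in bytes
        out = subprocess.check_output(
            [cmd, "--noheadings", "--nosuffix", "--units", "b", "-o", field, obj])
        return int(out.strip().decode())

    chunk = lvm_field("lvs", "vg/pool", "chunk_size")  # thin-pool allocation unit
    extent = lvm_field("vgs", "vg", "vg_extent_size")  # VG extent size (4 MiB default)
    print(f"pool chunk size: {chunk} B, VG extent size: {extent} B")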

> The only inter-operation is that the main filesystems (like extX & XFS)
> are getting fixed for better reactions to ENOSPC...
> and WAY better behavior when there are 'write errors' - surprisingly,
> there was a fair amount of faulty logic and expectations encoded in them...

Well, that's good, right? But I did read here earlier about work between the ExtFS team and the LVM team to improve allocation characteristics so they better align with the underlying block boundaries.

> If zpools are 'equally' fast as thins - and give you better protection
> and more sane logic - then why is anyone still using thins???

I don't know. I don't like ZFS, precisely because it is a 'monolith' system that aims to be everything. That makes it more complex and harder to understand, harder to get into, etc.

> Of course, if you slow down the thin-pool, add way more synchronization
> points, and consume 10x more memory :) you can get better behavior in
> those exceptional cases - which are only hit by inexperienced users who
> tend to intentionally use thin-pools in an incorrect way.....

I'm glad you like us ;-).

Yes, apologies here: I responded to this earlier (perhaps a year ago), and the systems I was testing on ran a 4.4 kernel. So I cannot currently confirm it, and it has probably been solved already (you could be right).

>> Back then the crash was kernel messages on TTY and then after some 20-30

> There is by default a 60sec freeze before an unresized thin-pool starts
> to reject all writes to unprovisioned space as 'error' and switches to
> the out-of-space state.  There is, though, a difference whether you run
> out of space in data or in metadata - the latter is more complex...

I can't say whether it was that or not. I am pretty sure the entire system froze for longer than 60 seconds.
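(For what it's worth, that 60-second window appears to be the dm-thin module parameter no_space_timeout - that is my understanding, so verify on your kernel. A sketch for inspecting it:)

    # Inspect the dm-thin out-of-space timeout.
    from pathlib import Path

    p = Path("/sys/module/dm_thin_pool/parameters/no_space_timeout")
    if p.exists():
        # Seconds that writes to unprovisioned space are queued before they
        # start erroring; 0 means queue forever. The default is 60.
        print("no_space_timeout =", p.read_text().strip(), "seconds")
    else:
        print("dm_thin_pool module not loaded")

(And if I read the man pages right, the queue-versus-error behavior can also be set per pool with `lvchange --errorwhenfull y vg/pool`.)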

> In the page cache nothing is logically separated - you have 'dirty'
> pages you need to write somewhere - and if your writes lead to errors,
> and the system reads errors back instead of real data, and your executing
> code starts to run on a completely unpredictable data set - well, a
> 'clean' reboot is still a very nice outcome IMHO....

Well, even if that means some dirty pages are lost before the application discovers it, any read or write errors should at some point lead the application to shut down, right?

I think for most applications the most sane behaviour would simply be to shut down.

Unless there is more sophisticated error handling.

I am not sure what we are arguing about at this point.

Application needs to go anyway.
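(To make that concrete, the minimal sane behaviour I mean is something like this - a sketch, with a hypothetical path on a thin volume:)

    # Treat any write/fsync error (e.g. EIO from a full thin pool) as fatal.
    import os, sys

    def write_or_die(path, data):
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
            os.write(fd, data)
            os.fsync(fd)  # force the error out of the page cache, now
            os.close(fd)
        except OSError as e:
            sys.exit(f"I/O error on {path}: {e}; shutting down")

    write_or_die("/mnt/thin/data.bin", b"payload")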


>> If I had a system crashing because I wrote to some USB device that was malfunctioning, that would not be a good thing either.

> Well, try to BOOT from USB :) and detach it, and then compare...
> Mounting user data and running user-space tools off USB is not comparable...

Systems would also grind to a halt because of user data, not just system files.

I know booting from USB can be 1000x slower than accessing user data.

But a shared page cache for all devices is bad design, period.

> AFAIK - this is still not a resolved issue...

That's a shame.

> You can have different pools, and you can use a rootfs on thins to
> easily test e.g. system upgrades....

Sure, but in the past GRUB2 would not work well with thin; I was basing myself on that...

> /boot cannot be on thin

> /rootfs is not a problem - there will even be some great enhancements to
> GRUB to support this more easily and to switch between various snapshots...

That's great - I guess this is possible like it is with BTRFS?

But /rootfs was a problem. grub-probe reported that it could not find the rootfs.

When I ran with a custom GRUB config it worked fine. It was only grub-probe that failed, nothing else (Kubuntu 16.04).

>> EVERYONE would benefit.

> Fortunately most users NEVER need it ;)

You're wrong. The assurance that a system will not crash (for instance), or will show some sane behaviour in case of a fill-up, will put many minds at ease.

> Since they properly operate the thin-pool and understand its weak points....

Yes, they are all superhumans, right?

I am sorry for being so inferior ;-).


>> Not necessarily that the system continues in full operation; applications are allowed to crash or whatever. Just that the system does not lock up.

> When you get bad data from your block device, your system's reaction is
> unpredictable - if your /rootfs cannot store its metadata, the most sane
> behavior is to stop - all other solutions are so complex and complicated
> that resources spent on avoiding ever hitting this state are effort way
> better spent...

About rootfs, I agree.

But the nominal distinction was between thin-as-system and thin-as-data.

If you say that thin-as-data is a specific use case that cannot be catered for, that is a bit odd. It is still 90% of the use.

> Once again - USE a different pool - solve problems at the proper level....
> Do not over-provision critical volumes...
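(For reference, the two-pool layout you keep recommending would look roughly like this - my sketch, all names made up. The critical pool's thin volumes never add up to more than the pool itself, so its data space cannot run out; the bulk pool is the only one over-provisioned:)

    # Sketch of the "two pools" advice (VG and LV names are hypothetical).
    import subprocess

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.check_call(cmd)

    run("lvcreate", "-L", "20G", "--thinpool", "critical", "vg")
    run("lvcreate", "-V", "20G", "--thin", "-n", "crit_lv", "vg/critical")

    run("lvcreate", "-L", "100G", "--thinpool", "bulk", "vg")
    run("lvcreate", "-V", "80G", "--thin", "-n", "data1", "vg/bulk")
    run("lvcreate", "-V", "80G", "--thin", "-n", "data2", "vg/bulk")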

Again: what we are asking for is a valid use case and a valid request.

If the system is designed so badly (or designed in such a way) that it cannot be achieved, that does not immediately make it a bad wish.

For example, if a problem is caused by the kernel's page cache being shared across all block devices at once, then anyone wanting something that is impossible because of that design...

...does not make that person bad for wanting it.

It makes the kernel bad for not achieving it.


I am sure your programmers are good enough to achieve asynchronous state-updating for a thin-pool that does not interfere with allocation. To the extent that it lazily updates stats, allocation constraints might be basing themselves on older data (maybe seconds old), but that still doesn't make it useless.

It doesn't have to be perfect.

If my "critical volume" wants 1000 free extents, but it only has 988, that is not so great a problem.

Of course, I know, I hear you say "Use a different pool".

The whole idea for thin is resource efficiency.

There is no real reason that this "space reservation" can't happen.

Even if there are current design limitations, those might be there for a good reason; you are the arbiter of that.

Maybe it cannot be perfect, or has to happen asynchronously.

It is better if a non-critical volume starts failing than a critical one.

Failure is imminent, but we can choose which fails first.
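(Something along these lines is all I am asking for - a lazy, asynchronous watchdog, sketched here in user space with made-up names. It polls pool usage every few seconds, so its data may be slightly stale - which, as said, is fine - and it sacrifices the non-critical volume first. Whether `lvchange -p r` is the right lever on a live volume is debatable; it is the mechanism that matters:)

    import subprocess, time

    POOL, VICTIM, RESERVE_PCT = "vg/pool", "vg/noncritical", 10.0

    def pool_used_pct():
        out = subprocess.check_output(
            ["lvs", "--noheadings", "-o", "data_percent", POOL])
        return float(out.strip().decode())

    while True:
        if pool_used_pct() > 100.0 - RESERVE_PCT:
            # Fail the non-critical volume first, so the critical one
            # keeps the remaining chunks.
            subprocess.check_call(["lvchange", "-p", "r", VICTIM])
            break
        time.sleep(5)  # lazy and asynchronous - no hot-path synchronization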




I mean, your argument is no different from:

"We need better man pages."

"REAL system administrators can use current man pages just fine."

"But any improvement would also benefit them, no need for them to do hard stuff when it can be easier."

"Since REAL system administrators can do their job as it is, our priorities lie elsewhere."

It's a stupid argument.

Any investment in user friendliness pays off for everyone.

Linux is often so impossible to use because no one makes that investment, even though it would have immeasurable benefits for everyone.

And then, when someone does make the effort (e.g. a makefile that displays a help screen when run with no arguments), someone complains that it breaks the contract that "make" should start compiling instantly - using the "status quo" as a way to never improve anything.

In this case, the make "help screen" can literally save people hours of time, multiplied by at least a thousand people.


>> I.e. the filesystem may guess about the thin layout underneath and just write 1 byte to each block it wants to allocate.

> :) so how do you resolve the error paths - i.e. how do you restore space
> you have not actually used....
> There are so many problems with this you can't even imagine...
> Yeah - we've spent quite some time in the past analyzing those paths....

In this case it seems that if this is possible for regular files (and directories, in that sense), it should also be possible for "magic" files and directories that exist only to allocate some space somewhere. In any case it is an FS issue, not LVM's.

Besides, you only strengthen my argument that it isn't the FS that should be doing this.
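(The trick quoted above would look something like this in user space - a sketch; the 64 KiB chunk size is an assumption, read the real one from lvs. Note that fallocate() at the filesystem level does not necessarily allocate thin chunks underneath, which is why the byte-per-chunk writes:)

    import os

    CHUNK = 64 * 1024  # assumed thin-pool chunk size

    def preallocate(path, size):
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            for off in range(0, size, CHUNK):
                os.pwrite(fd, b"\0", off)  # dirty one byte in every chunk
            os.ftruncate(fd, size)
            os.fsync(fd)  # surface ENOSPC/EIO now rather than later
        finally:
            os.close(fd)

    preallocate("/mnt/thin/reserved.bin", 1 << 30)  # reserve ~1 GiB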


> Please finally stop thinking about some 'reserved' storage for critical
> volumes. It leads nowhere....

It leads to you trying to convince me it isn't possible.

But no matter how much you try to dissuade me, it is still an acceptable use case and desire.



> Do the right action at the right place.
> 
> For critical volumes use non-overprovisioned pools - there is nothing
> better you can do - seriously!

For Gionatan's use case the problem was the poor performance of the non-overprovisioning setup.



> Maybe start to understand how the kernel works in practice ;)

Or how it doesn't work ;-).

Like, let me give a stupid example.

Suppose using a pen is illegal.

Now lots of people want to use pens, but they end up in jail.

Now you say "Wanting to use a pen is a bad desire, because of the consequences".

But it's pretty clear the desire won't go away.

And the real solution has to be found in changing the law.


In this case, people really want something, and for good reasons. If there are structural reasons it cannot be achieved, then that is just that.

That doesn't mean the desires are bad.



You can keep saying "Do this instead" forever, but that still doesn't make the prime desire bad.

"Don't use a pen, use a pencil. Problem solved."

Doesn't make wanting to use a pen a bad desire, nor does it make wanting some safe space in provisioning a bad desire ;-).


> Otherwise you spend your life boring developers with ideas which simply
> cannot work...

Or maybe changing their mind, who knows ;-).


> So use 2 different POOLS, problem solved....

That was not possible for Gionatan's use case.

Myself, I do not use a critical volume, but I can imagine still wanting some space efficiency even when "criticalness" differs from one volume to the next.




It is a proper desire, Zdenek. Even if LVM can't do it.


> Well, it's always about checking 'upstream' first and then bothering
> your upstream maintainer...

If you knew about the pre-existing problems, you could have informed me.

In fact it has happened that you said something could not be done, and then someone else said "Yes, this has been a problem, we have been working on it, and it should be resolved now in this version".



You spend most of your time denying that something is wrong.

And then someone else says "Yes, this has been an issue, it is resolved now".

If you communicate more clearly, then you also have fewer people bugging you.

> We really cannot be solving problems of every possible deployed
> combination of software.

The issue is more that at some point this was the main released version.

The main released kernel and the main released LVM, in a certain sense.


Some of your colleagues are a little more forthcoming with acknowledgements that something has been failing.

This would considerably cut down the amount of time you spend being "bored" fighting people who are trying to tell you something.

If you say "Oh yes, I think you mean this and that, yes that's a problem and we are working on it" or "Yes, that was the case before, this version fixes that" then


these long discussions also do not need to happen.

But you almost never say "Yes it's a problem", Zdenek.

That's why we always have these debates ;-).

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



