thin handling of available space

Hi,

So here is my question. I was talking about it with someone who didn't know either.



There seems to be a reason not to create a combined V-size that exceeds the total L-size of the thin pool. That is great if you just want spare space to create more volumes at will, but having a combined V-size larger than the pool is also an important use case.

Is there any way that user tools could ever be allowed to know about the real effective free space on these volumes?

My thinking goes like this:

- if LVM knows about allocated blocks, then it should also be aware of blocks that have been freed,
- so it needs to receive some communication from the filesystem (which is what discard/TRIM provides),
- that means the filesystem effectively maintains a "claim" on used blocks, or at least notifies the underlying layer of its mutations.

- in that case a reverse communication could also exist, where the block device tells the filesystem about the availability of individual blocks (as might happen with bad sectors) or even about the total number of free blocks. That would mean the disk/volume manager (driver) maintains a mapping or table of its own blocks, something that needs to be persistent.
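To make that bookkeeping concrete, here is a minimal sketch in Python (purely illustrative, not a real LVM interface) of a pool that learns about allocations on first write and about freed blocks through a discard-style notification:

class ThinPool:
    def __init__(self, real_blocks):
        self.real_blocks = real_blocks   # physical blocks backing the pool
        self.allocated = set()           # (volume, block) pairs currently backed

    def write(self, volume, block):
        # allocation happens lazily, on first write to a block
        if (volume, block) not in self.allocated:
            if len(self.allocated) >= self.real_blocks:
                raise IOError("thin pool out of space")
            self.allocated.add((volume, block))

    def discard(self, volume, block):
        # the filesystem tells the pool this block is no longer in use
        self.allocated.discard((volume, block))

    def real_free(self):
        return self.real_blocks - len(self.allocated)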

That means the question becomes this:

- is it possible (theoretically) for LVM to tell the filesystem the real number of free blocks, so that the filesystem can make "educated decisions" about the real availability of space?

- or is it possible (theoretically) for LVM to communicate a "crafted" map of available blocks, in which a certain (algorithmically determined) group of blocks is considered "unavailable" because of the actual space restrictions in the thin pool? This seems very suboptimal but would have the same effect; see the sketch below.
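To make the difference between the two options concrete, a rough sketch (hypothetical interfaces, reusing the toy pool above):

# Option 1: the pool reports one number; the filesystem decides how to
# present it.
def report_real_free(pool):
    return pool.real_free()

# Option 2: the pool crafts a set of block numbers that the filesystem
# must treat as unavailable, sized to cover the shortfall for one volume.
def craft_unavailable_blocks(pool, volume_size_blocks, volume_used_blocks):
    virtual_free = volume_size_blocks - volume_used_blocks
    shortfall = virtual_free - pool.real_free()
    if shortfall <= 0:
        return set()
    # arbitrarily pick blocks at the end of the volume; a real implementation
    # would have to pick blocks the filesystem is not actually using
    return set(range(volume_size_blocks - shortfall, volume_size_blocks))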

Say the filesystem thinks it has 6 GB available but really there is only 3 GB because the pool is filling up: does it currently get notified of this?

What happens if it does fill up?

Funny that we are using GB in this example. I was reminded today of using Stacker on an MS-DOS disk, where I had 20 MB available and was able to increase it to 30 MB ;-).

Someone else might use terabytes, but anyway.

If the filesystem normally has a fixed size, and this size doesn't change after creation (short of modifying the filesystem), then it will calculate its free space based on its own knowledge of available blocks.

So there are three figures:

- total available space
- real available space
- data taken up by files.

Total minus data is not always the real figure, because there may still be open handles on deleted files, etc. The "du" of the visible, countable files + blocks still held in use that way + available blocks should be roughly the total blocks.
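Purely as an illustration of that identity (made-up block counts):

du_visible   = 700    # blocks of visible, countable files ("du")
held_open    = 50     # blocks of deleted files still pinned by open handles
available    = 250
total_blocks = 1000
# visible + still-held + available should be roughly the total
assert du_visible + held_open + available == total_blocks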

So we are only talking about blocks here, nothing else.

And if LVM can communicate about availability of blocks, a fourth figure comes into play:

total = used blocks + unused blocks + unavailable blocks.

If LVM were able to dynamically adjust this last figure, we might have a filesystem that truthfully reports actual available space in a thin setting.
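A worked example of that four-figure accounting (made-up block counts, mirroring the 6 GB vs. 3 GB situation above):

virtual_total = 10000   # blocks the filesystem believes it has
used          = 4000    # blocks holding data, so 6000 look available
real_free     = 3000    # what the thin pool can actually still back
unavailable   = max(0, (virtual_total - used) - real_free)   # 3000
available     = virtual_total - used - unavailable           # 3000
assert virtual_total == used + available + unavailable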

I do not even know whether this is already the case, but I read something stressing the importance of "monitoring available space", which would make the whole situation unusable for an ordinary user.

Then you would need GUI applets that said "The space on your thin volume is running out (but the filesystem might not report it)".

So the question is:

* is this currently 'provisioned' for?
* if not, is it theoretically possible?

If you take it to a tool such as "df", there are only three figures, and they add up:

total = used + available

but we want

total = used + available + unavailable

Either that, or the total must be dynamically adjusted, but I think this is not a good solution.


So another question:

*SHOULDN'T THIS simply be a feature of any filesystem?*

The provision of being able to know the *real* number of blocks, in case the underlying block device is not "fixed, stable, and unchanging"?

As it is, you can "tell" Linux filesystems with fsck which blocks are bad and thus unavailable, probably reducing the number of "total" blocks.

From a user interface perspective this would perhaps be an ideal solution, if you needed any solution at all. Personally I would probably prefer either the total space to be "hard limited" by the underlying (LVM) system, or for df to show a different output; but df output is often parsed by scripts.

In the former case, suppose a volume in the pool is filling up:

Filesystem      1K-blocks    Used Available Use% Mounted on
udev              1974288       0   1974288   0% /dev
tmpfs              404384   41920    362464  11% /run
/dev/sr2          1485120 1485120         0 100% /cdrom

(Just taking 3 random filesystems)

The filesystem being written to would see its "used" space go up. The other two would see their "total" size go down, and the first one would see its total go down as well. That is counterintuitive, and you cannot really do this.

It's impossible to give this information to the user in a way that the numbers still add up.

Supposing:

real size 2000

total used avail
1000  500  500
1000  500  500
1000  500  500

combined virtual size 3000. Total usage 1500. Real free 500. Now the first volume uses another 250.

total used avail
1000  750  250
1000  500  250
1000  500  250

The numbers no longer add up for the 2nd and 3rd system.

You *can* adjust total in a way that it still makes sense (a bit)

total used avail
1000  750  250
 750  500  250
 750  500  250

You can also just ignore the discrepancy, or add another figure:

total used unav avail
1000  750    0  250
1000  500  250  250
1000  500  250  250

Whatever you do, you would have to simply calculate this adjusted number from the real number of available blocks.
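A sketch of that calculation (Python, purely illustrative): given the real free space of the pool and each volume's virtual total and used blocks, it reproduces the third-style figures above:

def adjusted_figures(real_free, volumes):
    # volumes is a list of (virtual_total, used) pairs; the result is
    # (total, used, unavailable, available) per volume
    rows = []
    for total, used in volumes:
        virtual_free = total - used
        available = min(virtual_free, real_free)
        unavailable = virtual_free - available
        rows.append((total, used, unavailable, available))
    return rows

# the state above: the pool really has 250 blocks free
for row in adjusted_figures(250, [(1000, 750), (1000, 500), (1000, 500)]):
    print("%5d %5d %5d %5d" % row)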

Now the third volume takes another 100

First style:

total used avail
1000  750  150
1000  500  150
1000  600  150

Second style:

total used avail
 900  750  150
 650  500  150
 750  600  150

Third style:

total used unav avail
1000  750  100  150
1000  500  350  150
1000  600  250  150

There's nothing technically inconsistent about it; it is just rather difficult to grasp at first glance.

df uses filesystem data, but we are really talking about block-layer data now.

You would either need to communicate the number of available blocks (but which ones?) and let the filesystem calculate the unavailable ones, or communicate the number of unavailable blocks, at which point you do this calculation yourself. For each volume you arrive at a different number of blocks you need to withhold.

If you needed to make those blocks unavailable, you would now need to "unavail" them to the filesystem layer above, picking them randomly, at the end of the volume, or by any other method.

Every write that filled up more blocks would be communicated to you (since you receive the write or the allocation) and would result in an immediate return of "spurious" mutations, or an updated number of unavailable blocks -- you could also communicate both.

On every new allocation, the answer back to the filesystems would be blocks that you have "falsely" marked as unavailable. All of this only happens once real available space drops below the virtual free space of the individual volumes. The virtual "available" minus the "real available" is the number of blocks (extents) you are going to communicate as being "not there".

At every mutation from the filesystem, you respond with a like mutation: not to the filesystem that did the mutation, but to every other filesystem on every other volume.

Space being freed (deallocated) then means a reverse communication to all those other filesystems/volumes.

But it would work, if this were possible. That is the entire algorithm.
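Put together, a rough sketch of that algorithm (Python, purely illustrative, with hypothetical interfaces) could look like this:

class Coordinator:
    # Toy model: keeps per-volume usage and, after every allocation or
    # deallocation, tells every *other* volume how many blocks it must
    # currently treat as unavailable.
    def __init__(self, real_blocks, volume_sizes):
        self.real_blocks = real_blocks
        self.sizes = dict(volume_sizes)            # volume -> virtual size
        self.used = {v: 0 for v in self.sizes}     # volume -> used blocks

    def real_free(self):
        return self.real_blocks - sum(self.used.values())

    def allocate(self, volume, blocks):
        if blocks > self.real_free():
            raise IOError("thin pool exhausted")
        self.used[volume] += blocks
        self.notify_others(volume)

    def deallocate(self, volume, blocks):
        self.used[volume] -= blocks
        self.notify_others(volume)                 # the reverse communication

    def notify_others(self, origin):
        for v in self.sizes:
            if v == origin:
                continue
            virtual_free = self.sizes[v] - self.used[v]
            unavailable = max(0, virtual_free - self.real_free())
            # hypothetical call: tell the filesystem on volume v how many
            # blocks it must currently consider "not there"
            print("volume %s: mark %d blocks unavailable" % (v, unavailable))

# the example above: a 2000-block pool with three 1000-block volumes
c = Coordinator(2000, {"a": 1000, "b": 1000, "c": 1000})
for v in ("a", "b", "c"):
    c.allocate(v, 500)
c.allocate("a", 250)    # volumes b and c are now told to withhold 250 blocks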


I'm sorry if this sounds like a lot of "talk" and very little "doing"; I am annoyed by that as well. I wish I could actually be active in any of these things.

I am reminded of my father. He was in school to become a car mechanic, but he had a scooter accident a few days before he had to take his exam. They held the exam with him in a hospital bed: he only needed to give directions on what had to be done, and someone else did it for him :p.

That's how he passed his exam. It feels the same way for me.

Regards.



