Re: thin handling of available space

Let me just write down some thoughts here.

First of all, you say that fundamental OS design is about higher layers trusting lower layers, and that certain types of communication should therefore always be one-way.

In this case it is about the block layer vs. the filesystem layer.

But you make certain assumptions about the nature of a block device to begin with.

A block device is defined by its access method (i.e. data organized in blocks) rather than by its contiguousness or by having an unchanging, "single block" address or access space. I know this goes pretty far, but it is the truth.

In theory there is nothing against a hypothetical block device offering ranges of blocks to a higher level (ranges that might never change), or notifying that higher level dynamically of changes to that address pool.
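By way of analogy, the block layer already emits change events when a device's size changes underneath a consumer, and userspace can watch for them. A rough sketch of that existing, one-way version of the idea (vg/somelv is just a placeholder name):

udevadm monitor --kernel --subsystem-match=block &
blockdev --getsize64 /dev/vg/somelv    # size as the layer above currently sees it
lvextend -L +1G vg/somelv              # grow the device underneath
blockdev --getsize64 /dev/vg/somelv    # new size; a "change" uevent should appear above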

To a process, virtual memory is transparent: it does not see whether that space is backed by paged memory (a swap file) or not. At the same time it is not impossible to imagine an I/O scheduler for swap taking heed of values given by applications, such as nice or ionice values. That would be one-way communication, though.
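ionice already exists as exactly that kind of one-way hint; whether the swap path actually honours it is a separate question, but the mechanism looks like this:

ionice -c 2 -n 7 -p $$    # put the current shell in best-effort class, lowest priority
ionice -p $$              # show the class and priority the kernel recorded for it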

In general a higher level should be oblivious to what kind of lower layer it is running on; you are right about that. Yet if all lower layers exhibit the same kind of features, the point becomes moot: the higher level still cannot know precisely what kind of layer it is running on, it would merely have more information available.

So, theoretically speaking, the only thing that is required to be consistent is the API, or whatever interface you design for it.

There are many cases where some software can run on some libraries but not on others, because those other libraries do not offer the full feature set of whatever standard is being defined. An example is DLNA/UPnP: these are not layers, but the standard is ill-defined and the device you are communicating with might not support the full set.

Perhaps those are detrimental cases, but there are plenty of cases where one type of "lower level" will suffice and another won't; think of graphics drivers. Across the layer boundary, communication is two-way anyway: the block device *does* supply endless streams of data to the higher layer. The only thing that would change is that you would no longer have this "always one contiguous block of blocks" but something slightly more volatile.

When you "mkfs" the tool reads the size of the block device. Perhaps subsequently the filessytem is unaware and depends on fixed values.

The feature I described (the use case) would allow the set of available blocks to change dynamically. You are right that this would apparently be a big departure from the current model.

So I'm not saying it is easy, perfect, or well understood. I'm just saying I like the idea.

I don't know what other applications it might have, but it depends entirely on correct "discard" behaviour from the filesystem.

The filesystem is supposed to be unaware of its underlying device, yet discard is never required for rotating disks as far as I can tell. It is an option that assumes knowledge of the underlying device. From discard we can basically infer that we are either dealing with a flash device or with something that has some smartness about which blocks it retains and which not (think of a cache).
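You can see that inference at work today; roughly (device and mount point names are examples):

lsblk --discard                                # non-zero DISC-GRAN/DISC-MAX means the device accepts discards
cat /sys/block/sda/queue/discard_granularity   # 0 here means a rotating disk or no discard support
fstrim -v /mountpoint                          # batched discard: the fs tells the device which blocks are free
# or mount with -o discard (ext4/xfs) to send discards continuously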

So in general this is already a change that reflects the changing conditions of block devices, or their availability, and their characteristic behaviour or demands towards filesystems.

These are block devices that want more information to operate (well).

Coincidentally, discard also favours or enhances (possibly) lvmcache.
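For reference, lvmcache itself takes only a couple of commands to set up; a rough sketch, with the VG, LV and fast-device names as placeholders:

lvcreate --type cache-pool -L 10G -n cpool vg /dev/nvme0n1   # cache pool on the fast device
lvconvert --type cache --cachepool vg/cpool vg/slowlv        # attach it to the slow LV
lvs -a -o name,segtype,devices vg                            # verify the cached layout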

So it's not about doing something wildly strange here, it's about offering a feature set that a filesystem may or may not use, or a block device may or may not offer.

Contrary to what you say, there is nothing inherently bad about the idea. The OS design principle violation you speak of is a matter of principle, not of practical reality. It's not that it can't be done; it's that you don't want it to happen because it violates your principles. It's not that it wouldn't work; it's that you don't want it to work because it violates your principles.

At the same time I object to the notion of the system administrator being this theoretical, vastly different role/person from the user/client.

We have no in-betweens on Linux. For fun, do a search of your filesystem with find -xdev based on the contents of /etc/passwd or /etc/group. You will find that 99% of files are owned by root, and the only ones that aren't are usually user files in the home directory or specific services in /var/lib.

Here is a script that would do it for groups:

cut -d: -f1 /etc/group | while read -r g; do printf '%-15s %6d\n' "$g" "$(find / -xdev -type f -group "$g" 2>/dev/null | wc -l)"; done

Probably. I can't run it here; it might crash my system (live DVD).

Of about 170k files on an openSUSE system, 15 were group-writable, mostly due to my own interference, probably. Of 170197 files (without -xdev), 168161 were owned by root.

Excluding man and my own user, 69 files did not have "root" as the group. Part of that was again due to my own changes.
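Anyone who wants to reproduce counts like those can do so along these lines (run as root; the home path is an example):

find / -xdev -type f | wc -l                                  # total files on the root filesystem
find / -xdev -type f ! -user root | wc -l                     # files not owned by root
find / -xdev -type f ! -group root ! -path "/home/*" | wc -l  # non-root group, home excluded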

At the same time, in some debates you are presented with the ludicrous notion that there is some ideal desktop user who never needs to see anything of the internal system. She never opens a shell and certainly does not come across ethernet device names (for example). The "desktop user" does not care about the renaming of devices from eth0 to enp3s0.

The desktop user never uses anything other than DHCP, etc. etc. etc.

The desktop user can never configure anything slightly more advanced without the help of the admin.

It's that user vs. admin dichotomy that is never true on any desktop system, and I will venture it is not even true on the systems I am a client of, because you often need to debate things with the vendor, ask for features, offer solutions, and so on.

In a store you are a client. There are employees and clients, nothing else. At the same time I treat these girls as my neighbours because they work in the block I live in.

You get the idea. Roles can be shifty. A person can have multiple roles at the same time. He/she can be admin and user simultaneously.

Perhaps you are correct to state that the roles themselves should not be watered down, that clear delimitations are required.

In your other email you allude to me not ever having done an OS design course.

Off-list, a friendly member strongly suggested I not use personal attacks in my communications here. But of course that is precisely what you are doing here, because as a matter of fact I did take such a course.

I don't remember the book we used, because apparently my housemate and I only had one copy between us, and he ended up keeping it because I was usually the one borrowing things from him.

At the same time, university is way beyond my current reach (given my living conditions), so it is just an unwarranted allusion that does not really have anything to do with anything.

Yes I think it was the dinosaur book:

Operating System Concepts by Silberschatz, Galvin and Gagne

Anyway, irrelevant here.

> Another way (haven't tested) to 'signal' the FS as to the true state of the underlying storage is to have a sparse file that gets shrunk over time.

You do realize you are trying to find ways around the limitation you just imposed on yourself, right?

> The system admin decided it was a bright idea to use thin pools in the first place so he necessarily signed up to be liable for the hazards and risks that choice entails. It is not the job of the FS to bail his ass out.

I don't think thin pools are that risky, or should be that risky. They do incur a management overhead compared to static filesystems, because you add a second layer that you need to monitor. At the same time, the burden of that can be lessened with tools.
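For instance, watching a pool is one lvs invocation, and dmeventd can auto-extend it; a sketch, with "vg" as a placeholder volume group:

lvs -o lv_name,lv_size,data_percent,metadata_percent vg   # how full the thin pool and its volumes are
# and in /etc/lvm/lvm.conf, under activation {}, dmeventd can grow the pool for you:
#   thin_pool_autoextend_threshold = 80
#   thin_pool_autoextend_percent   = 20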

As it stands I consider thin LVM the only reasonable way to snapshot a running system without dedicating specific space to it in advance. I would expect snapshotting to require everything to be in the same volume group. Without thin LVM, snapshotting requires making at least some prior investment in having a snapshot device ready for you in the same VG, right?
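Roughly, the difference looks like this (names are placeholders, and exact options may vary by LVM version):

# classic snapshot: COW space must be reserved up front, in the same VG
lvcreate -s -L 2G -n rootsnap vg/root

# thin: create a pool once, then volumes and snapshots allocate from it on demand
lvcreate -L 20G -T vg/pool
lvcreate -V 50G -T vg/pool -n thinroot
lvcreate -s -n thinroot-snap vg/thinroot    # no size argument needed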

Do not think btrfs and ZFS are without costs. You wrote:

> Then you want an integrated block+fs implementation. See BTRFS and ZFS.
> WAFL and friends.

But btrfs is not without complexity. It uses subvolumes, which differ from distribution to distribution as each makes its own choices. It requires knowledge of more complicated tools and mechanics to do the simplest (or most meaningful) of tasks. Working with LVM is easier. I'm not saying LVM is perfect and....

Using snapshotting as a backup measure seems risky to me in the first place, because it is a "partition table" operation which you really shouldn't be doing on a regular basis. So in order to use it effectively, you require tools that handle the safeguards for you. Tools that make sure you are not making some command-line mistake. Tools that simply guard against misuse.

Regular users are not fit for being btrfs admins either.

It is going to confuse the hell out of people, seeing as that is what their systems run on, if they are introduced to some of its complexity.

You say swallow your pride. It does not have much to do with pride.

It has to do with ending up in a situation I don't like, one that is then going to "hurt" me for the remainder of my days until I switch back or get rid of it.

I have seen NOTHING NOTHING NOTHING inspiring about btrfs.

Not having partition tables and sending volumes across space and time to other systems, is not really my cup of tea.

It is a vendor lock-in system and would result in other technologies being less developed.

I am not alone in this opinion either.

Btrfs feels like a form of illness to me. It is living in a forest with all deformed trees, instead of something lush and inspiring. If you've ever played World of Warcraft, the only thing that comes a bit close is the Felwood area ;-).

But I don't consider it beyond Plaguelands either.

Anyway.

I have felt like btrfs in my life. They have not been the happiest moments of my life ;-).

I will respond more in another mail, this is getting too long.

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


