Re: [LSF/MM TOPIC] Making pseudo file systems inodes/dentries more like normal file systems

On Sat, 2024-01-27 at 11:44 -0800, Linus Torvalds wrote:
[...]
>  (c) none of the above is generally true of virtual filesystems
> 
> Sure, *some* virtual filesystems are designed to act like a
> filesystem from the ground up. Something like "tmpfs" is obviously a
> virtual filesystem, but it's "virtual" only in the sense that it
> doesn't have much of a backing store. It's still designed primarily
> to *be* a filesystem, and the only operations that happen on it are
> filesystem operations.
> 
> So ignore 'tmpfs' here, and think about all the other virtual
> filesystems we have.

Actually, I did look at tmpfs and it did help.

> And realize that they aren't really designed to be filesystems per se
> - they are literally designed to be something entirely different, and
> the filesystem interface is then only a secondary thing - it's a
> window into a strange non-filesystem world where normal filesystem
> operations don't even exist, even if sometimes there can be some kind
> of convoluted transformation for them.
> 
> So you have "simple" things like just plain read-only files in /proc,
> and despite being about as simple as they come, they fail miserably
> at the most fundamental part of a file: you can't even 'stat()' them
> and get sane file size data from them.

Well, this is a big piece of the problem: when constructing a virtual
filesystem what properties do I really need to care about (like stat or
uniqueness of inode numbers) and what can I simply ignore?  Ideally
this should be documented because you have to read a lot of code to get
an idea of what the must-have properties are.  I think a simple summary
of this would go a long way to getting people somewhat out of the swamp
that sucks you in when you try to construct virtual filesystems.
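
To make the stat() complaint concrete, here is a minimal user-space sketch. It assumes Linux with procfs mounted at /proc, and the helper names stat_size() and read_size() are hypothetical, chosen only for illustration. It compares the size stat() reports for /proc/self/status with the number of bytes actually returned by reading the file to EOF. On typical kernels the first is 0 while the second is not, because procfs generates the content at read time rather than storing it in the inode.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: the size stat() reports for path, or -1 on error. */
long long stat_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Hypothetical helper: bytes actually obtainable by reading path to EOF,
 * or -1 if the file cannot be opened. */
long long read_size(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    long long total = 0;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        total += (long long)n;
    fclose(f);
    return total;
}
```

A program that trusts st_size, for example to size a buffer before a single read(), would therefore get nothing useful from such a file.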

> And "caching" - which was the #1 reason for most of the filesystem
> code - ends up being much less so, although it turns out that it's
> still hugely important because of the abstraction interface it
> allows.
> 
> So all those dentries, and all the complicated lookup code, end up
> still being quite important to make the virtual filesystem look like
> a filesystem at all: it's what gives you the 'getcwd()' system call,
> it's what still gives you the whole bind mount thing, it really ends
> up giving a lot of "structure" to the virtual filesystem that would
> be an absolute nightmare without it.  But it's a structure that is
> really designed for something else.

I actually found dentries (which were the foundation of shiftfs) quite
easy.  My biggest problem was the places in the code where we use a
bare dentry and I needed the struct mnt (or struct path) as well, but
that's a different discussion.

> Because the non-filesystem virtual part that a virtual filesystem is
> actually trying to expose _as_ a filesystem to user space usually has
> lifetime rules (and other rules) that are *entirely* unrelated to any
> filesystem activity. A user can "chdir()" into a directory that
> describes a process, but the lifetime of that process is then
> entirely unrelated to that, and it can go away as a process, while
> the directory still has to virtually exist.

On this point alone: real filesystems have the unplug problem too (the
device goes away while a user is in the directory), so the solution
that works for them works for virtual filesystems as well.
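
The process-lifetime versus directory-lifetime split can be observed from user space. The following is a sketch under stated assumptions (Linux, procfs at /proc, fork() permitted); probe_stale_proc_dir() and struct stale_dir_result are hypothetical names. It holds a directory fd on /proc/<pid> of a child, lets the child exit and reaps it, then checks that fstat() on the held fd still succeeds while looking up an entry such as "status" inside it fails. The open file keeps the dentry and inode alive after the process is gone; only lookups that need the live task fail.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result of probing /proc/<pid> after the process is reaped. */
struct stale_dir_result {
    int fstat_ok;      /* 1 if fstat() on the held directory fd succeeded */
    int status_opened; /* 1 if openat(dirfd, "status") succeeded */
};

/* Hypothetical helper: returns 0 on success, -1 if setup failed. */
int probe_stale_proc_dir(struct stale_dir_result *out)
{
    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: block until the parent closes the write end, then exit. */
        char c;
        close(pipefd[1]);
        ssize_t r = read(pipefd[0], &c, 1);
        (void)r;
        _exit(0);
    }
    close(pipefd[0]);

    /* Open the child's /proc directory while the child is still alive. */
    char path[64];
    snprintf(path, sizeof path, "/proc/%d", (int)pid);
    int dirfd = open(path, O_RDONLY | O_DIRECTORY);

    close(pipefd[1]);      /* child sees EOF and exits */
    waitpid(pid, NULL, 0); /* reap it: the process is now completely gone */
    if (dirfd < 0)
        return -1;

    struct stat st;
    out->fstat_ok = (fstat(dirfd, &st) == 0);
    int fd = openat(dirfd, "status", O_RDONLY);
    out->status_opened = (fd >= 0);
    if (fd >= 0)
        close(fd);
    close(dirfd);
    return 0;
}
```

The error surfaces per lookup, not by the directory itself vanishing from under the open file.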

> That's part of what the VFS code gives a virtual filesystem: the
> dentries etc end up being those things that hang around even when the
> virtual part that they described may have disappeared. And you *need*
> that, just to get sane UNIX 'home directory' semantics.
> 
> I think people often don't think of how much that VFS infrastructure
> protects them from.
> 
> But it's also why virtual filesystems are generally a complete mess:
> you have these two pieces, and they are really doing two *COMPLETELY*
> different things.
> 
> It's why I told Steven so forcefully that tracefs must not mess
> around with VFS internals. A virtual filesystem either needs to be a
> "real filesystem" aka tmpfs and just leave it *all* to the VFS layer,
> or it needs to just treat the dentries as a separate cache that the
> virtual filesystem is *not* in charge of, and trust the VFS layer to
> do the filesystem parts.
> 
> But no. You should *not* look at a virtual filesystem as a guide how
> to write a filesystem, or how to use the VFS. Look at a real FS. A
> simple one, and preferably one that is built from the ground up to
> look like a POSIX one, so that you don't end up getting confused by
> all the nasty hacks to make it all look ok.

Well, I did look at ext4 when I was wondering what a real filesystem
does, but we're back to having to read real and virtual filesystems now
just to understand what you have to do and hence we're back to the "how
do we make this easier" problem.

> IOW, while FAT is a simple filesystem, don't look at that one, just
> because then you end up with all the complications that come from
> decades of non-UNIX filesystem history.
> 
> I'd say "look at minix or sysv filesystems", except those may be
> simple but they also end up being so legacy that they aren't good
> examples. You shouldn't use buffer-heads for anything new. But they
> are still probably good examples for one thing: if you want to
> understand the real power of dentries, look at either of the minix or
> sysv 'namei.c' files. Just *look* at how simple they are. Ignore the
> internal implementation of how a directory entry is then looked up on
> disk - because that's obviously filesystem-specific - and instead
> just look at the interface.

So shall I put you down for helping with virtual filesystem
documentation then ... ?

James
