> But no. You should *not* look at a virtual filesystem as a guide how
> to write a filesystem, or how to use the VFS. Look at a real FS. A
> simple one, and preferably one that is built from the ground up to
> look like a POSIX one, so that you don't end up getting confused by
> all the nasty hacks to make it all look ok.
>
> IOW, while FAT is a simple filesystem, don't look at that one, just
> because then you end up with all the complications that come from
> decades of non-UNIX filesystem history.
>
> I'd say "look at minix or sysv filesystems", except those may be
> simple but they also end up being so legacy that they aren't good
> examples. You shouldn't use buffer-heads for anything new. But they
> are still probably good examples for one thing: if you want to
> understand the real power of dentries, look at either of the minix or
> sysv 'namei.c' files. Just *look* at how simple they are. Ignore the
> internal implementation of how a directory entry is then looked up on
> disk - because that's obviously filesystem-specific - and instead just
> look at the interface.

I agree, and I have to say I'm getting annoyed with this thread. I want to fundamentally oppose the notion that it's too difficult to write a virtual filesystem. One just has to look at how many virtual filesystems we already have and how many are being proposed. A recent example is KVM, which wanted to implement restricted memory as a stacking layer on top of tmpfs, which I luckily caught early and told them not to do.

If anything, a surprising number of people who have nothing to do with filesystems manage to write filesystem drivers quickly and propose them upstream. And I would hope people take a couple of months to write a decently sized/complex (virtual) filesystem.

And virtual filesystems specifically often aren't alike at all. That's got nothing to do with the VFS abstractions.
It's simply because a virtual filesystem is often used when developers want a filesystem-like userspace interface but don't want all of the actual filesystem semantics that come with it. So they all differ from each other in what functionality they actually implement.

And I somewhat oppose the notion that the VFS isn't documented. We do have extensive documentation for locking rules, and a constantly updated changelog with fundamental changes to all VFS APIs and the expectations around them, including very intricate details for the reader who really needs to know everything. I wrote a whole document just on permission checking and idmappings when we added that to the VFS, covering both the implementation and the theoretical background.

And stuff like overlayfs or shiftfs are completely separate stories, because they're even more special: they're (virtual) stacking filesystems that challenge the VFS in way more radical ways than regular virtual filesystems do. And I think (Amir may forgive me) that stacking filesystems are generally an absolutely terrible idea, as they complicate the VFS massively and put us through an insane amount of pain. One just needs to look at how much additional VFS machinery we have because of them and how complicated our callchains can become. It's just not correct to even compare them to a boring virtual filesystem like binderfs or bpffs.
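For a sense of how little code a "boring" virtual filesystem can need, here is a rough sketch using the kernel's pseudo-fs helpers. The name `examplefs` and its magic number are made up for illustration, and this is kernel code that only builds inside a kernel tree; real filesystems like binderfs layer their own superblock setup and inode allocation on top of a skeleton like this.

```c
/* Hypothetical sketch of a minimal pseudo-filesystem, in the spirit of
 * the kernel's simpler virtual filesystems. Not buildable outside a
 * kernel tree; examplefs and EXAMPLEFS_MAGIC are invented names. */
#include <linux/fs.h>
#include <linux/fs_context.h>
#include <linux/pseudo_fs.h>

#define EXAMPLEFS_MAGIC 0x45584d50  /* made-up magic number */

static int examplefs_init_fs_context(struct fs_context *fc)
{
	/* init_pseudo() fills in the generic superblock, root inode,
	 * and simple directory operations from libfs. */
	struct pseudo_fs_context *ctx = init_pseudo(fc, EXAMPLEFS_MAGIC);

	return ctx ? 0 : -ENOMEM;
}

static struct file_system_type examplefs_type = {
	.name		= "examplefs",
	.init_fs_context = examplefs_init_fs_context,
	.kill_sb	= kill_anon_super,
};
```

Registering this with register_filesystem() is essentially all the boilerplate such a filesystem needs; everything beyond that is the actual functionality the driver wants to expose, which is exactly where these filesystems all diverge from one another.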