Re: [RFC PATCH 2/3] add statmnt(2) syscall

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 20 Sep 2023 10:33:38 +1000

On Tue, Sep 19, 2023 at 02:50:28PM +0200, Christian Brauner wrote:
> On Mon, Sep 18, 2023 at 02:58:00PM -0600, Andreas Dilger wrote:
> > On Sep 18, 2023, at 7:51 AM, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > 
> > > 
> > >> The type and subtype are naturally limited to sane sizes, those are
> > >> not an issue.
> > > 
> > > What's the limit for fstype actually? I don't think there is one.
> > > There's one by chance but not by design afaict?
> > > 
> > > Maybe crazy idea:
> > > That magic number thing that we do in include/uapi/linux/magic.h
> > > is there a good reason for this or why don't we just add a proper,
> > > simple enum:
> > > 
> > > enum {
> > > 	FS_TYPE_ADFS        1
> > > 	FS_TYPE_AFFS        2
> > > 	FS_TYPE_AFS         3
> > > 	FS_TYPE_AUTOFS      4
> > > 	FS_TYPE_EXT2	    5
> > > 	FS_TYPE_EXT3	    6
> > > 	FS_TYPE_EXT4	    7
> > > 	.
> > > 	.
> > > 	.
> > > 	FS_TYPE_MAX
> > > }
> > > 
> > > that we start returning from statmount(). We can still return both the
> > > old and the new fstype? It always felt a bit odd that fs developers
> > > just select a magic number.
> > 
> > Yes, there is a very good reason that there isn't an enum for filesystem
> 
> I think this isn't all that relevant to the patchset so I'm not going to
> spend a lot of time on this discussion but I'm curious.
> 
> > type, which is because this API would be broken if it encounters any
> > filesystem that is not listed there.  Often a single filesystem driver in
> > the kernel will have multiple different magic numbers to handle versions,
> > endianness, etc.
> 
> Why isn't this a problem for magically chosen numbers?

What problem are you asking about? The 32 bit space that contains
a few hundred magic numbers remains a vast field of empty space that
makes collisions easy to avoid....

> > Having a 32-bit magic number allows decentralized development with low
> > chance of collision, and using new filesystems without having to patch
> > every kernel for this new API to work with that filesystem.  Also,
> 
> We don't care about out of tree filesystems.

In this case, we most certainly do care. Downstream distros support
all sorts of out of tree filesystems loaded via kernel modules, so a
syscall that is used to uniquely identify a filesystem type to
userspace *must* have a mechanism for the filesystem to provide that
unique identifier to userspace.

Fundamentally, the kernel does not and should not dictate what
filesystem types it supports; the user decides what filesystem they
need to use, and it is the kernel's job to provide infrastructure
that works with that user's choice.

Remember: it's not just applications that stat the mounted
filesystem that know about the filesystem magic numbers.  Apps like
grub, libblkid, etc. all look at filesystem magic numbers directly on
the block device to identify the type of filesystem that is on the
device.
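
As a rough sketch of what such a probe does (this is not how grub or
libblkid are actually written; the helper name is made up, and only the
well-known ext2/3/4 and XFS on-disk magic locations are checked):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ext2/3/4 on-disk layout: the superblock starts 1024 bytes into the
 * device and s_magic is a little-endian 16-bit field at offset 56. */
#define EXT_SB_OFFSET     1024
#define EXT_MAGIC_OFFSET  56
#define EXT_ONDISK_MAGIC  0xEF53

/* Hypothetical helper: identify a filesystem from the first bytes of a
 * block device. Returns a short name, or NULL if nothing is recognised. */
static const char *probe_fs_magic(const uint8_t *buf, size_t len)
{
	/* XFS keeps the ASCII bytes "XFSB" at the very start of the device. */
	if (len >= 4 && memcmp(buf, "XFSB", 4) == 0)
		return "xfs";

	if (len >= EXT_SB_OFFSET + EXT_MAGIC_OFFSET + 2) {
		const uint8_t *p = buf + EXT_SB_OFFSET + EXT_MAGIC_OFFSET;
		uint16_t magic = (uint16_t)(p[0] | (p[1] << 8));

		if (magic == EXT_ONDISK_MAGIC)
			return "ext2/3/4";
	}
	return NULL;
}
```

No kernel involvement is needed here: the caller reads the first couple
of KiB of the device and hands them to the helper.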

If we introduce a new identifier specific to mounted kernel
filesystems, these sorts of apps now need to use two different
identifiers in different contexts instead of the same magic number
everywhere. That's not a win for anyone.

Magic numbers are also portable - it does not matter what OS you see
that FS on, it has the same unique, stable type identifier. You can
look at the block device and identify the filesystem by its magic
number, you can stat the mounted filesystem and get the same magic
number. It just works the same *everywhere*.
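
For the mounted side, a minimal sketch using the existing statfs(2)
interface (the helper name is hypothetical; f_type is where the
filesystem's magic number comes back to userspace today):

```c
#include <stdint.h>
#include <sys/vfs.h>	/* statfs(2) and struct statfs */

/* Hypothetical helper: return the magic number of the filesystem
 * mounted at path, or 0 if statfs() fails. */
static uint32_t mounted_fs_magic(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) != 0)
		return 0;

	/* Magic numbers are 32-bit values; f_type is a wider signed type. */
	return (uint32_t)sfs.f_type;
}
```

For XFS the value returned here is 0x58465342, i.e. the same "XFSB"
bytes that sit at the start of the block device.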

Magic numbers have served the purpose of being unique filesystem
identifiers for over 40 years. They work just fine for this purpose
and nothing has changed in the past couple of decades that has
broken them or needs fixing.

> > filesystems come and go (though more slowly) over time, and keeping the
> 
> Even if we did ever remove a filesystem we'd obviously leave the enum in
> place. Same thing we do for deprecated flags, same thing we'd do for
> magic numbers.

So why try to replace magic numbers if we must replicate all the
same unique, stable behaviour that magic numbers already provide the
kernel and userspace with?

>
> > full list of every filesystem ever developed in the kernel enum would be
> > a headache.
> 
> I really don't follow this argument.

The kernel currently doesn't need to know about all the potential
fuse filesystem types that can be mounted. It doesn't need to know
about all the 3rd party filesystems that could be mounted. These all
just work and userspace can identify them just fine via their unique
magic numbers that are passed through the kernel interfaces from the
filesystem.

The enum proposal breaks these existing working use cases unless
the enum explicitly includes every possible filesystem type that the
kernel might expose to userspace. The kernel *should not care* what
filesystems it exposes to userspace and that's the whole point of using
a filesystem supplied magic number as the unique identifier for the
filesystem...
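
To illustrate from the userspace side, here is a sketch of a name lookup
keyed on the magic number. The table and function names are made up, and
the table is deliberately incomplete. The point is that a magic the tool
has never heard of is still a usable, unique identifier, whereas a value
outside a kernel enum could not be reported at all:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct fs_name {
	uint32_t magic;
	const char *name;
};

/* A few magic numbers as reported through statfs(); not a full list. */
static const struct fs_name known_fs[] = {
	{ 0x0000EF53, "ext2/3/4" },
	{ 0x58465342, "xfs" },
	{ 0x9123683E, "btrfs" },
	{ 0x01021994, "tmpfs" },
};

/* Hypothetical helper: write a printable description of magic to buf. */
static const char *describe_fs(uint32_t magic, char *buf, size_t len)
{
	for (size_t i = 0; i < sizeof(known_fs) / sizeof(known_fs[0]); i++) {
		if (known_fs[i].magic == magic) {
			snprintf(buf, len, "%s", known_fs[i].name);
			return buf;
		}
	}

	/* Unknown (e.g. out of tree) filesystem: the magic itself still
	 * identifies it, so report the raw value rather than failing. */
	snprintf(buf, len, "unknown (0x%08x)", (unsigned int)magic);
	return buf;
}
```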

> > The field in the statmnt() call would need to be at a fixed-size 32-bit
> > value in any case, so having it return the existing magic will "just work"
> > because userspace tools already know and understand these magic values,
> > while introducing an in-kernel enum would be broken for multiple reasons.
> 
> We already do expose the magic number in statmount() but it can't
> differentiate between ext2, ext3, and ext4 for example which is why I
> asked.

That's just an extN quirk, and it's trivial to fix for the new
interface. Define new magic numbers for ext3 and ext4 and only use
them in the new interface, leave the old interfaces using the ext2
magic number for all of them.
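
Something along these lines, as a sketch only. The two PLACEHOLDER
values below are invented for illustration and are not proposed
numbers, and the enum and function names are hypothetical:

```c
#include <stdint.h>

enum extn_variant { EXTN_EXT2, EXTN_EXT3, EXTN_EXT4 };

/* What the existing interfaces report for ext2, ext3 and ext4 alike. */
#define EXT_SHARED_MAGIC        0x0000EF53u

/* PLACEHOLDER values, made up for this example only. */
#define PLACEHOLDER_EXT3_MAGIC  0xE3E3E3E3u
#define PLACEHOLDER_EXT4_MAGIC  0xE4E4E4E4u

/* Old interfaces: unchanged, every variant reports the ext2 magic. */
static uint32_t legacy_magic(enum extn_variant v)
{
	(void)v;
	return EXT_SHARED_MAGIC;
}

/* New interface only: ext3 and ext4 get their own distinct magic. */
static uint32_t new_interface_magic(enum extn_variant v)
{
	switch (v) {
	case EXTN_EXT3:
		return PLACEHOLDER_EXT3_MAGIC;
	case EXTN_EXT4:
		return PLACEHOLDER_EXT4_MAGIC;
	default:
		return EXT_SHARED_MAGIC;
	}
}
```

Existing userspace that compares against the ext2 magic keeps working,
and only callers of the new interface see the finer-grained values.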

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx