Re: [PATCH 00/32] VFS: Introduce filesystem context [ver #8]

David Howells <dhowells@xxxxxxxxxx> · Mon, 18 Jun 2018 21:30:50 +0100

Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:

> I have read through these patches and I noticed a significant issue.
> 
> Today in mount_bdev we do something that looks like:
> 
> mount_bdev(...)
> {
> 	s = sget(..., bdev);
> 	if (s->s_root) {
>         	/* Noop */
>         } else {
>         	err = fill_super(s, ...);
>                 if (err) {
>                 	deactivate_locked_super(s);
>                         return ERR_PTR(err);
>                 }
>                 s->s_flags |= SB_ATTIVE;
>                 bdev->bd_super = s;
>         }
>         return dget(s->s_root);
> }
> 
> The key point is that we don't process the mount options at all if
> a super block already exists in the kernel.  Similar to what
> your fscontext changes are doing (after parsing the options).

Actually, no, that's not the case.

The fscontext code *requires* you to parse the parameters *before* any attempt
to access the superblock is made.  Note that this will actually be a problem
for, say, ext4 which passes a text string stored in the superblock through the
parser *before* parsing the mount syscall data.  Fun.

I'm intending to deal with that particular case by having ext4 create multiple
private contexts, one filled in from the user data, and then a second one
filled in from the superblock string.  These can then be validated one against
the other before the super_block struct is published.

And if the super_block struct already exists, the user's specified parameters
can be validated against that.

> Your fscontext changes do not improve upon this area of the mount api at
> all and that concerns me.  This is an area where people can and already
> do shoot themselves in their feet.

This *will* be dealt with, but I wanted to get the core changes upstream
before tackling all the filesystems.  The legacy wrapper is just that and
should be got rid of when all the filesystems have been converted.

> ...
>
> Creating a new mount and finding an old mount are the same operation in
> the kernel today.  This is fundamentally confusing.  In the new api
> could we please separate these two operations?
>
> Perhaps someting like:
> x create
> x find
> 
> With the "x create" case failing if the filesystem already exists,
> still allowing "x find"?  And with the "x find" case failing if
> the superblock is not already created in the kernel.

No.  What you're suggesting introduces a userspace-userspace and a
userspace-kernel race - unless you're willing to let userspace lock against
superblock creation by other parties.

Further, some filesystems *have* to be parameterised before you can do the
check for the superblock.  Network filesystems, for example, where you have to
set the network parameters upfront and the key to the superblock might not be
known until you've queried the server.

> That should make it clear to a userspace program what is going on
> and give it a chance to mount a filesystem anyway.

That said, I'm willing to provide a "fail if already extant" option if we
think that's actually likely to be of use.  However, you'd still have to
provide parameters before the check can be made.

> In a similar vein could we please clarify the rules for changing mount
> options for an existing superblock are in the new api?

You mean remount/reconfigure?  Note that we have to provide backward
compatibility with every single filesystem.

> Today mount assumes that it has to provide all of the existing options to
> reconfigure a mount.  What people want to do and what most filesystems
> support is just specifying the options that need to be changed.  Can we
> please make this the rule of how this are expected to work for fscontext?
> That only changing mount options need to be specified before: "x
> reconfigure"

Fine by me - but it must *also* support every option being specified if that
is what mount currently does.

I don't really want to supply extra parsers if I can avoid it.  Miklós, for
example wanted a different, incompatible interface, so you'd do:

	write(fd, "o +foo");
	write(fd, "o -bar");
	write(fd, "x reconfig");

sort of thing to enable or disable options... but this assumes that options
are binary and requires a separate parser to the one that does the initial
configuration - and you still have to support the old remount data parse.

I'm okay with specifying that you should just specify the options you want to
change and that the normal way to 'disable' something is to prefix it with
"no".

I guess I could pass a flag through to indicate that this came from
sys_mount(MS_REMOUNT) rather than the new method.

David