Re: lots of fstests cases fail on overlay with util-linux 2.40.2 (new mount APIs)

Christian Brauner <brauner@xxxxxxxxxx> · Mon, 11 Nov 2024 13:30:50 +0100

On Mon, Nov 04, 2024 at 05:41:18PM +0100, Miklos Szeredi wrote:
> On Thu, 31 Oct 2024 at 11:30, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> 
> > One option would be to add a fsconfig() flag that enforces strict
> > remount behavior if the filesystem supports it. So it would become an
> > opt-in thing.
> 
> From what mount(8) does it seems the expected behavior of filesystems
> is to reset the configuration state before parsing options in
> reconfigure.   But it's not what mount(8) expects on the command line.

I'm not sure that's the case but I might misremember what mount(8) does.

As best I remember it, the mount(2) system call behaves differently for
generic VFS options and for filesystem-specific mount options.

The difficulty, once again, is that both are mixed together in the
mount(2) system call, which hides the behavioral differences. I'll try to
summarize what I remember below.

> I.e. "mount -oremount,ro" will result in all previous options being
> added to the list of options (except rw).  There's a big disconnect
> between the two interfaces.

So for generic VFS mount options, the behavior of mount(2) is that if one
has a filesystem mounted with nodev,nosuid, such as:

	mount("tmpfs", "/mnt", "tmpfs", MS_NODEV | MS_NOSUID, NULL);

and one now remounts (proper remount, not MS_BIND | MS_REMOUNT) as "ro":

	mount(NULL, "/mnt", NULL, MS_REMOUNT | MS_RDONLY, NULL);

then mount(2) does not treat the generic VFS mount options additively.
Instead, it treats the remount as a "reset": "ro" gets set, while
"nosuid" and "nodev" get cleared.

That non-additive behavior has caused quite a few security issues. So
mount(8) works around it by translating:

	mount -o remount,ro /mnt

internally into:

	mount(NULL, "/mnt", NULL, MS_REMOUNT | MS_RDONLY | MS_NODEV | MS_NOSUID, NULL);

But, afair, this reset behavior applies only to the generic VFS options
(ro, nosuid, nodev, noexec) and not to filesystem-specific options during
remount.

In contrast, the problem with filesystem-specific mount options during
remount is that quite a few filesystems ignore unknown mount options, or
mount options that cannot be changed on remount.

This is effectively what overlayfs does when the remount request comes
from the old mount(2) API: it just consumes everything and ignores even
nonsensical or nonexistent mount options.

This causes other problems: users who really want to ensure that a mount
property gets changed during remount cannot be sure that it did, because
anything will succeed.

So initially I had thought we could change that behavior by
differentiating between requests coming from the old and the new mount
API. If the request comes from the new mount API, we would return errors
for unknown mount options and for mount options that cannot be changed
on remount.

However, this seems to break some tools such as mount(8): it reassembles
all mount options during remount to emulate additive behavior, but then
fails because a remount request from the new mount API rejects the mount
options that can't be changed during a remount.

But there are definitely use-cases where userspace wants to know whether
a mount option was actually change{d,able} during a remount.

To accommodate both the old and the new behavior, my idea had been to let
the filesystem choose whether to ignore unknown mount options, or to error
out when unknown mount options are specified or when mount options are
specified that cannot be changed on remount.

A filesystem that allows for strict mount option parsing could raise a
flag in fs_flags, and a new uapi extension for fsopen() would then be
added, e.g., something like:

	fsopen("overlay", FSOPEN_REMOUNT_STRICT);

(Fwiw, mount_setattr() is additive/subtractive, i.e., it does the
right thing and only clears or sets the options that are explicitly
specified, leaving other options alone.)

> 
> I guess your suggestion is to allow filesystem to process only the
> options that are changed, right?
> 
> I think that makes perfect sense and would allow to slowly get rid of
> the above disconnect.