On Mon, Nov 04, 2024 at 05:41:18PM +0100, Miklos Szeredi wrote:
> On Thu, 31 Oct 2024 at 11:30, Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> > One option would be to add a fsconfig() flag that enforces strict
> > remount behavior if the filesystem supports it. So it's would become an
> > opt-in thing.
>
> From what mount(8) does it seems the expected behavior of filesystems
> is to reset the configuration state before parsing options in
> reconfigure. But it's not what mount(8) expects on the command line.

I'm not sure that's the case, but I might misremember what mount(8) does.
As best as I remember, the mount(2) system call behaves differently for
generic VFS options and for filesystem-specific mount options. The
difficulty is once again that both are mixed together in the mount(2)
system call, which hides the behavioral differences. I'll try to
summarize what I remember below.

> I.e. "mount -oremount,ro" will result in all previous options being
> added to the list of options (except rw). There's a big disconnect
> between the two interfaces.

So for generic VFS mount options the behavior of mount(2) is as follows.
If one has a filesystem mounted with nodev,nosuid such as:

        mount(NULL, "/mnt", "tmpfs", 0, "nodev,nosuid");

and one now remounts (proper remount, not MS_BIND | MS_REMOUNT) as "ro":

        mount(NULL, "/mnt", "", MS_REMOUNT, "ro");

then mount(2) will not display additive behavior for the generic VFS
mount options. Instead, it will treat it as a "reset". So "ro" gets
added and "nosuid" and "nodev" get stripped.

That non-additive behavior has actually caused quite a few security
issues. So mount(8) works around this by translating a:

        mount -o remount,ro /mnt

internally into:

        mount(NULL, "/mnt", "", MS_REMOUNT, "ro,nodev,nosuid");

But afair, this reset behavior only applies to generic VFS options (ro,
nosuid, nodev, noexec) but not to filesystem-specific options during
remount.
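To make the workaround concrete, here is a rough sketch of how a tool
could carry the existing generic VFS flags forward into a remount. On
the real mount(2) interface these options travel as MS_* bits in
mountflags rather than in the data string. The helper names
(vfs_flag_for, remount_flags) are made up for illustration, and this is
not how mount(8) is actually implemented; it only assumes the current
options are available as a comma-separated string such as the one found
in /proc/self/mountinfo.

```c
#include <string.h>
#include <sys/mount.h>   /* MS_REMOUNT, MS_RDONLY, MS_NOSUID, ... */

/* Map one generic VFS option name to its MS_* bit; 0 if it is not one. */
static unsigned long vfs_flag_for(const char *opt, size_t len)
{
    static const struct { const char *name; unsigned long flag; } map[] = {
        { "ro",     MS_RDONLY },
        { "nosuid", MS_NOSUID },
        { "nodev",  MS_NODEV  },
        { "noexec", MS_NOEXEC },
    };

    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
        if (strlen(map[i].name) == len && memcmp(map[i].name, opt, len) == 0)
            return map[i].flag;
    return 0;
}

/*
 * Hypothetical helper: build mountflags for a remount that adds `extra`
 * while re-specifying every generic VFS flag found in `current_opts`,
 * so that the "reset" behavior does not strip them.
 */
static unsigned long remount_flags(const char *current_opts, unsigned long extra)
{
    unsigned long flags = MS_REMOUNT | extra;
    const char *p = current_opts;

    while (*p) {
        size_t len = strcspn(p, ",");   /* length of the next option */

        flags |= vfs_flag_for(p, len);
        p += len;
        if (*p == ',')
            p++;
    }
    return flags;
}
```

A caller with CAP_SYS_ADMIN could then do something like
mount(NULL, "/mnt", NULL, remount_flags(opts, MS_RDONLY), NULL). Note
that the sketch only adds flags; dropping one (e.g. going back to rw)
would need an extra mask.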

In contrast, the problem with filesystem-specific mount options during
remount is that quite a few filesystems ignore unknown mount options or
mount options that cannot be changed on remount. This is effectively
what overlayfs does when the remount request comes from the old mount(2)
api. It will just consume anything and ignore even
nonsensical/nonexistent mount options. This causes other problems:
users who really want to ensure that a mount property gets changed
during remount cannot be sure, because anything will succeed.

So initially I had thought we could change that behavior by
differentiating between a request coming from the old or new mount api.
If the request comes from the new mount api we would return errors for
unknown mount options and for mount options that cannot be changed on
remount.

However, this seems to break some tools such as mount(8): it reassembles
all mount options during remount to emulate additive behavior, but then
fails because a remount request from the new mount api rejects mount
options that can't be changed during a remount.

But there are definitely use-cases where userspace wants to know whether
a mount option was actually change{d,able} during a remount.

To accommodate the old and new behavior my idea had been to let the
filesystem choose whether to ignore unknown mount options or to error
out when unknown mount options, or mount options that cannot be changed
on remount, are specified. A filesystem that allows for strict mount
option parsing could raise a flag in fs_flags, and then a new uapi
extension for fsopen() gets added, e.g., something like:

        fsopen("overlayfs", FSOPEN_REMOUNT_STRICT);

(Fwiw, mount_setattr() is additive/subtractive, i.e., it does the right
thing and only clears or sets the options that are explicitly specified,
leaving other options alone.)

>
> I guess your suggestion is to allow filesystem to process only the
> options that are changed, right?
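
As a toy model of the two semantics discussed above (reset on a legacy
remount versus the set/clear behavior of mount_setattr()), something
like the following may help. The bit values and function names are
purely illustrative assumptions; real code would use the MOUNT_ATTR_*
constants and the attr_set/attr_clr fields of struct mount_attr.

```c
#include <stdint.h>

/* Illustrative bit values only, not the kernel's uapi constants. */
enum {
    ATTR_RDONLY = 1u << 0,
    ATTR_NOSUID = 1u << 1,
    ATTR_NODEV  = 1u << 2,
    ATTR_NOEXEC = 1u << 3,
};

/* Legacy-style remount: the requested set replaces whatever was there. */
static uint64_t remount_reset(uint64_t current, uint64_t requested)
{
    (void)current;              /* previous flags are simply dropped */
    return requested;
}

/* mount_setattr()-style: clear attr_clr, set attr_set, keep the rest. */
static uint64_t remount_set_clr(uint64_t current, uint64_t attr_set,
                                uint64_t attr_clr)
{
    return (current & ~attr_clr) | attr_set;
}
```

With current = nosuid|nodev and a request for ro, the first variant
ends up with only ro, while the second keeps nosuid and nodev and adds
ro.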
>
> I think that makes perfect sense and would allow to slowly get rid of
> the above disconnect.
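
For the opt-in strict parsing idea discussed earlier in this mail, a
minimal sketch of the per-option decision could look like the code
below. Everything here is hypothetical: the option table, the option
names ("layout", "verbose"), and check_remount_opt() are invented for
illustration and do not correspond to overlayfs or to any existing
kernel helper.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of one filesystem-specific option. */
struct opt_desc {
    const char *name;
    bool remountable;     /* may this option change on remount? */
};

/* Made-up option table for an imaginary filesystem. */
static const struct opt_desc demo_opts[] = {
    { "layout",  false },
    { "verbose", true  },
};

/*
 * Decide what to do with one option during remount.
 * Returns 1 if the option would be applied, 0 if it is silently
 * ignored (lenient mode), and -EINVAL if it is rejected (strict mode).
 */
static int check_remount_opt(const char *name, bool strict)
{
    for (size_t i = 0; i < sizeof(demo_opts) / sizeof(demo_opts[0]); i++) {
        if (strcmp(demo_opts[i].name, name) != 0)
            continue;
        if (demo_opts[i].remountable)
            return 1;
        return strict ? -EINVAL : 0;    /* known, but fixed at mount time */
    }
    return strict ? -EINVAL : 0;        /* unknown option */
}
```

In lenient mode a bogus option returns 0 and the remount appears to
succeed, which is exactly the case where userspace cannot tell whether
anything changed. In strict mode the same request fails with -EINVAL.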