Re: allowing for a completely cached umount(2) pathwalk

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Fri, 14 Apr 2023 14:21:00 +0000

> On Apr 14, 2023, at 09:41, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> 
> On Fri, Apr 14, 2023 at 06:06:38AM -0400, Jeff Layton wrote:
>> On Fri, 2023-04-14 at 03:43 +0100, Al Viro wrote:
>>> On Fri, Apr 14, 2023 at 08:41:03AM +1000, NeilBrown wrote:
>>> 
>>>> The path name that appears in /proc/mounts is the key that must be used
>>>> to find and unmount a filesystem.  When you do that "find"ing you are
>>>> not looking up a name in a filesystem, you are looking up a key in the
>>>> mount table.
>>> 
>>> No.  The path name in /proc/mounts is *NOT* a key - it's a best-effort
>>> attempt to describe the mountpoint.  Pathname resolution does not work
>>> in terms of "the longest prefix is found and we handle the rest within
>>> that filesystem".
>>> 
>>>> We could, instead, create an api that is given a mount-id (first number
>>>> in /proc/self/mountinfo) and unmounts that.  Then /sbin/umount could
>>>> read /proc/self/mountinfo, find the mount-id, and unmount it - all
>>>> without ever doing path name lookup in the traditional sense.
>>>> 
>>>> But I prefer your suggestion.  LOOKUP_MOUNTPOINT could be renamed
>>>> LOOKUP_CACHED, and it only finds paths that are in the dcache, never
>>>> revalidates, at most performs simple permission checks based on cached
>>>> content.
>>> 
>>> umount /proc/self/fd/42/barf/something
>>> 
>> 
>> Does any of that involve talking to the server? I don't necessarily see
>> a problem with doing the above. If "something" is in cache then that
>> should still work.
>> 
>> The main idea here is that we want to avoid communicating with the
>> backing store during the umount(2) pathwalk.
>> 
>>> Discuss.
>>> 
>>> OTOH, umount-by-mount-id is an interesting idea, but we'll need to decide
>>> what the right permissions would be for it.
>>> 
>>> But please, lose the "mount table is a mapping from path prefix to filesystem"
>>> notion - it really, really is not.  IIRC, there are systems that work that way,
>>> but it's nowhere near the semantics used by any Unices, all variants of Linux
>>> included.
>> 
>> I'm not opposed to something like umount-by-mount-id either. All of this
>> seems like something that should probably rely on CAP_SYS_ADMIN.
> 
> The permission model needs to account for the fact that mount ids are
> global and as such you could in principle unmount any mount in any mount
> namespace. IOW, you can circumvent lookup restrictions completely.
> 
> So we could resolve the mnt-id to an FMODE_PATH and then very roughly
> with no claim to solving everything:
> 
> static bool may_umount_by_mnt_id(struct path *opath)
> {
>         struct path root;
>         bool reachable;
> 
>         // caller in principle able to circumvent lookup restrictions
>         if (!may_cap_dac_readsearch())
>                 return false;
> 
>         // caller can mount in their mountns
>         if (!may_mount())
>                 return false;
> 
>         // target mount and caller in the same mountns
>         if (!check_mnt())
>                 return false;
> 
>         // caller could in principle reach mount from its root
>         get_fs_root(current->fs, &root);
>         reachable = is_path_reachable(real_mount(opath->mnt),
>                                       opath->dentry, &root);
>         path_put(&root);
> 
>         return reachable;
> }
> 
> However, that still means that we have laxer restrictions on unmounting
> by mount-id than on unmounting with lookup, since for lookup just having
> CAP_DAC_READ_SEARCH isn't enough. Usually - at least for filesystems
> without custom permission handlers - we also establish that the inode
> can be mapped into the caller's idmapping.
> 
> So that would mean that unmounting by mount-id would allow you to
> unmount mounts in cases where you wouldn't with umount. That might be
> fine though as that's ultimately the goal here in a way.
> 
> One could also see a very useful feature in this where you require
> capable(CAP_DAC_READ_SEARCH) and capable(CAP_SYS_ADMIN) and then allow
> unmounting any mount in the system by mount-id. This would obviously be
> very useful for privileged service managers but I haven't thought this
> through.

That is exactly why having a separate syscall to do the lookup of the mount-id is good: it provides separation of privilege.

The conversion of mount-id to an O_PATH file descriptor is just akin to a path lookup, so only needs CAP_DAC_READ_SEARCH (since you require privilege only to bypass the ACL directory read and lookup restrictions). The resulting O_PATH file descriptor has no special properties that require any further privilege.

Then use that resulting file descriptor for the umount, which normally requires CAP_SYS_ADMIN.
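
To make the two steps concrete, here is a rough userspace sketch, not a proposal for the actual interface. parse_mountinfo_line() pulls the mount-id and mount point out of one /proc/self/mountinfo line. mnt_id_to_opath_fd() is purely hypothetical: it stands in for the separate lookup syscall discussed above and simply fails with ENOSYS. umount_by_mnt_id() then unmounts through the fd's /proc/self/fd link, which assumes umount2(2) follows that link to the mount.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/*
 * Extract the mount-id (field 1) and mount point (field 5) from one
 * /proc/self/mountinfo line.  Returns 0 on success, -1 if the line is
 * malformed or the mount point does not fit in mnt_point.
 */
static int parse_mountinfo_line(const char *line, int *mnt_id,
                                char *mnt_point, size_t len)
{
        char point[4096];
        int id, parent;

        /* Fields: mount-id parent-id major:minor root mount-point ... */
        if (sscanf(line, "%d %d %*s %*s %4095s", &id, &parent, point) != 3)
                return -1;
        if (snprintf(mnt_point, len, "%s", point) >= (int)len)
                return -1;
        *mnt_id = id;
        return 0;
}

/*
 * HYPOTHETICAL step 1: turn a mount-id into an O_PATH file descriptor.
 * This stands in for the separate lookup syscall (the step that would
 * need CAP_DAC_READ_SEARCH).  No such call is assumed to exist, so the
 * stub always fails with ENOSYS.
 */
static int mnt_id_to_opath_fd(int mnt_id)
{
        (void)mnt_id;
        errno = ENOSYS;
        return -1;
}

/*
 * Step 2: unmount via the fd obtained in step 1.  The umount itself is
 * the part that normally requires CAP_SYS_ADMIN.
 */
static int umount_by_mnt_id(int mnt_id)
{
        char link[64];
        int fd, ret;

        fd = mnt_id_to_opath_fd(mnt_id);
        if (fd < 0)
                return -1;

        snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
        ret = umount2(link, 0);
        close(fd);
        return ret;
}
```

The point of keeping the two functions apart is that each one can be subject to its own permission check.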

_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx