Re: allowing for a completely cached umount(2) pathwalk

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Fri, 14 Apr 2023 15:30:30 +0000

> On Apr 14, 2023, at 11:13, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> 
> On Fri, Apr 14, 2023 at 02:21:00PM +0000, Trond Myklebust wrote:
>> 
>> 
>>> On Apr 14, 2023, at 09:41, Christian Brauner <brauner@xxxxxxxxxx> wrote:
>>> 
>>> On Fri, Apr 14, 2023 at 06:06:38AM -0400, Jeff Layton wrote:
>>>> On Fri, 2023-04-14 at 03:43 +0100, Al Viro wrote:
>>>>> On Fri, Apr 14, 2023 at 08:41:03AM +1000, NeilBrown wrote:
>>>>> 
>>>>>> The path name that appears in /proc/mounts is the key that must be used
>>>>>> to find and unmount a filesystem.  When you do that "find"ing you are
>>>>>> not looking up a name in a filesystem, you are looking up a key in the
>>>>>> mount table.
>>>>> 
>>>>> No.  The path name in /proc/mounts is *NOT* a key - it's a best-effort
>>>>> attempt to describe the mountpoint.  Pathname resolution does not work
>>>>> in terms of "the longest prefix is found and we handle the rest within
>>>>> that filesystem".
>>>>> 
>>>>>> We could, instead, create an api that is given a mount-id (first number
>>>>>> in /proc/self/mountinfo) and unmounts that.  Then /sbin/umount could
>>>>>> read /proc/self/mountinfo, find the mount-id, and unmount it - all
>>>>>> without ever doing path name lookup in the traditional sense.
>>>>>> 
>>>>>> But I prefer your suggestion.  LOOKUP_MOUNTPOINT could be renamed
>>>>>> LOOKUP_CACHED, and it only finds paths that are in the dcache, never
>>>>>> revalidates, at most performs simple permission checks based on cached
>>>>>> content.
>>>>> 
>>>>> umount /proc/self/fd/42/barf/something
>>>>> 
>>>> 
>>>> Does any of that involve talking to the server? I don't necessarily see
>>>> a problem with doing the above. If "something" is in cache then that
>>>> should still work.
>>>> 
>>>> The main idea here is that we want to avoid communicating with the
>>>> backing store during the umount(2) pathwalk.
>>>> 
>>>>> Discuss.
>>>>> 
>>>>> OTON, umount-by-mount-id is an interesting idea, but we'll need to decide
>>>>> what would the right permissions be for it.
>>>>> 
>>>>> But please, lose the "mount table is a mapping from path prefix to filesystem"
>>>>> notion - it really, really is not.  IIRC, there are systems that work that way,
>>>>> but it's nowhere near the semantics used by any Unices, all variants of Linux
>>>>> included.
>>>> 
>>>> I'm not opposed to something by umount-by-mount-id either. All of this
>>>> seems like something that should probably rely on CAP_SYS_ADMIN.
>>> 
>>> The permission model needs to account for the fact that mount ids are
>>> global and as such you could in principle unmount any mount in any mount
>>> namespace. IOW, you can circumvent lookup restrictions completely.
>>> 
>>> So we could resolve the mnt-id to an FMODE_PATH and then very roughly
>>> with no claim to solving everything:
>>> 
>>> may_umount_by_mnt_id(struct path *opath)
>>> {
>>> struct path root;
>>> bool reachable;
>>> 
>>> // caller in principle able to circumvent lookup restrictions
>>>       if (!may_cap_dac_readsearch())
>>> return false;
>>> 
>>> // caller can mount in their mountns
>>> if (!may_mount())
>>> return false;
>>> 
>>> // target mount and caller in the same mountns
>>> if (!check_mnt())
>>> return false;
>>> 
>>> // caller could in principle reach mount from it's root
>>> get_fs_root(current->fs, &root);
>>>       reachable = is_path_reachable(real_mount(opath->mnt), opath->dentry, &root);
>>> path_put(&root);
>>> 
>>> return reachable;
>>> }
>>> 
>>> However, that still means that we have laxer restrictions on unmounting
>>> by mount-id then on unmount with lookup as for lookup just having
>>> CAP_DAC_READ_SEARCH isn't enough. Usually - at least for filesytems
>>> without custom permission handlers - we also establish that the inode
>>> can be mapped into the caller's idmapping.
>>> 
>>> So that would mean that unmounting by mount-id would allow you to
>>> unmount mounts in cases where you wouldn't with umount. That might be
>>> fine though as that's ultimately the goal here in a way.
>>> 
>>> One could also see a very useful feature in this where you require
>>> capable(CAP_DAC_READ_SEARCH) and capable(CAP_SYS_ADMIN) and then allow
>>> unmounting any mount in the system by mount-id. This would obviously be
>>> very useful for privileged service managers but I haven't thought this
>>> Through.
>> 
>> That is exactly why having a separate syscall to do the lookup of the mount-id is good: it provides separation of privilege.
>> 
>> The conversion of mount-id to an O_PATH file descriptor is just akin to a path lookup, so only needs CAP_DAC_READ_SEARCH (since you require privilege only to bypass the ACL directory read and lookup restrictions). The resulting O_PATH file descriptor has no special properties that require any further privilege.
>> 
>> Then use that resulting file descriptor for the umount, which normally requires CAP_SYS_ADMIN.
> 
> There's a difference between unmounting directly by providing a mount id
> and getting an O_PATH file descriptor from a mnt-id. If you can simply
> unmount by mount-id it's useful for users that have CAP_DAC_READ_SEARCH
> in a container. Without it you likely need to require
> capable(CAP_DAC_READ_SEARCH) aka system level privileges just like
> open_to_handle_at() which makes this interface way less generic and
> usable. Otherwise you'd be able to get an O_PATH fd to something that
> you wouldn't be able to access through normal path lookup.

Being able to convert into an O_PATH descriptor gives you more options than just unmounting. It should allow you to syncfs() before unmounting. It should allow you to call open_tree() so you can manipulate the filesystem that is no longer accessible by path walk (e.g. so you can bind it elsewhere or move it).

_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx