> On Apr 14, 2023, at 11:13, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > On Fri, Apr 14, 2023 at 02:21:00PM +0000, Trond Myklebust wrote: >> >> >>> On Apr 14, 2023, at 09:41, Christian Brauner <brauner@xxxxxxxxxx> wrote: >>> >>> On Fri, Apr 14, 2023 at 06:06:38AM -0400, Jeff Layton wrote: >>>> On Fri, 2023-04-14 at 03:43 +0100, Al Viro wrote: >>>>> On Fri, Apr 14, 2023 at 08:41:03AM +1000, NeilBrown wrote: >>>>> >>>>>> The path name that appears in /proc/mounts is the key that must be used >>>>>> to find and unmount a filesystem. When you do that "find"ing you are >>>>>> not looking up a name in a filesystem, you are looking up a key in the >>>>>> mount table. >>>>> >>>>> No. The path name in /proc/mounts is *NOT* a key - it's a best-effort >>>>> attempt to describe the mountpoint. Pathname resolution does not work >>>>> in terms of "the longest prefix is found and we handle the rest within >>>>> that filesystem". >>>>> >>>>>> We could, instead, create an api that is given a mount-id (first number >>>>>> in /proc/self/mountinfo) and unmounts that. Then /sbin/umount could >>>>>> read /proc/self/mountinfo, find the mount-id, and unmount it - all >>>>>> without ever doing path name lookup in the traditional sense. >>>>>> >>>>>> But I prefer your suggestion. LOOKUP_MOUNTPOINT could be renamed >>>>>> LOOKUP_CACHED, and it only finds paths that are in the dcache, never >>>>>> revalidates, at most performs simple permission checks based on cached >>>>>> content. >>>>> >>>>> umount /proc/self/fd/42/barf/something >>>>> >>>> >>>> Does any of that involve talking to the server? I don't necessarily see >>>> a problem with doing the above. If "something" is in cache then that >>>> should still work. >>>> >>>> The main idea here is that we want to avoid communicating with the >>>> backing store during the umount(2) pathwalk. >>>> >>>>> Discuss. >>>>> >>>>> OTON, umount-by-mount-id is an interesting idea, but we'll need to decide >>>>> what would the right permissions be for it. >>>>> >>>>> But please, lose the "mount table is a mapping from path prefix to filesystem" >>>>> notion - it really, really is not. IIRC, there are systems that work that way, >>>>> but it's nowhere near the semantics used by any Unices, all variants of Linux >>>>> included. >>>> >>>> I'm not opposed to something by umount-by-mount-id either. All of this >>>> seems like something that should probably rely on CAP_SYS_ADMIN. >>> >>> The permission model needs to account for the fact that mount ids are >>> global and as such you could in principle unmount any mount in any mount >>> namespace. IOW, you can circumvent lookup restrictions completely. >>> >>> So we could resolve the mnt-id to an FMODE_PATH and then very roughly >>> with no claim to solving everything: >>> >>> may_umount_by_mnt_id(struct path *opath) >>> { >>> struct path root; >>> bool reachable; >>> >>> // caller in principle able to circumvent lookup restrictions >>> if (!may_cap_dac_readsearch()) >>> return false; >>> >>> // caller can mount in their mountns >>> if (!may_mount()) >>> return false; >>> >>> // target mount and caller in the same mountns >>> if (!check_mnt()) >>> return false; >>> >>> // caller could in principle reach mount from it's root >>> get_fs_root(current->fs, &root); >>> reachable = is_path_reachable(real_mount(opath->mnt), opath->dentry, &root); >>> path_put(&root); >>> >>> return reachable; >>> } >>> >>> However, that still means that we have laxer restrictions on unmounting >>> by mount-id then on unmount with lookup as for lookup just having >>> CAP_DAC_READ_SEARCH isn't enough. Usually - at least for filesytems >>> without custom permission handlers - we also establish that the inode >>> can be mapped into the caller's idmapping. >>> >>> So that would mean that unmounting by mount-id would allow you to >>> unmount mounts in cases where you wouldn't with umount. That might be >>> fine though as that's ultimately the goal here in a way. >>> >>> One could also see a very useful feature in this where you require >>> capable(CAP_DAC_READ_SEARCH) and capable(CAP_SYS_ADMIN) and then allow >>> unmounting any mount in the system by mount-id. This would obviously be >>> very useful for privileged service managers but I haven't thought this >>> Through. >> >> That is exactly why having a separate syscall to do the lookup of the mount-id is good: it provides separation of privilege. >> >> The conversion of mount-id to an O_PATH file descriptor is just akin to a path lookup, so only needs CAP_DAC_READ_SEARCH (since you require privilege only to bypass the ACL directory read and lookup restrictions). The resulting O_PATH file descriptor has no special properties that require any further privilege. >> >> Then use that resulting file descriptor for the umount, which normally requires CAP_SYS_ADMIN. > > There's a difference between unmounting directly by providing a mount id > and getting an O_PATH file descriptor from a mnt-id. If you can simply > unmount by mount-id it's useful for users that have CAP_DAC_READ_SEARCH > in a container. Without it you likely need to require > capable(CAP_DAC_READ_SEARCH) aka system level privileges just like > open_to_handle_at() which makes this interface way less generic and > usable. Otherwise you'd be able to get an O_PATH fd to something that > you wouldn't be able to access through normal path lookup. Being able to convert into an O_PATH descriptor gives you more options than just unmounting. It should allow you to syncfs() before unmounting. It should allow you to call open_tree() so you can manipulate the filesystem that is no longer accessible by path walk (e.g. so you can bind it elsewhere or move it). _________________________________ Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx