On Thu, May 23, 2024 at 09:52:20AM -0600, Aleksa Sarai wrote: > On 2024-05-21, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > On Mon, May 20, 2024 at 05:35:49PM -0400, Aleksa Sarai wrote: > > > Now that we have stabilised the unique 64-bit mount ID interface in > > > statx, we can now provide a race-free way for name_to_handle_at(2) to > > > provide a file handle and corresponding mount without needing to worry > > > about racing with /proc/mountinfo parsing. > > > > > > As with AT_HANDLE_FID, AT_HANDLE_UNIQUE_MNT_ID reuses a statx AT_* bit > > > that doesn't make sense for name_to_handle_at(2). > > > > > > Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx> > > > --- > > > > So I think overall this is probably fine (famous last words). If it's > > just about being able to retrieve the new mount id without having to > > take the hit of another statx system call it's indeed a bit much to > > add a revised system call for this. Althoug I did say earlier that I > > wouldn't rule that out. > > > > But if we'd that then it'll be a long discussion on the form of the new > > system call and the information it exposes. > > > > For example, I lack the grey hair needed to understand why > > name_to_handle_at() returns a mount id at all. The pitch in commit > > 990d6c2d7aee ("vfs: Add name to file handle conversion support") is that > > the (old) mount id can be used to "lookup file system specific > > information [...] in /proc/<pid>/mountinfo". > > The logic was presumably to allow you to know what mount the resolved > file handle came from. If you use AT_EMPTY_PATH this is not needed > because you could just fstatfs (and now statx(AT_EMPTY_PATH)), but if > you just give name_to_handle_at() almost any path, there is no race-free > way to make sure that you know which filesystem the file handle came > from. > > I don't know if that could lead to security issues (I guess an attacker > could find a way to try to manipulate the file handle you get back, and > then try to trick you into operating on the wrong filesystem with > open_by_handle_at()) but it is definitely something you'd want to avoid. So the following paragraphs are prefaced with: I'm not an expert on file handle encoding and so might be totally wrong. Afaiu, the uniqueness guarantee of the file handle mostly depends on: (1) the filesystem (2) the actual file handling encoding Looking at file handle encoding to me it looks like it's fairly easy to fake them in userspace (I guess that's ok if you think about them like a path but with a weird permission model built around them.) for quite a few filesystems. For example, for anything that uses fs/libfs.c:generic_encode_ino32_fh() it's easy to construct a file handle by retrieving the inode number via stat and the generation number via FS_IOC_GETVERSION. Encoding using the inode number and the inode generation number is probably not very strong so it's not impossible to generate a file handle that is not unique without knowing in which filesystem to interpret it in. The problem is with what name_to_handle_at() returns imho. A mnt_id doesn't pin the filesystem and the old mnt_id isn't unique. That means the filesystem can be unmounted and go away and the mnt_id can be recycled almost immediately for another mount but the file handle is still there. So to guarantee that a (weakly encoded) file handle is interpreted in the right filesystem the file handle must either be accompanied by a file descriptor that pins the relevant mount or have any other guarantee that the filesystem doesn't go away (Or of course, the file handle encodes the uuid of the filesystem or something or uses some sort of hashing scheme.). One of the features of file handles is that they're globally usable so they're interesting to use as handles that can be shared. IOW, one can send around a file handle to another process without having to pin anything or have a file descriptor open that needs to be sent via AF_UNIX. But as stated above that's potentially risky so one might still have to send around an fd together with the file handle if sender and receiver don't share the filesystem for the handle. However, with the unique mount id things improve quite a bit. Because now it's possible to send around the unique mount id and the file handle. Then one can use statmount() to figure out which filesystem this file handle needs to be interpreted in. > > > Granted, that's doable but it'll mean a lot of careful checking to avoid > > races for mount id recycling because they're not even allocated > > cyclically. With lots of containers it becomes even more of an issue. So > > it's doubtful whether exposing the mount id through name_to_handle_at() > > would be something that we'd still do. > > > > So really, if this is just about a use-case where you want to spare the > > additional system call for statx() and you need the mnt_id then > > overloading is probably ok. > > > > But it remains an unpleasant thing to look at. > > > > Yeah, I agree it's ugly. > > -- > Aleksa Sarai > Senior Software Engineer (Containers) > SUSE Linux GmbH > <https://www.cyphar.com/>