On 2024-05-24, Christian Brauner <brauner@xxxxxxxxxx> wrote: > On Thu, May 23, 2024 at 09:52:20AM -0600, Aleksa Sarai wrote: > > On 2024-05-21, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > On Mon, May 20, 2024 at 05:35:49PM -0400, Aleksa Sarai wrote: > > > > Now that we have stabilised the unique 64-bit mount ID interface in > > > > statx, we can now provide a race-free way for name_to_handle_at(2) to > > > > provide a file handle and corresponding mount without needing to worry > > > > about racing with /proc/mountinfo parsing. > > > > > > > > As with AT_HANDLE_FID, AT_HANDLE_UNIQUE_MNT_ID reuses a statx AT_* bit > > > > that doesn't make sense for name_to_handle_at(2). > > > > > > > > Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx> > > > > --- > > > > > > So I think overall this is probably fine (famous last words). If it's > > > just about being able to retrieve the new mount id without having to > > > take the hit of another statx system call it's indeed a bit much to > > > add a revised system call for this. Althoug I did say earlier that I > > > wouldn't rule that out. > > > > > > But if we'd that then it'll be a long discussion on the form of the new > > > system call and the information it exposes. > > > > > > For example, I lack the grey hair needed to understand why > > > name_to_handle_at() returns a mount id at all. The pitch in commit > > > 990d6c2d7aee ("vfs: Add name to file handle conversion support") is that > > > the (old) mount id can be used to "lookup file system specific > > > information [...] in /proc/<pid>/mountinfo". > > > > The logic was presumably to allow you to know what mount the resolved > > file handle came from. If you use AT_EMPTY_PATH this is not needed > > because you could just fstatfs (and now statx(AT_EMPTY_PATH)), but if > > you just give name_to_handle_at() almost any path, there is no race-free > > way to make sure that you know which filesystem the file handle came > > from. > > > > I don't know if that could lead to security issues (I guess an attacker > > could find a way to try to manipulate the file handle you get back, and > > then try to trick you into operating on the wrong filesystem with > > open_by_handle_at()) but it is definitely something you'd want to avoid. > > So the following paragraphs are prefaced with: I'm not an expert on file > handle encoding and so might be totally wrong. > > Afaiu, the uniqueness guarantee of the file handle mostly depends on: > > (1) the filesystem > (2) the actual file handling encoding > > Looking at file handle encoding to me it looks like it's fairly easy to > fake them in userspace (I guess that's ok if you think about them like a > path but with a weird permission model built around them.) for quite a > few filesystems. The old Docker breakout attack did brute-force the fhandle for the root directory of the host filesystem, so it is entirely possible. However, the attack I was thinking of was whether a directory tree that an attacker had mount permissions over could be manipulated such that a privileged process doing name_to_handle_at() on a path within the tree would get a file handle that open_by_handle_at() on a different filesystem would result in a potentially dangerous path being opened. For instance (M is management process, C is the malicious container process): C: Bind-mounts the root of the container filesystem at /foo. M: name_to_handle_at($CONTAINER/foo) -> gets an fhandle of / of the container filesystem -> stores a copy of the (recycled) mount id C: Swaps /foo with a bind-mount of the host root filesystem such that the mount id is recycled, before M can scan /proc/self/mountinfo. C: Triggers M to try to use the filehandle for some administrative process. M: open_by_handle_at(...) on the wrong mount id, getting an fd of the host / directory. Possibly does something bad to this directory (deleting files, passing the fd back to the container, etc). It seems possible that this could happen, so having a unique mount id is kind of important if you plan to use open_by_handle_at() with malicious actors in control of the target filesystem tree. Though, regardless of the attack you are worried about, I guess we are in agreement that a unique mount id from name_to_handle_at() would be a good idea if we are planning for userspace to use file handles for everything. > For example, for anything that uses fs/libfs.c:generic_encode_ino32_fh() > it's easy to construct a file handle by retrieving the inode number via > stat and the generation number via FS_IOC_GETVERSION. > > Encoding using the inode number and the inode generation number is > probably not very strong so it's not impossible to generate a file > handle that is not unique without knowing in which filesystem to > interpret it in. > > The problem is with what name_to_handle_at() returns imho. A mnt_id > doesn't pin the filesystem and the old mnt_id isn't unique. That means > the filesystem can be unmounted and go away and the mnt_id can be > recycled almost immediately for another mount but the file handle is > still there. > > So to guarantee that a (weakly encoded) file handle is interpreted in > the right filesystem the file handle must either be accompanied by a > file descriptor that pins the relevant mount or have any other guarantee > that the filesystem doesn't go away (Or of course, the file handle > encodes the uuid of the filesystem or something or uses some sort of > hashing scheme.). > > One of the features of file handles is that they're globally usable so > they're interesting to use as handles that can be shared. IOW, one can > send around a file handle to another process without having to pin > anything or have a file descriptor open that needs to be sent via > AF_UNIX. > > But as stated above that's potentially risky so one might still have to > send around an fd together with the file handle if sender and receiver > don't share the filesystem for the handle. > > However, with the unique mount id things improve quite a bit. Because > now it's possible to send around the unique mount id and the file > handle. Then one can use statmount() to figure out which filesystem this > file handle needs to be interpreted in. -- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH <https://www.cyphar.com/>
Attachment:
signature.asc
Description: PGP signature