Re: [PATCH RFC] fhandle: expose u64 mount id to name_to_handle_at(2)

Christian Brauner <brauner@xxxxxxxxxx> · Fri, 24 May 2024 14:25:30 +0200

On Thu, May 23, 2024 at 09:52:20AM -0600, Aleksa Sarai wrote:
> On 2024-05-21, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > On Mon, May 20, 2024 at 05:35:49PM -0400, Aleksa Sarai wrote:
> > > Now that we have stabilised the unique 64-bit mount ID interface in
> > > statx, we can now provide a race-free way for name_to_handle_at(2) to
> > > provide a file handle and corresponding mount without needing to worry
> > > about racing with /proc/mountinfo parsing.
> > > 
> > > As with AT_HANDLE_FID, AT_HANDLE_UNIQUE_MNT_ID reuses a statx AT_* bit
> > > that doesn't make sense for name_to_handle_at(2).
> > > 
> > > Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx>
> > > ---
> > 
> > So I think overall this is probably fine (famous last words). If it's
> > just about being able to retrieve the new mount id without having to
> > take the hit of another statx system call it's indeed a bit much to
> > add a revised system call for this. Althoug I did say earlier that I
> > wouldn't rule that out.
> > 
> > But if we'd that then it'll be a long discussion on the form of the new
> > system call and the information it exposes.
> > 
> > For example, I lack the grey hair needed to understand why
> > name_to_handle_at() returns a mount id at all. The pitch in commit
> > 990d6c2d7aee ("vfs: Add name to file handle conversion support") is that
> > the (old) mount id can be used to "lookup file system specific
> > information [...] in /proc/<pid>/mountinfo".
> 
> The logic was presumably to allow you to know what mount the resolved
> file handle came from. If you use AT_EMPTY_PATH this is not needed
> because you could just fstatfs (and now statx(AT_EMPTY_PATH)), but if
> you just give name_to_handle_at() almost any path, there is no race-free
> way to make sure that you know which filesystem the file handle came
> from.
> 
> I don't know if that could lead to security issues (I guess an attacker
> could find a way to try to manipulate the file handle you get back, and
> then try to trick you into operating on the wrong filesystem with
> open_by_handle_at()) but it is definitely something you'd want to avoid.

So the following paragraphs are prefaced with: I'm not an expert on file
handle encoding and so might be totally wrong.

Afaiu, the uniqueness guarantee of the file handle mostly depends on:

(1) the filesystem
(2) the actual file handling encoding

Looking at file handle encoding to me it looks like it's fairly easy to
fake them in userspace (I guess that's ok if you think about them like a
path but with a weird permission model built around them.) for quite a
few filesystems.

For example, for anything that uses fs/libfs.c:generic_encode_ino32_fh()
it's easy to construct a file handle by retrieving the inode number via
stat and the generation number via FS_IOC_GETVERSION.

Encoding using the inode number and the inode generation number is
probably not very strong so it's not impossible to generate a file
handle that is not unique without knowing in which filesystem to
interpret it in.

The problem is with what name_to_handle_at() returns imho. A mnt_id
doesn't pin the filesystem and the old mnt_id isn't unique. That means
the filesystem can be unmounted and go away and the mnt_id can be
recycled almost immediately for another mount but the file handle is
still there.

So to guarantee that a (weakly encoded) file handle is interpreted in
the right filesystem the file handle must either be accompanied by a
file descriptor that pins the relevant mount or have any other guarantee
that the filesystem doesn't go away (Or of course, the file handle
encodes the uuid of the filesystem or something or uses some sort of
hashing scheme.).

One of the features of file handles is that they're globally usable so
they're interesting to use as handles that can be shared. IOW, one can
send around a file handle to another process without having to pin
anything or have a file descriptor open that needs to be sent via
AF_UNIX.

But as stated above that's potentially risky so one might still have to
send around an fd together with the file handle if sender and receiver
don't share the filesystem for the handle.

However, with the unique mount id things improve quite a bit. Because
now it's possible to send around the unique mount id and the file
handle. Then one can use statmount() to figure out which filesystem this
file handle needs to be interpreted in.

> 
> > Granted, that's doable but it'll mean a lot of careful checking to avoid
> > races for mount id recycling because they're not even allocated
> > cyclically. With lots of containers it becomes even more of an issue. So
> > it's doubtful whether exposing the mount id through name_to_handle_at()
> > would be something that we'd still do.
> > 
> > So really, if this is just about a use-case where you want to spare the
> > additional system call for statx() and you need the mnt_id then
> > overloading is probably ok.
> > 
> > But it remains an unpleasant thing to look at.
> > 
> 
> Yeah, I agree it's ugly.
> 
> -- 
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> <https://www.cyphar.com/>