On Mon, May 27, 2024 at 10:06:15AM +1000, Dave Chinner wrote: > On Fri, May 24, 2024 at 02:25:30PM +0200, Christian Brauner wrote: > > On Thu, May 23, 2024 at 09:52:20AM -0600, Aleksa Sarai wrote: > > > On 2024-05-21, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > On Mon, May 20, 2024 at 05:35:49PM -0400, Aleksa Sarai wrote: > > > > > Now that we have stabilised the unique 64-bit mount ID interface in > > > > > statx, we can now provide a race-free way for name_to_handle_at(2) to > > > > > provide a file handle and corresponding mount without needing to worry > > > > > about racing with /proc/mountinfo parsing. > > > > > > > > > > As with AT_HANDLE_FID, AT_HANDLE_UNIQUE_MNT_ID reuses a statx AT_* bit > > > > > that doesn't make sense for name_to_handle_at(2). > > > > > > > > > > Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx> > > > > > --- > > > > > > > > So I think overall this is probably fine (famous last words). If it's > > > > just about being able to retrieve the new mount id without having to > > > > take the hit of another statx system call it's indeed a bit much to > > > > add a revised system call for this. Althoug I did say earlier that I > > > > wouldn't rule that out. > > > > > > > > But if we'd that then it'll be a long discussion on the form of the new > > > > system call and the information it exposes. > > > > > > > > For example, I lack the grey hair needed to understand why > > > > name_to_handle_at() returns a mount id at all. The pitch in commit > > > > 990d6c2d7aee ("vfs: Add name to file handle conversion support") is that > > > > the (old) mount id can be used to "lookup file system specific > > > > information [...] in /proc/<pid>/mountinfo". > > > > > > The logic was presumably to allow you to know what mount the resolved > > > file handle came from. If you use AT_EMPTY_PATH this is not needed > > > because you could just fstatfs (and now statx(AT_EMPTY_PATH)), but if > > > you just give name_to_handle_at() almost any path, there is no race-free > > > way to make sure that you know which filesystem the file handle came > > > from. > > > > > > I don't know if that could lead to security issues (I guess an attacker > > > could find a way to try to manipulate the file handle you get back, and > > > then try to trick you into operating on the wrong filesystem with > > > open_by_handle_at()) but it is definitely something you'd want to avoid. > > > > So the following paragraphs are prefaced with: I'm not an expert on file > > handle encoding and so might be totally wrong. > > > > Afaiu, the uniqueness guarantee of the file handle mostly depends on: > > > > (1) the filesystem > > (2) the actual file handling encoding > > > > Looking at file handle encoding to me it looks like it's fairly easy to > > fake them in userspace (I guess that's ok if you think about them like a > > path but with a weird permission model built around them.) for quite a > > few filesystems. > > This is a feature specifically used by XFS utilities like xfs_fsr > and xfsdump to take pathless inode information retrieved by bulkstat > operations to a file handle to enable arbitrary inodes in the > filesystem to be opened. > > i.e. they scan the filesystem by walking the filesystem's internal > inode index, not by walking paths to scan the filesytsem. This is > *much* faster than path walking, especially on seek-limited storage > hardware. > > Knowing the inode number, generation and fs uuid, we can > construct a valid filehandle for that inode, and then do whatever > operations we need to do (defrag, backup, move offline (HSM), etc) > without needing to know the path to that inode.... Yeah, I think you mentioned this in another context before. > > The problem is with what name_to_handle_at() returns imho. A mnt_id > > doesn't pin the filesystem and the old mnt_id isn't unique. That means > > the filesystem can be unmounted and go away and the mnt_id can be > > recycled almost immediately for another mount but the file handle is > > still there. > > This is no different to the NFS server unexporting a filesystem - > all attempts to resolve the file handle the client holds for that > export must now fail. Hence the filehandle itself must have a > superblock specific identifier in it. > > > So to guarantee that a (weakly encoded) file handle is interpreted in > > the right filesystem the file handle must either be accompanied by a > > file descriptor that pins the relevant mount or have any other guarantee > > that the filesystem doesn't go away (Or of course, the file handle > > encodes the uuid of the filesystem or something or uses some sort of > > hashing scheme.). > > Yes, it does, and that's the superblock based identifier, not > something that is mount based. i.e. the handle is valid regardless > of where the filesystem is mounted, even if the application does not > have visibility or permitted access to the given mount. And the > filehandle is persistent - it must work across unmount/mount, > reboots, change of mount location, etc. Agreed, and no one is disputing that. The old mount id as exposed by name_to_handle_at() is one means to get to a persisent identifier and that is racy. But we do have a 64bit non-recyled mount id and statmount() since v6.7 now which allow to get a persistent identifier for the filesystem in a race-free manner.