On Thu, May 23, 2024 at 11:57 PM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote: > > Now that we provide a unique 64-bit mount ID interface in statx, we can > now provide a race-free way for name_to_handle_at(2) to provide a file > handle and corresponding mount without needing to worry about racing > with /proc/mountinfo parsing. > > While this is not necessary if you are using AT_EMPTY_PATH and don't > care about an extra statx(2) call, users that pass full paths into > name_to_handle_at(2) need to know which mount the file handle comes from > (to make sure they don't try to open_by_handle_at a file handle from a > different filesystem) and switching to AT_EMPTY_PATH would require > allocating a file for every name_to_handle_at(2) call, turning > > err = name_to_handle_at(-EBADF, "/foo/bar/baz", &handle, &mntid, > AT_HANDLE_MNT_ID_UNIQUE); > > into > > int fd = openat(-EBADF, "/foo/bar/baz", O_PATH | O_CLOEXEC); > err1 = name_to_handle_at(fd, "", &handle, &unused_mntid, AT_EMPTY_PATH); > err2 = statx(fd, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &statxbuf); > mntid = statxbuf.stx_mnt_id; > close(fd); > > Unlike AT_HANDLE_FID, as per Amir's suggestion, AT_HANDLE_MNT_ID_UNIQUE > uses a new AT_* bit from the historically-unused 0xFF range (which we > now define as being the "per-syscall" range for AT_* bits). > Sorry for nit picking, but I think that "Unlike AT_HANDLE_FID,..." is confusing in this statement. AT_HANDLE_FID is using a bit that was already effectively allocated for a "per-syscall" range. I don't think that mentioning AT_HANDLE_FID adds any clarity to the statement so better drop it? > Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx> > --- > Changes in v2: > - Fixed a few minor compiler warnings and a buggy copy_to_user() check. > - Rename AT_HANDLE_UNIQUE_MOUNT_ID -> AT_HANDLE_MNT_ID_UNIQUE to match statx. > - Switched to using an AT_* bit from 0xFF and defining that range as > being "per-syscall" for future usage. > - Sync tools/ copy of <linux/fcntl.h> to include changes. > - v1: <https://lore.kernel.org/r/20240520-exportfs-u64-mount-id-v1-1-f55fd9215b8e@xxxxxxxxxx> > --- > fs/fhandle.c | 29 ++++++++++++++++++++++------- > include/linux/syscalls.h | 2 +- > include/uapi/linux/fcntl.h | 28 +++++++++++++++++++++------- > tools/include/uapi/linux/fcntl.h | 28 +++++++++++++++++++++------- > 4 files changed, 65 insertions(+), 22 deletions(-) > [...] > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > index c0bcc185fa48..9ed9d65842c1 100644 > --- a/include/uapi/linux/fcntl.h > +++ b/include/uapi/linux/fcntl.h > @@ -87,22 +87,24 @@ > #define DN_ATTRIB 0x00000020 /* File changed attibutes */ > #define DN_MULTISHOT 0x80000000 /* Don't remove notifier */ > > +#define AT_FDCWD -100 /* Special value used to indicate > + openat should use the current > + working directory. */ (more nit picking) If you are changing this line, please at least add a new line, this is a different namespace :-/ and perhaps change it to "Special value of dirfd argument..." Also, better leave a comment here to discourage allocation from this range: + /* Reserved for per-syscall flags 0xff */ > +#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */ > + > /* > - * The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS is > - * meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to > + * The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS > + * is meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to > * unlinkat. The two functions do completely different things and therefore, > * the flags can be allowed to overlap. For example, passing AT_REMOVEDIR to > * faccessat would be undefined behavior and thus treating it equivalent to > * AT_EACCESS is valid undefined behavior. > */ If you are going to add this churn in this patch, please do it otherwise. It does not make sense to have this long explanation about pre-syscall AT_* flags in a different location from the comment you added about "All new purely-syscall-specific AT_* flags.." if this explanation is needed at all, it should be after the new comment as an example. > -#define AT_FDCWD -100 /* Special value used to indicate > - openat should use the current > - working directory. */ > -#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */ > #define AT_EACCESS 0x200 /* Test access permitted for > effective IDs, not real IDs. */ > #define AT_REMOVEDIR 0x200 /* Remove directory instead of > unlinking file. */ I really prefer to move those to the per-syscall section right next to AT_HANDLE_FID and leave a comment here: /* Reserved for per-syscall flags 0x200 */ > + > #define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */ > #define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */ > #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */ > @@ -114,10 +116,22 @@ > > #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ > > -/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */ > +/* > + * All new purely-syscall-specific AT_* flags should consider using bits from > + * 0xFF, but the bits used by RENAME_* (0x7) should be avoided in case users > + * decide to pass AT_* flags to renameat2() by accident. Sorry, but I find the use of my renameat2() example a bit confusing in this sentence. If you mention it at all, please use "For example, the bits used by..." but I think it is more important to say "...should consider re-using bits already used by other per-syscalls flags". > These flag bits are > + * free for re-use by other syscall's syscall-specific flags without worry. > + */ > + > +/* > + * Flags for name_to_handle_at(2). To save AT_ flag space we re-use the > + * AT_EACCESS/AT_REMOVEDIR bit for AT_HANDLE_FID. > + */ AT_EACCESS/AT_REMOVEDIR/AT_HANDLE_FID have exact same status, so instead of this asymmetric comment: +/* Flags for faccessat(2) */ +#define AT_EACCESS 0x200 /* Test access permitted for + effective IDs, not real IDs. */ +/* Flags for unlinkat(2) */ +#define AT_REMOVEDIR 0x200 /* Remove directory instead of + unlinking file. */ +/* Flags for name_to_handle_at(2) */ +#define AT_HANDLE_FID 0x200 /* file handle is needed to compare object identity and may not be usable to open_by_handle_at(2) */ > +#define AT_HANDLE_MNT_ID_UNIQUE 0x80 /* return the u64 unique mount id */ IDGI, I think we may have been miscommunicating :-/ If 0x7 range is to be avoided for generic AT_ flags, then it *should* be used for new per-syscall flags such as this one. The reservation of 0xff is not a strong guarantee. As long as people re-use new per-syscalls flags efficiently, we could decide to reclaim some of this space for generic AT_ flags in the future if it is needed. I know most of the mess was here before your patch, but I think it got to a point where we must put a little order before introducing the new per-syscall flag. Thanks, Amir.