Re: [PATCH RFC v2] fhandle: expose u64 mount id to name_to_handle_at(2)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 23, 2024 at 11:57 PM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
>
> Now that we provide a unique 64-bit mount ID interface in statx, we can
> now provide a race-free way for name_to_handle_at(2) to provide a file
> handle and corresponding mount without needing to worry about racing
> with /proc/mountinfo parsing.
>
> While this is not necessary if you are using AT_EMPTY_PATH and don't
> care about an extra statx(2) call, users that pass full paths into
> name_to_handle_at(2) need to know which mount the file handle comes from
> (to make sure they don't try to open_by_handle_at a file handle from a
> different filesystem) and switching to AT_EMPTY_PATH would require
> allocating a file for every name_to_handle_at(2) call, turning
>
>   err = name_to_handle_at(-EBADF, "/foo/bar/baz", &handle, &mntid,
>                           AT_HANDLE_MNT_ID_UNIQUE);
>
> into
>
>   int fd = openat(-EBADF, "/foo/bar/baz", O_PATH | O_CLOEXEC);
>   err1 = name_to_handle_at(fd, "", &handle, &unused_mntid, AT_EMPTY_PATH);
>   err2 = statx(fd, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &statxbuf);
>   mntid = statxbuf.stx_mnt_id;
>   close(fd);
>
> Unlike AT_HANDLE_FID, as per Amir's suggestion, AT_HANDLE_MNT_ID_UNIQUE
> uses a new AT_* bit from the historically-unused 0xFF range (which we
> now define as being the "per-syscall" range for AT_* bits).
>

Sorry for nit picking, but I think that "Unlike AT_HANDLE_FID,..." is confusing
in this statement.
AT_HANDLE_FID is using a bit that was already effectively allocated for a
"per-syscall" range.
I don't think that mentioning AT_HANDLE_FID adds any clarity to the statement
so better drop it?

> Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx>
> ---
> Changes in v2:
> - Fixed a few minor compiler warnings and a buggy copy_to_user() check.
> - Rename AT_HANDLE_UNIQUE_MOUNT_ID -> AT_HANDLE_MNT_ID_UNIQUE to match statx.
> - Switched to using an AT_* bit from 0xFF and defining that range as
>   being "per-syscall" for future usage.
> - Sync tools/ copy of <linux/fcntl.h> to include changes.
> - v1: <https://lore.kernel.org/r/20240520-exportfs-u64-mount-id-v1-1-f55fd9215b8e@xxxxxxxxxx>
> ---
>  fs/fhandle.c                     | 29 ++++++++++++++++++++++-------
>  include/linux/syscalls.h         |  2 +-
>  include/uapi/linux/fcntl.h       | 28 +++++++++++++++++++++-------
>  tools/include/uapi/linux/fcntl.h | 28 +++++++++++++++++++++-------
>  4 files changed, 65 insertions(+), 22 deletions(-)
>
[...]

> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index c0bcc185fa48..9ed9d65842c1 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -87,22 +87,24 @@
>  #define DN_ATTRIB      0x00000020      /* File changed attibutes */
>  #define DN_MULTISHOT   0x80000000      /* Don't remove notifier */
>
> +#define AT_FDCWD               -100    /* Special value used to indicate
> +                                           openat should use the current
> +                                           working directory. */

(more nit picking)
If you are changing this line, please at least add a new line,
this is a different namespace :-/
and perhaps change it to "Special value of dirfd argument..."

Also, better leave a comment here to discourage allocation from this range:

+ /* Reserved for per-syscall flags              0xff   */

> +#define AT_SYMLINK_NOFOLLOW    0x100   /* Do not follow symbolic links.  */
> +
>  /*
> - * The constants AT_REMOVEDIR and AT_EACCESS have the same value.  AT_EACCESS is
> - * meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to
> + * The constants AT_REMOVEDIR and AT_EACCESS have the same value.  AT_EACCESS
> + * is meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to
>   * unlinkat.  The two functions do completely different things and therefore,
>   * the flags can be allowed to overlap.  For example, passing AT_REMOVEDIR to
>   * faccessat would be undefined behavior and thus treating it equivalent to
>   * AT_EACCESS is valid undefined behavior.
>   */

If you are going to add this churn in this patch, please do it otherwise.
It does not make sense to have this long explanation about pre-syscall
AT_* flags in a different location from the comment you added about
"All new purely-syscall-specific AT_* flags.."
if this explanation is needed at all, it should be after the new comment
as an example.


> -#define AT_FDCWD               -100    /* Special value used to indicate
> -                                           openat should use the current
> -                                           working directory. */
> -#define AT_SYMLINK_NOFOLLOW    0x100   /* Do not follow symbolic links.  */
>  #define AT_EACCESS             0x200   /* Test access permitted for
>                                             effective IDs, not real IDs.  */
>  #define AT_REMOVEDIR           0x200   /* Remove directory instead of
>                                             unlinking file.  */

I really prefer to move those to the per-syscall section
right next to AT_HANDLE_FID and leave a comment here:

/* Reserved for per-syscall flags           0x200   */

> +
>  #define AT_SYMLINK_FOLLOW      0x400   /* Follow symbolic links.  */
>  #define AT_NO_AUTOMOUNT                0x800   /* Suppress terminal automount traversal */
>  #define AT_EMPTY_PATH          0x1000  /* Allow empty relative pathname */
> @@ -114,10 +116,22 @@
>
>  #define AT_RECURSIVE           0x8000  /* Apply to the entire subtree */
>
> -/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */
> +/*
> + * All new purely-syscall-specific AT_* flags should consider using bits from
> + * 0xFF, but the bits used by RENAME_* (0x7) should be avoided in case users
> + * decide to pass AT_* flags to renameat2() by accident.

Sorry, but I find the use of my renameat2() example a bit confusing
in this sentence.
If you mention it at all, please use "For example, the bits used by..."
but I think it is more important to say "...should consider re-using bits
already used by other per-syscalls flags".

> These flag bits are
> + * free for re-use by other syscall's syscall-specific flags without worry.
> + */
> +
> +/*
> + * Flags for name_to_handle_at(2). To save AT_ flag space we re-use the
> + * AT_EACCESS/AT_REMOVEDIR bit for AT_HANDLE_FID.
> + */

AT_EACCESS/AT_REMOVEDIR/AT_HANDLE_FID have exact same status,
so instead of this asymmetric comment:

+/* Flags for faccessat(2) */
+#define AT_EACCESS             0x200   /* Test access permitted for
+                                           effective IDs, not real IDs.  */
+/* Flags for unlinkat(2) */
+#define AT_REMOVEDIR           0x200   /* Remove directory instead of
+                                           unlinking file.  */
+/* Flags for name_to_handle_at(2) */
+#define AT_HANDLE_FID          0x200   /* file handle is needed to
                                        compare object identity and may not
                                        be usable to open_by_handle_at(2) */

> +#define AT_HANDLE_MNT_ID_UNIQUE        0x80    /* return the u64 unique mount id */

IDGI, I think we may have been miscommunicating :-/
If 0x7 range is to be avoided for generic AT_ flags, then it *should* be used
for new per-syscall flags such as this one.

The reservation of 0xff is not a strong guarantee.
As long as people re-use new per-syscalls flags efficiently, we could
decide to reclaim some of this space for generic AT_ flags in the future
if it is needed.

I know most of the mess was here before your patch, but I think
it got to a point where we must put a little order before introducing
the new per-syscall flag.

Thanks,
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux