Re: [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi David

Thanks email and patch for discussion.

On Fri, Jul 14, 2023 at 4:48 AM David Rheinsberg <david@xxxxxxxxxxxx> wrote:
>
> Add a new flag for memfd_create() called MFD_NOEXEC, which has the
> opposite effect as MFD_EXEC (i.e., it strips the executable bits from
> the inode file mode).
>
I previously thought about having the symmetric flags, such as
MFD_NOEXEC/MFD_EXEC/MFD_NOEXEC_SEAL/MFD_EXEC_SEAL, but decided against
it. The app shall decide beforehand what the memfd is created for, if
it is no-executable, then it should be sealed, such that it can't be
chmod to enable "X" bit.

> The default mode for memfd_create() has historically been to use 0777 as
> file modes. The new `MFD_EXEC` flag has now made this explicit, paving
> the way to reduce the default to 0666 and thus dropping the executable
> bits for security reasons. Additionally, the `MFD_NOEXEC_SEAL` flag has
> been added which allows this without changing the default right now.
>
> Unfortunately, `MFD_NOEXEC_SEAL` enforces `MFD_ALLOW_SEALING` and
> `F_SEAL_EXEC`, with no alternatives available. This leads to multiple
> issues:
>
>  * Applications that do not want to allow sealing either have to use
>    `MFD_EXEC` (which we don't want, unless they actually need the
>    executable bits), or they must add `F_SEAL_SEAL` immediately on
>    return of memfd_create(2) with `MFD_NOEXEC_SEAL`, since this
>    implicitly enables sealing.
>
>    Note that we explicitly added `MFD_ALLOW_SEALING` when creating
>    memfd_create(2), because enabling seals on existing users of shmem
>    without them knowing about it can easily introduce DoS scenarios.

The application that doesn't want MFD_NOEXEC_SEAL can use MFD_EXEC,
the kernel won't add MFD_ALLOW_SEALING implicitly. MFD_EXEC makes the
kernel behave the same as before, this is also  why sysctl
vm.memfd_noexec=0 can work seamlessly.

>   It
>    is unclear why `MFD_NOEXEC_SEAL` was designed to enable seals, and
>    this is especially dangerous with `MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL`
>    set via sysctl, since it will silently enable sealing on every memfd
>    created.
>
Without sealing, chmod(2) can modify the mfd to be executable, that is
the consideration that MFD_NOEXEC is not provided as an option.
Indeed, current design is "biased" towards promoting MFD_NOEXEC_SEAL
as the safest approach, and try to avoid the pitfall that dev
accidently uses "MFD_NOEXEC" without realizing it can still be
chmod().


>  * Applications that do not want `MFD_EXEC`, but rely on
>    `F_GET_SEALS` to *NOT* return `F_SEAL_EXEC` have no way of achieving
>    this other than using `MFD_EXEC` and clearing the executable bits
>    manually via fchmod(2). Using `MFD_NOEXEC_SEAL` will set
>    `F_SEAL_EXEC` and thus break existing code that hard-codes the
>    seal-set.
>
>    This is already an issue when sending log-memfds to systemd-journald
>    today, which verifies the seal-set of the received FD and fails if
>    unknown seals are set. Hence, you have to use `MFD_EXEC` when
>    creating memfds for this purpose, even though you really do not need
>    the executable bits set.
>
>  * Applications that want to enable the executable bit later on,
>    currently have no way to create the memfd without it. They have to
>    clear the bits immediately after creating it via fchmod(2), or just
>    leave them set.
>
Is it OK to do what you want in two steps ? What is the concern there ? i.e.
memfd_create(MFD_EXEC), then chmod to remove the "X" bit.

I imagine this is probably already what the application does for
creating no-executable mfd before my patch, i.e.:
memfd_create(), then chmod() to remove "X" to remove "X" bit.

Thanks!
-Jeff



-Jeff

> By adding MFD_NOEXEC, user-space is now able to clear the executable
> bits on all memfds immediately, clearly stating their intention, without
> requiring fixups after creating the memfd. In particular, it is highly
> useful for existing use-cases that cannot allow new seals to appear on
> memfds.
>
> Signed-off-by: David Rheinsberg <david@xxxxxxxxxxxx>
> ---
>  include/linux/pid_namespace.h |  4 ++--
>  include/uapi/linux/memfd.h    |  1 +
>  mm/memfd.c                    | 29 ++++++++++++++---------------
>  3 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
> index c758809d5bcf..02f8acf94512 100644
> --- a/include/linux/pid_namespace.h
> +++ b/include/linux/pid_namespace.h
> @@ -19,9 +19,9 @@ struct fs_pin;
>  #if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
>  /*
>   * sysctl for vm.memfd_noexec
> - * 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL
> + * 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC
>   *     acts like MFD_EXEC was set.
> - * 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL
> + * 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC
>   *     acts like MFD_NOEXEC_SEAL was set.
>   * 2: memfd_create() without MFD_NOEXEC_SEAL will be
>   *     rejected.
> diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
> index 273a4e15dfcf..b9e13bd65817 100644
> --- a/include/uapi/linux/memfd.h
> +++ b/include/uapi/linux/memfd.h
> @@ -12,6 +12,7 @@
>  #define MFD_NOEXEC_SEAL                0x0008U
>  /* executable */
>  #define MFD_EXEC               0x0010U
> +#define MFD_NOEXEC             0x0020U
>
>  /*
>   * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
> diff --git a/mm/memfd.c b/mm/memfd.c
> index e763e76f1106..2afe49368fc5 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -266,7 +266,9 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int arg)
>  #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
>  #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
>
> -#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | MFD_NOEXEC_SEAL | MFD_EXEC)
> +#define MFD_ALL_FLAGS \
> +       (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \
> +        MFD_NOEXEC_SEAL | MFD_EXEC | MFD_NOEXEC)
>
>  SYSCALL_DEFINE2(memfd_create,
>                 const char __user *, uname,
> @@ -289,11 +291,13 @@ SYSCALL_DEFINE2(memfd_create,
>                         return -EINVAL;
>         }
>
> -       /* Invalid if both EXEC and NOEXEC_SEAL are set.*/
> -       if ((flags & MFD_EXEC) && (flags & MFD_NOEXEC_SEAL))
> +       if (flags & MFD_NOEXEC_SEAL)
> +               flags |= MFD_ALLOW_SEALING | MFD_NOEXEC;
> +
> +       if ((flags & MFD_EXEC) && (flags & MFD_NOEXEC))
>                 return -EINVAL;
>
> -       if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
> +       if (!(flags & (MFD_EXEC | MFD_NOEXEC))) {
>  #ifdef CONFIG_SYSCTL
>                 int sysctl = MEMFD_NOEXEC_SCOPE_EXEC;
>                 struct pid_namespace *ns;
> @@ -366,20 +370,15 @@ SYSCALL_DEFINE2(memfd_create,
>         file->f_mode |= FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE;
>         file->f_flags |= O_LARGEFILE;
>
> -       if (flags & MFD_NOEXEC_SEAL) {
> -               struct inode *inode = file_inode(file);
> +       if (flags & MFD_NOEXEC)
> +               file_inode(file)->i_mode &= ~0111;
>
> -               inode->i_mode &= ~0111;
> -               file_seals = memfd_file_seals_ptr(file);
> -               if (file_seals) {
> +       file_seals = memfd_file_seals_ptr(file);
> +       if (file_seals) {
> +               if (flags & MFD_ALLOW_SEALING)
>                         *file_seals &= ~F_SEAL_SEAL;
> +               if (flags & MFD_NOEXEC_SEAL)
>                         *file_seals |= F_SEAL_EXEC;
> -               }
> -       } else if (flags & MFD_ALLOW_SEALING) {
> -               /* MFD_EXEC and MFD_ALLOW_SEALING are set */
> -               file_seals = memfd_file_seals_ptr(file);
> -               if (file_seals)
> -                       *file_seals &= ~F_SEAL_SEAL;
>         }
>
>         fd_install(fd, file);
> --
> 2.41.0
>





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux