The patch titled
     Subject: mm-memfd-add-documentation-for-mfd_noexec_seal-mfd_exec-v2
has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
     mm-memfd-add-documentation-for-mfd_noexec_seal-mfd_exec-v2.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-memfd-add-documentation-for-mfd_noexec_seal-mfd_exec-v2.patch

This patch will later appear in the mm-hotfixes-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Jeff Xu <jeffxu@xxxxxxxxxxxx>
Subject: mm-memfd-add-documentation-for-mfd_noexec_seal-mfd_exec-v2
Date: Tue, 11 Jun 2024 03:49:01 +0000

updates per Randy

Link: https://lkml.kernel.org/r/20240611034903.3456796-2-jeffxu@xxxxxxxxxxxx
Signed-off-by: Jeff Xu <jeffxu@xxxxxxxxxxxx>
Cc: Aleksa Sarai <cyphar@xxxxxxxxxx>
Cc: Barnabás Pőcze <pobrn@xxxxxxxxxxxxxx>
Cc: Daniel Verkamp <dverkamp@xxxxxxxxxxxx>
Cc: David Rheinsberg <david@xxxxxxxxxxxx>
Cc: Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Jorge Lucangeli Obes <jorgelo@xxxxxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
Cc: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 Documentation/userspace-api/mfd_noexec.rst |   30 +++++++++----------
 1 file changed, 15 insertions(+), 15 deletions(-)

--- a/Documentation/userspace-api/mfd_noexec.rst~mm-memfd-add-documentation-for-mfd_noexec_seal-mfd_exec-v2
+++ a/Documentation/userspace-api/mfd_noexec.rst
@@ -10,27 +10,27 @@ Introduction of non executable mfd
 :Contributor:
 	Aleksa Sarai <cyphar@xxxxxxxxxx>
 
-Since Linux introduced the memfd feature, memfd have always had their
+Since Linux introduced the memfd feature, memfds have always had their
 execute bit set, and the memfd_create() syscall doesn't allow setting
 it differently.
 
-However, in a secure by default system, such as ChromeOS, (where all
-executables should come from the rootfs, which is protected by Verified
+However, in a secure-by-default system, such as ChromeOS, (where all
+executables should come from the rootfs, which is protected by verified
 boot), this executable nature of memfd opens a door for NoExec bypass
 and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm
 process created a memfd to share the content with an external process,
 however the memfd is overwritten and used for executing arbitrary code
-and root escalation. [2] lists more VRP in this kind.
+and root escalation. [2] lists more VRP of this kind.
 
-On the other hand, executable memfd has its legit use, runc uses memfd’s
+On the other hand, executable memfd has its legit use: runc uses memfd’s
 seal and executable feature to copy the contents of the binary then
-execute them, for such system, we need a solution to differentiate runc's
-use of executable memfds and an attacker's [3].
+execute them. For such a system, we need a solution to differentiate runc's
+use of executable memfds and an attacker's [3].
 
-To address those above.
+To address those above:
   - Let memfd_create() set X bit at creation time.
   - Let memfd be sealed for modifying X bit when NX is set.
- - A new pid namespace sysctl: vm.memfd_noexec to help applications to
+ - Add a new pid namespace sysctl: vm.memfd_noexec to help applications to
     migrating and enforcing non-executable MFD.
 
 User API
@@ -48,7 +48,7 @@ User API
 
 Note:
 	``MFD_NOEXEC_SEAL`` implies ``MFD_ALLOW_SEALING``. In case that
-	app doesn't want sealing, it can add F_SEAL_SEAL after creation.
+	an app doesn't want sealing, it can add F_SEAL_SEAL after creation.
 
 
 Sysctl:
@@ -68,14 +68,14 @@ The new pid namespaced sysctl vm.memfd_n
  - 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
 	memfd_create() without MFD_NOEXEC_SEAL will be rejected.
 
-The sysctl allows finer control of memfd_create for old-software that
-doesn't set the executable bit, for example, a container with
-vm.memfd_noexec=1 means the old-software will create non-executable memfd
-by default while new-software can create executable memfd by setting
+The sysctl allows finer control of memfd_create for old software that
+doesn't set the executable bit; for example, a container with
+vm.memfd_noexec=1 means the old software will create non-executable memfd
+by default while new software can create executable memfd by setting
 MFD_EXEC.
 
 The value of vm.memfd_noexec is passed to child namespace at creation
-time, in addition, the setting is hierarchical, i.e. during memfd_create,
+time. In addition, the setting is hierarchical, i.e. during memfd_create,
 we will search from current ns to root ns and use the most restrictive
 setting.
 
_

Patches currently in -mm which might be from jeffxu@xxxxxxxxxxxx are

mm-memfd-add-documentation-for-mfd_noexec_seal-mfd_exec.patch
mm-memfd-add-documentation-for-mfd_noexec_seal-mfd_exec-v2.patch
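
The sysctl semantics described in the patched document (default flag selection, rejection at level 2, and the "most restrictive setting from current ns to root ns" rule) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel's code: `struct ns_model`, `effective_scope()`, `resolve_flags()` and the `SK_`/`SCOPE_` names are hypothetical, and the two flag values are copied locally from `include/uapi/linux/memfd.h` so the sketch builds without recent headers.

```c
#include <errno.h>
#include <stddef.h>

/* Local copies of the uapi flag values (assumed to match
 * include/uapi/linux/memfd.h); prefixed to avoid clashing with real headers. */
#define SK_MFD_NOEXEC_SEAL 0x0008U
#define SK_MFD_EXEC        0x0010U

/* The three vm.memfd_noexec levels; names here are illustrative. */
enum scope {
    SCOPE_EXEC = 0,            /* default to executable memfds */
    SCOPE_NOEXEC_SEAL = 1,     /* default to non-executable, sealed */
    SCOPE_NOEXEC_ENFORCED = 2  /* reject anything without MFD_NOEXEC_SEAL */
};

/* Hypothetical stand-in for a pid namespace: just the sysctl and a parent. */
struct ns_model {
    enum scope memfd_noexec;
    const struct ns_model *parent; /* NULL for the root namespace */
};

/* Walk from the current namespace up to the root and keep the most
 * restrictive (numerically largest) setting seen on the way. */
static enum scope effective_scope(const struct ns_model *ns)
{
    enum scope s = SCOPE_EXEC;

    for (; ns != NULL; ns = ns->parent)
        if (ns->memfd_noexec > s)
            s = ns->memfd_noexec;
    return s;
}

/* Model of how a memfd_create() request is resolved: returns the flags
 * that would take effect, or -EACCES when the request is rejected. */
static long resolve_flags(const struct ns_model *ns, unsigned int flags)
{
    enum scope s = effective_scope(ns);

    /* Old software sets neither flag: the sysctl picks the default. */
    if (!(flags & (SK_MFD_EXEC | SK_MFD_NOEXEC_SEAL)))
        flags |= (s >= SCOPE_NOEXEC_SEAL) ? SK_MFD_NOEXEC_SEAL : SK_MFD_EXEC;

    /* Level 2: a request that ends up without MFD_NOEXEC_SEAL is refused. */
    if (!(flags & SK_MFD_NOEXEC_SEAL) && s >= SCOPE_NOEXEC_ENFORCED)
        return -EACCES;

    return (long)flags;
}
```

In this model, a child namespace with level 0 under a root set to level 2 still resolves to level 2, so a call such as `resolve_flags(&child, SK_MFD_EXEC)` is rejected, while a call with no flags resolves to the non-executable default.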