From: Mike Rapoport <rppt@xxxxxxxxxxxxx>

... that explains the rationale for the system call

Signed-off-by: Mike Rapoport <rppt@xxxxxxxxxxxxx>
Signed-off-by: Alejandro Colomar <alx.manpages@xxxxxxxxx>
---
 man2/memfd_secret.2 | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/man2/memfd_secret.2 b/man2/memfd_secret.2
index f3380818e..869480b48 100644
--- a/man2/memfd_secret.2
+++ b/man2/memfd_secret.2
@@ -147,6 +147,67 @@ system call first appeared in Linux 5.14.
 The
 .BR memfd_secret ()
 system call is Linux-specific.
+.SH NOTES
+.PP
+The
+.BR memfd_secret ()
+system call is designed to allow a user-space process
+to create a range of memory that is inaccessible to anybody else -
+the kernel included.
+There is no 100% guarantee that the kernel won't be able to access
+memory ranges backed by
+.BR memfd_secret ()
+in any circumstances, but nevertheless,
+it is much harder to exfiltrate data from these regions.
+.PP
+The
+.BR memfd_secret ()
+system call provides the following protections:
+.IP \(bu 3
+Enhanced protection
+(in conjunction with all the other in-kernel attack prevention systems)
+against ROP attacks.
+The absence of any in-kernel primitive for accessing memory backed by
+.BR memfd_secret ()
+means that a one-gadget ROP attack
+can't work to perform data exfiltration.
+The attacker would need to find enough ROP gadgets
+to reconstruct the missing page table entries,
+which significantly increases the difficulty of the attack,
+especially when other protections like the kernel stack size limit
+and address space layout randomization are in place.
+.IP \(bu
+Prevent cross-process user-space memory exposures.
+Once a region for a
+.BR memfd_secret ()
+memory mapping is allocated,
+the user can't accidentally pass it into the kernel
+to be transmitted somewhere.
+The memory pages in this region cannot be accessed via the direct map
+and they are disallowed in get_user_pages.
+.IP \(bu
+Harden against exploited kernel flaws.
+In order to access memory areas backed by
+.BR memfd_secret (),
+a kernel-side attack would need to
+either walk the page tables and create new ones,
+or spawn a new privileged user-space process to perform
+secrets exfiltration using
+.BR ptrace (2).
+.PP
+The way
+.BR memfd_secret ()
+allocates and locks the memory may impact overall system performance,
+therefore the system call is disabled by default and only available
+if the system administrator turned it on using the
+"secretmem.enable=y" kernel parameter.
+.PP
+To prevent potential data leaks of memory regions backed by
+.BR memfd_secret ()
+from a hibernation image,
+hibernation is prevented when there are active
+.BR memfd_secret ()
+users.
 .SH SEE ALSO
 .BR fcntl (2),
 .BR ftruncate (2),
-- 
2.33.0
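
For readers following the NOTES text above, here is a minimal sketch of
how a process might obtain such a region. It is not part of the patch.
The helper name secret_map() is hypothetical, and the fallback syscall
number 447 is an assumption that holds on x86-64 and on architectures
using the generic syscall table; prefer the value from your own
<sys/syscall.h>. The call is expected to fail (typically with ENOSYS)
when the feature is unsupported or has not been enabled by the
administrator, so callers should be ready for a NULL return.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: older libc headers may lack SYS_memfd_secret.
 * 447 is the number on x86-64 and the generic syscall table. */
#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447
#endif

/*
 * Hypothetical helper, not part of any library.
 * Creates a secret memory area of len bytes and maps it read/write.
 * On success, returns the mapping and stores the descriptor in *fd_out.
 * On failure, returns NULL with errno set by the failing call.
 */
static void *secret_map(size_t len, int *fd_out)
{
    void *p;
    int saved;
    int fd;

    /* There is no glibc wrapper assumed here; call through syscall(2). */
    fd = (int) syscall(SYS_memfd_secret, 0U);
    if (fd == -1)
        return NULL;

    /* The descriptor starts out empty; give it a size before mapping. */
    if (ftruncate(fd, (off_t) len) == -1)
        goto fail;

    /* Secret memory areas must be mapped shared. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto fail;

    *fd_out = fd;
    return p;

fail:
    saved = errno;      /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return NULL;
}
```

A caller would store its key material in the returned buffer, then
release it with munmap(2) and close(2) when done. Because the pages are
locked, the mapping counts against RLIMIT_MEMLOCK, so it is worth
keeping the requested length small.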