Hi Mike,
On 9/2/21 9:50 AM, Mike Rapoport wrote:
From: Mike Rapoport <rppt@xxxxxxxxxxxxx>
... that explains the rationale for the system call
Signed-off-by: Mike Rapoport <rppt@xxxxxxxxxxxxx>
I found a few formatting and wording issues (noted inline below), but I
fixed them myself, so you don't need to worry about them.
In general, I understood the rationale for the system call,
so I applied the patch to my tree. However, there are some parts that I
didn't understand well, mostly related to kernel internals. Since
Michael knows more about those, I expect him to review them again when
I send him the patch.
Thanks!
Alex
---
man2/memfd_secret.2 | 61 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
diff --git a/man2/memfd_secret.2 b/man2/memfd_secret.2
index f3380818e..869480b48 100644
--- a/man2/memfd_secret.2
+++ b/man2/memfd_secret.2
@@ -147,6 +147,67 @@ system call first appeared in Linux 5.14.
The
.BR memfd_secret ()
system call is Linux-specific.
+.SH NOTES
+.PP
Unnecessary .PP after .SH or .SS
+The
+.BR memfd_secret ()
+system call is designed to allow a user-space process
+to create a range of memory that is inaccessible to anybody else -
+kernel included.
+There is no 100% guarantee that the kernel won't be able to access
+memory ranges backed by
+.BR memfd_secret ()
+in any circumstances, but nevertheless,
+it is much harder to exfiltrate data from these regions.
+.PP
+The
/The/d
+.BR memfd_secret ()
+provides the following protections:
+.IP \(bu 3
+Enhanced protection
+(in conjunction with all the other in-kernel attack prevention systems)
+against ROP attacks.
+The absence of any in-kernel primitive for accessing memory backed by
+.BR memfd_secret ()
+means that a one-gadget ROP attack
+can't work to perform data exfiltration.
+The attacker would need to find enough ROP gadgets
+to reconstruct the missing page table entries,
+which significantly increases the difficulty of the attack,
+especially when other protections like the kernel stack size limit
+and address space layout randomization are in place.
+.IP \(bu
+Prevent cross-process userspace memory exposures.
s/userspace/user-space/
+Once a region for a
+.BR memfd_secret ()
+memory mapping is allocated,
+the user can't accidentally pass it into the kernel
+to be transmitted somewhere.
+The memory pages in this region cannot be accessed via the direct map
+and they are disallowed in get_user_pages.
+.IP \(bu
+Harden against exploited kernel flaws.
+In order to access memory areas backed by
+.BR memfd_secret (),
+a kernel-side attack would need to
+either walk the page tables and create new ones,
+or spawn a new privileged userspace process to perform
s/userspace/user-space/
+secrets exfiltration using
+.BR ptrace (2).
+.PP
+The way
+.BR memfd_secret ()
+allocates and locks the memory may impact overall system performance,
+therefore the system call is disabled by default and only available
+if the system administrator turned it on using the
+"secretmem.enable=y" kernel parameter.
+.PP
+To prevent potential data leaks of memory regions backed by
+.BR memfd_secret ()
+from a hibernation image,
+hibernation is prevented when there are active
+.BR memfd_secret ()
+users.
.SH SEE ALSO
.BR fcntl (2),
.BR ftruncate (2),
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/