On 03/09/2016 10:40 PM, Dave Hansen wrote: > From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> > > Memory Protection Keys for User pages is an Intel CPU feature > which will first appear on Skylake Servers, but will also be > supported on future non-server parts (there is also a QEMU > implementation). It provides a mechanism for enforcing > page-based protections, but without requiring modification of the > page tables when an application wishes to change permissions. > > I have propsed adding five new system calls to support this feature. > The five calls are distributed across three man-pages (one existing > and 2 new), plus a new pkey(7) page which serves as a general > overview of the feature. > > The system calls for this feature are not currently upstream but > can be found here: > > http://git.kernel.org/cgit/linux/kernel/git/daveh/x86-pkeys.git/ > > Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> > Cc: mtk.manpages@xxxxxxxxx > Cc: linux-man@xxxxxxxxxxxxxxx > Cc: linux-api@xxxxxxxxxxxxxxx > Cc: x86@xxxxxxxxxx > --- > man2/mprotect.2 | 35 ++++++++++++++++++++-- > man2/pkey_alloc.2 | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++ > man2/pkey_get.2 | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > man2/sigaction.2 | 6 ++++ > man7/pkey.7 | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 292 insertions(+), 3 deletions(-) > create mode 100644 man2/pkey_alloc.2 > create mode 100644 man2/pkey_get.2 > create mode 100644 man7/pkey.7 > > diff --git a/man2/mprotect.2 b/man2/mprotect.2 > index ae305f6..80ce909 100644 > --- a/man2/mprotect.2 > +++ b/man2/mprotect.2 > @@ -29,6 +29,7 @@ > .\" Modified 2004-08-16 by Andi Kleen <ak@xxxxxx> > .\" 2007-06-02, mtk: Fairly substantial rewrites and additions, and > .\" a much improved example program. > +.\" 2016-03-03, added pkey_mprotect, Dave Hansen <dave@xxxxxxxx> > .\" > .\" FIXME The following protection flags need documenting: > .\" PROT_SEM > @@ -38,16 +39,19 @@ > .\" > .TH MPROTECT 2 2015-07-23 "Linux" "Linux Programmer's Manual" > .SH NAME > -mprotect \- set protection on a region of memory > +mprotect, pkey_mprotect \- set protection on a region of memory > .SH SYNOPSIS > .nf > .B #include <sys/mman.h> > .sp > .BI "int mprotect(void *" addr ", size_t " len ", int " prot ); > +.BI "int pkey_mprotect(void *" addr ", size_t " len ", int " prot ", int " pkey "); > .fi > .SH DESCRIPTION > .BR mprotect () > -changes protection for the calling process's memory page(s) > +and > +.BR pkey_mprotect () > +change protection for the calling process's memory page(s) > containing any part of the address range in the > interval [\fIaddr\fP,\ \fIaddr\fP+\fIlen\fP\-1]. > .I addr > @@ -74,10 +78,18 @@ The memory can be modified. > .TP > .B PROT_EXEC > The memory can be executed. > +.PP > +.I pkey > +is the protection key to assign to the memory. > +A pkey must be allocated with > +.BR pkey_alloc (2) > +before it is passed to pkey_mprotect (). ==> new line: .BR pkey_mprotect (). > .SH RETURN VALUE > On success, > .BR mprotect () > -returns zero. > +and > +.BR pkey_mprotect () > +return zero. > On error, \-1 is returned, and > .I errno > is set appropriately. > @@ -95,6 +107,8 @@ to mark it > .B EINVAL > \fIaddr\fP is not a valid pointer, > or not a multiple of the system page size. > +Or: \fIpkey\fP has not been allocated with > +.BR pkey_alloc (2) > .\" Or: both PROT_GROWSUP and PROT_GROWSDOWN were specified in 'prot'. > .TP > .B ENOMEM > @@ -165,6 +179,20 @@ but at a minimum can allow write access only if > has been set, and must not allow any access if > .B PROT_NONE > has been set. > + > +Applications should be careful when mixing use of > +.BR mprotect () > +and > +.BR pkey_mprotect () . > +On x86, when > +.BR mprotect () > +is used with > +.IR prot > +set to > +.B PROT_EXEC > +a pkey is may be allocated and set on the memory implicitly > +by the kernel, but only when the pkey was 0 previously. > + > .SH EXAMPLE > .\" sigaction.2 refers to this example > .PP > @@ -246,3 +274,4 @@ main(int argc, char *argv[]) > .SH SEE ALSO > .BR mmap (2), > .BR sysconf (3) > +.BR pkey (7) In a commit message, you note: "On systems that do not support protection keys, it still works, but requires that key=0." I think this could be added in NOTES. > diff --git a/man2/pkey_alloc.2 b/man2/pkey_alloc.2 > new file mode 100644 > index 0000000..13fec90 > --- /dev/null > +++ b/man2/pkey_alloc.2 > @@ -0,0 +1,82 @@ > +.\" Copyright (C) 2016 Intel Corporation > +.\" > +.\" %%%license_start(verbatim) > +.\" permission is granted to make and distribute verbatim copies of this > +.\" manual provided the copyright notice and this permission notice are > +.\" preserved on all copies. > +.\" > +.\" permission is granted to copy and distribute modified versions of this > +.\" manual under the conditions for verbatim copying, provided that the > +.\" entire resulting derived work is distributed under the terms of a > +.\" permission notice identical to this one. > +.\" > +.\" since the linux kernel and libraries are constantly changing, this > +.\" manual page may be incorrect or out-of-date. the author(s) assume no > +.\" responsibility for errors or omissions, or for damages resulting from > +.\" the use of the information contained herein. the author(s) may not > +.\" have taken the same level of care in the production of this manual, > +.\" which is licensed free of charge, as they might when working > +.\" professionally. > +.\" > +.\" formatted or processed versions of this manual, if unaccompanied by > +.\" the source, must acknowledge the copyright and author of this work. > +.\" %%%license_end > +.\" > +.\" Created 2016-03-03 by Dave Hansen <dave@xxxxxxxx> > +.\" > +.\" > +.TH PKEY_ALLOC 2 2016-03-03 "Linux" "Linux Programmer's Manual" > +.SH NAME > +pkey_alloc, pkey_free \- allocate or free a protection key > +.SH SYNOPSIS > +.nf > +.B #include <sys/mman.h> > +.sp > +.BI "int pkey_alloc(unsigned long " flags ", unsigned long " access_rights ");" > +.BI "int pkey_free(int " pkey ");" > +.fi > +.SH DESCRIPTION > +.BR pkey_alloc () > +and > +.BR pkey_free () > +allow or disallow the calling process to use the given > +protection key for all protection-key-related operations. Actually, the above paragraph doesn't explain what pkey_free() does. That explanation should, I think, be in a separate paragraph below the description of 'flags'. > +.PP > +.I flags > +may contain zero or more disable operations: > +.TP > +.B PKEY_DISABLE_ACCESS > +Disable all data access to memory covered by the returned protection key. > +.TP > +.B PKEY_DISABLE_WRITE > +Disable write access to memory covered by the returned protection key. > +.SH RETURN VALUE > +On success, > +.BR pkey_alloc () > +returns a positive protection key value. > +.BR pkey_free () > +returns zero. > +On error, \-1 is returned, and > +.I errno > +is set appropriately. > +.SH ERRORS > +.TP > +.B EINVAL > +.IR pkey , > +.IR flags , > +or > +.I access_rights > +is invalid. > +.TP > +.B ENOSPC At the start of the following paragraph, add .(RB pkey_alloc ()) so that the reader knows that this error applies only for that syscall. > +All protection keys available for the current process have > +been allocated. The number of keys available is architecture > +an implementation-specfic and may be reduced by kernel-internal > +use of certain keys. There are currently 15 keys available to > +user programs on x86. Here, there should be a VERSIONS section noting the Linux kernel version where these system calls appeared and a CONFORMING TO section noting that these system calls are Linux-specific. > +.SH SEE ALSO > +.BR pkey_mprotect (2), Move above line after the next line. > +.BR pkey_get (2), > +.BR pkey_set (2), > +.BR pkey (7) > diff --git a/man2/pkey_get.2 b/man2/pkey_get.2 > new file mode 100644 > index 0000000..89a6015 > --- /dev/null > +++ b/man2/pkey_get.2 > @@ -0,0 +1,88 @@ > +.\" Copyright (C) 2016 Intel Corporation > +.\" > +.\" %%%LICENSE_START(VERBATIM) > +.\" Permission is granted to make and distribute verbatim copies of this > +.\" manual provided the copyright notice and this permission notice are > +.\" preserved on all copies. > +.\" > +.\" Permission is granted to copy and distribute modified versions of this > +.\" manual under the conditions for verbatim copying, provided that the > +.\" entire resulting derived work is distributed under the terms of a > +.\" permission notice identical to this one. > +.\" > +.\" Since the Linux kernel and libraries are constantly changing, this > +.\" manual page may be incorrect or out-of-date. The author(s) assume no > +.\" responsibility for errors or omissions, or for damages resulting from > +.\" the use of the information contained herein. The author(s) may not > +.\" have taken the same level of care in the production of this manual, > +.\" which is licensed free of charge, as they might when working > +.\" professionally. > +.\" > +.\" Formatted or processed versions of this manual, if unaccompanied by > +.\" the source, must acknowledge the copyright and author of this work. > +.\" %%%LICENSE_END > +.\" > +.\" Created 2016-03-03 by Dave Hansen <dave@xxxxxxxx> > +.\" > +.\" > +.TH PKEY_GET 2 2016-03-03 "Linux" "Linux Programmer's Manual" > +.SH NAME > +pkey_get, pkey_set \- manage protection key access permissions > +.SH SYNOPSIS > +.nf > +.B #include <sys/mman.h> > +.sp > +.BI "int pkey_get(int " pkey); > +.BI "int pkey_set(int " pkey ", unsigned long " access_rights ");" > +.fi > +.SH DESCRIPTION > +.BR pkey_get () > +and > +.BR pkey_set () > +query or set the current set of rights for the calling > +thread for the given protection key. > +When rights for a key are disabled, any future access > +to any memory region with that key set will generate > +a SIGSEGV. Access rights are private to each thread. Rewrite the preceding paragraph as === .BR pkey_set () sets the current set of rights for the calling thread for the protection key specified by .IR pkey . When rights for a key are disabled, any future access to any memory region with that key set will generate a .B SIGSEGV signal. Access rights are private to each thread. .PP .I access_rights may contain zero or more disable operations: .TP .B PKEY_DISABLE_ACCESS Disable all access to memory protected by the specified protection key. .TP .B PKEY_DISABLE_WRITE Disable write access to memory protected by the specified protection key. The .pkey_get () system call returns the current set of rights assigned for the protection key, .IR pkey . === The next three paragraphs should I think be moved to a NOTES section lower in the page. > +.PP > +When any signal handler is invoked, the thread is temporarily > +given a new, default set of protection key rights that override > +whatever rights were set in the interrupted context. The > +thread's protection key rights are restored when the signal > +handler returns. > > +Any call to Make the preceding line: "The effects of a call to" > +.BR pkey_set () > +from a signal handler will not persist when the signal handler > +returns. > + > +This signal behavior is unusual and is due to the fact that > +the x86 PKRU register (which stores \fIaccess_rights\fP) > +is managed with the same hardware mechanism (XSAVE) that > +manages floating point registers. The signal behavior is > +the same as that of a floating point register. In a previous review of the pages, I asked: [[ And I have a question (and the answer probably should be documented in the manual page). What happens when one signal handler interrupts the execution of another? Do pkey_set() calls in the first handler persist into the second handler? I presume not, but it would be good to be a little more explicit about this. ]] I think this point does need to be covered in the man page. > +.PP > +.I access_rights > +may contain zero or more disable operations: > +.B PKEY_DISABLE_ACCESS > +and/or > +.B PKEY_DISABLE_WRITE The above paragraph should be moved up. See my rewrite above. > +.SH RETURN VALUE > +On success, > +.BR pkey_set () > +returns zero. > +.BR pkey_get () > +returns a mask containing one or more of the disable operations s/one/zero/ ? > +listed above. > +On error, \-1 is returned, and > +.I errno > +is set appropriately. > +.SH ERRORS > +.TP > +.B EINVAL > +An invalid protection key or access_rights was specified. Make that last line: .I pkey or .I access_rights is invalid. Here, there should be a VERSIONS section noting the Linux kernel version where these system calls appeared and a CONFORMING TO section noting that these system calls are Linux-specific. > +.SH SEE ALSO Order the section 2 pages alphabetically: > +.BR pkey_mprotect (2), > +.BR pkey_alloc (2), > +.BR pkey_free (2), > +.BR pkey (7), > diff --git a/man2/sigaction.2 b/man2/sigaction.2 > index 3704e74..18c1f44 100644 > --- a/man2/sigaction.2 > +++ b/man2/sigaction.2 > @@ -620,6 +620,12 @@ Address not mapped to object. > .TP > .B SEGV_ACCERR > Invalid permissions for mapped object. > +.TP > +.B SEGV_PKUERR > +Access was denied by memory protection keys. See: > +.BR pkeys (7). > +The protection key which applied to this access is available via > +.I si_pkey So, pi_key needs to be added to the structure definition shown earlier in the page. > .RE > .PP > The following values can be placed in > diff --git a/man7/pkey.7 b/man7/pkey.7 > new file mode 100644 > index 0000000..d3da531 > --- /dev/null > +++ b/man7/pkey.7 > @@ -0,0 +1,84 @@ > +.\" Copyright (C) 2016 Intel Corporation > +.\" > +.\" %%%LICENSE_START(VERBATIM) > +.\" Permission is granted to make and distribute verbatim copies of this > +.\" manual provided the copyright notice and this permission notice are > +.\" preserved on all copies. > +.\" > +.\" Permission is granted to copy and distribute modified versions of this > +.\" manual under the conditions for verbatim copying, provided that the > +.\" entire resulting derived work is distributed under the terms of a > +.\" permission notice identical to this one. > +.\" > +.\" Since the Linux kernel and libraries are constantly changing, this > +.\" manual page may be incorrect or out-of-date. The author(s) assume no > +.\" responsibility for errors or omissions, or for damages resulting from > +.\" the use of the information contained herein. The author(s) may not > +.\" have taken the same level of care in the production of this manual, > +.\" which is licensed free of charge, as they might when working > +.\" professionally. > +.\" > +.\" Formatted or processed versions of this manual, if unaccompanied by > +.\" the source, must acknowledge the copyright and authors of this work. > +.\" %%%LICENSE_END > +.\" > +.\" Created 2016-03-03 by Dave Hansen <dave@xxxxxxxx> > +.\" > +.TH PKEYS 7 2016-03-03 "Linux" "Linux Programmer's Manual" > +.SH NAME > +pkeys \- overview of Memory Protection Keys > +.SH DESCRIPTION > + > +Memory Protection Keys (pkeys) are an extension to existing > +page-based memory permissions. Normal page permissions using > +page tables require expensive system calls and TLB invalidations > +when changing permissions. Memory Protection Keys provide a > +mechanism for changing protections without requiring modification > +of the page tables on every permission change. > + > +To use pkeys, software must first "tag" a page in the pagetables > +with a pkey. After this tag is in place, an application only has > +to change the contents of a register in order to remove write > +access, or all access to a tagged page. > + > +pkeys work in conjunction with the existing PROT_READ / PROT_WRITE / > +PROT_EXEC permissions passed to system calls like > +.BR mprotect (2) > +and > +.BR mmap (2) s/$/,/ > +, but always act to further restrict these traditional permission s/, // > +mechanisms. > + > +To use this feature, the processor must support it, and Linux > +must contain support for the feature on a given processor. As of > +early 2016 only future Intel x86 processors are supported, and this > +hardware supports 16 protection keys in each process. However, > +pkey 0 is used as the default key, so a maximum of 15 are available > +for actual application use. Is there a recommended way for an application to discover whether the system supports pkeys? If so, that should be documented here. > + > +.SS Protection Keys system calls > +The Linux kernel implements the following pkey-related system calls: > +.BR pkey_mprotect (2), > +.BR pkey_alloc (2), > +.BR pkey_free (2), > +.BR pkey_set (2), > +and > +.BR pkey_get (2) . > +.SS /proc/[number]/smaps (since Linux 4.6) > +Each line contains information about a memory range used by the process, > +displaying\(emamong other information\(emthe the pkeys for each range on > +a line labeled: "ProtectionKey:". The above piece should be done as a patch to the 'smaps' entry in proc(5). > + > +.SH NOTES > +The Linux pkey system calls and > +.I /proc/[number]/smaps > +interface are available only The detail about smaps should also be in the patch to proc(5). > +if the kernel was configured and built with the > +.BR CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS > +option. > +.SH SEE ALSO Order the following list alphabetically: > +.BR pkey_mprotect (2), > +.BR pkey_alloc (2), > +.BR pkey_free (2), > +.BR pkey_set (2), > +.BR pkey_get (2), Would it be possible to get a small, complete working example program in one of these pages? The axample could show how pkeys override traditional memory protections. I appreciate that the rest of us do not yet have suitable hardware, but presumably you do. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html