From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>

Memory Protection Keys for User pages is an Intel CPU feature
which will first appear on Skylake Servers, but will also be
supported on future non-server parts (there is also a QEMU
implementation). It provides a mechanism for enforcing
page-based protections, but without requiring modification of the
page tables when an application wishes to change permissions.

I have proposed adding five new system calls to support this feature.
The five calls are distributed across three man-pages (one existing
and two new), plus a new pkey(7) page which serves as a general
overview of the feature.

The system calls for this feature are not currently upstream but
can be found here:

    http://git.kernel.org/cgit/linux/kernel/git/daveh/x86-pkeys.git/

Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: mtk.manpages@xxxxxxxxx
Cc: linux-man@xxxxxxxxxxxxxxx
Cc: linux-api@xxxxxxxxxxxxxxx
Cc: x86@xxxxxxxxxx
---
 man2/mprotect.2   |  45 ++++++++++++-
 man2/pkey_alloc.2 | 103 +++++++++++++++++++++++++++++
 man2/pkey_get.2   | 110 +++++++++++++++++++++++++++++++
 man2/sigaction.2  |   9 +++
 man5/proc.5       |   8 +++
 man7/pkey.7       | 192 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 464 insertions(+), 3 deletions(-)
 create mode 100644 man2/pkey_alloc.2
 create mode 100644 man2/pkey_get.2
 create mode 100644 man7/pkey.7

diff --git a/man2/mprotect.2 b/man2/mprotect.2
index ae305f6..742af70 100644
--- a/man2/mprotect.2
+++ b/man2/mprotect.2
@@ -29,6 +29,7 @@
 .\" Modified 2004-08-16 by Andi Kleen <ak@xxxxxx>
 .\" 2007-06-02, mtk: Fairly substantial rewrites and additions, and
 .\" a much improved example program.
+.\" 2016-03-03, added pkey_mprotect, Dave Hansen <dave@xxxxxxxx>
 .\"
 .\" FIXME The following protection flags need documenting:
 .\" PROT_SEM
@@ -38,16 +39,19 @@
 .\"
 .TH MPROTECT 2 2015-07-23 "Linux" "Linux Programmer's Manual"
 .SH NAME
-mprotect \- set protection on a region of memory
+mprotect, pkey_mprotect \- set protection on a region of memory
 .SH SYNOPSIS
 .nf
 .B #include <sys/mman.h>
 .sp
 .BI "int mprotect(void *" addr ", size_t " len ", int " prot );
+.BI "int pkey_mprotect(void *" addr ", size_t " len ", int " prot ", int " pkey ");"
 .fi
 .SH DESCRIPTION
 .BR mprotect ()
-changes protection for the calling process's memory page(s)
+and
+.BR pkey_mprotect ()
+change protection for the calling process's memory page(s)
 containing any part of the address range in the
 interval [\fIaddr\fP,\ \fIaddr\fP+\fIlen\fP\-1].
 .I addr
@@ -74,10 +78,19 @@ The memory can be modified.
 .TP
 .B PROT_EXEC
 The memory can be executed.
+.PP
+.I pkey
+is the protection key to assign to the memory.
+A pkey must be allocated with
+.BR pkey_alloc (2)
+before it is passed to
+.BR pkey_mprotect ().
 .SH RETURN VALUE
 On success,
 .BR mprotect ()
-returns zero.
+and
+.BR pkey_mprotect ()
+return zero.
 On error, \-1 is returned, and
 .I errno
 is set appropriately.
@@ -95,6 +108,8 @@ to mark it
 .B EINVAL
 \fIaddr\fP is not a valid pointer,
 or not a multiple of the system page size.
+Or: \fIpkey\fP has not been allocated with
+.BR pkey_alloc (2).
 .\" Or: both PROT_GROWSUP and PROT_GROWSDOWN were specified in 'prot'.
 .TP
 .B ENOMEM
@@ -165,6 +180,29 @@ but at a minimum can allow write access only if
 has been set, and must not allow any access if
 .B PROT_NONE
 has been set.
+
+Applications should be careful when mixing use of
+.BR mprotect ()
+and
+.BR pkey_mprotect ().
+On x86, when
+.BR mprotect ()
+is used with
+.IR prot
+set to
+.B PROT_EXEC
+a pkey may be allocated and set on the memory implicitly
+by the kernel, but only when the pkey was 0 previously.
+
+On systems that do not support protection keys in hardware,
+.BR pkey_mprotect ()
+may still be used, but
+.IR pkey
+must be set to 0. When called this way, the operation of
+.BR pkey_mprotect ()
+is equivalent to
+.BR mprotect ().
+
 .SH EXAMPLE
 .\" sigaction.2 refers to this example
 .PP
@@ -246,3 +284,4 @@ main(int argc, char *argv[])
 .SH SEE ALSO
 .BR mmap (2),
 .BR sysconf (3)
+.BR pkey (7)
diff --git a/man2/pkey_alloc.2 b/man2/pkey_alloc.2
new file mode 100644
index 0000000..e931f82
--- /dev/null
+++ b/man2/pkey_alloc.2
@@ -0,0 +1,103 @@
+.\" Copyright (C) 2016 Intel Corporation
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date. The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein. The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and author of this work.
+.\" %%%LICENSE_END
+.\"
+.\" Created 2016-03-03 by Dave Hansen <dave@xxxxxxxx>
+.\"
+.\"
+.TH PKEY_ALLOC 2 2016-03-03 "Linux" "Linux Programmer's Manual"
+.SH NAME
+pkey_alloc, pkey_free \- allocate or free a protection key
+.SH SYNOPSIS
+.nf
+.B #include <sys/mman.h>
+.sp
+.BI "int pkey_alloc(unsigned long " flags ", unsigned long " access_rights ");"
+.BI "int pkey_free(int " pkey ");"
+.fi
+.SH DESCRIPTION
+.BR pkey_alloc ()
+allocates a protection key and allows it to be passed to
+the other interfaces that accept a protection key, such as
+.BR pkey_mprotect (),
+.BR pkey_set ()
+and
+.BR pkey_get ().
+.PP
+.BR pkey_free ()
+frees a protection key and makes it available to later
+allocations. After a protection key has been freed, it may
+no longer be used in any protection-key-related operations.
+.PP
+.RB ( pkey_alloc ())
+.I access_rights
+may contain zero or more disable operations:
+.TP
+.B PKEY_DISABLE_ACCESS
+Disable all data access to memory covered by the returned protection key.
+.TP
+.B PKEY_DISABLE_WRITE
+Disable write access to memory covered by the returned protection key.
+.SH RETURN VALUE
+On success,
+.BR pkey_alloc ()
+returns a positive protection key value.
+.BR pkey_free ()
+returns zero.
+On error, \-1 is returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+.TP
+.B EINVAL
+.IR pkey ,
+.IR flags ,
+or
+.I access_rights
+is invalid.
+.TP
+.B ENOSPC
+.RB ( pkey_alloc ())
+All protection keys available for the current process have
+been allocated. The number of keys available is architecture-
+and implementation-specific and may be reduced by kernel-internal
+use of certain keys. There are currently 15 keys available to
+user programs on x86.
+.SH VERSIONS
+.BR pkey_alloc ()
+and
+.BR pkey_free ()
+were added to Linux in kernel <FIXME>;
+library support was added to glibc in version <FIXME>.
+.SH CONFORMING TO
+The
+.BR pkey_alloc ()
+and
+.BR pkey_free ()
+system calls are Linux-specific.
+.SH
+.SH SEE ALSO
+.BR pkey_get (2),
+.BR pkey_mprotect (2),
+.BR pkey_set (2),
+.BR pkey (7)
diff --git a/man2/pkey_get.2 b/man2/pkey_get.2
new file mode 100644
index 0000000..b965786
--- /dev/null
+++ b/man2/pkey_get.2
@@ -0,0 +1,110 @@
+.\" Copyright (C) 2016 Intel Corporation
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date. The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein. The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and author of this work.
+.\" %%%LICENSE_END
+.\"
+.\" Created 2016-03-03 by Dave Hansen <dave@xxxxxxxx>
+.\"
+.\"
+.TH PKEY_GET 2 2016-03-03 "Linux" "Linux Programmer's Manual"
+.SH NAME
+pkey_get, pkey_set \- manage protection key access permissions
+.SH SYNOPSIS
+.nf
+.B #include <sys/mman.h>
+.sp
+.BI "int pkey_get(int " pkey ");"
+.BI "int pkey_set(int " pkey ", unsigned long " access_rights ");"
+.fi
+.SH DESCRIPTION
+.BR pkey_set ()
+sets the current set of rights for the calling
+thread for the protection key specified by
+.IR pkey .
+When rights for a key are disabled, any future access
+to any memory region with that key set will generate a
+.B SIGSEGV
+signal.
+Access rights are private to each thread.
+.PP
+.I access_rights
+may contain zero or more disable operations:
+.TP
+.B PKEY_DISABLE_ACCESS
+Disable all access to memory protected by the specified protection key.
+.TP
+.B PKEY_DISABLE_WRITE
+Disable write access to memory protected by the specified protection key.
+.SH RETURN VALUE
+On success,
+.BR pkey_set ()
+returns zero.
+.BR pkey_get ()
+returns a mask containing zero or more of the disable operations
+listed above.
+On error, \-1 is returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+.TP
+.B EINVAL
+.I pkey
+or
+.I access_rights
+is invalid.
+.SH NOTES
+When any signal handler is invoked, the thread is temporarily
+given a new, default set of protection key rights that override
+whatever rights were set in the interrupted context. The
+thread's protection key rights are restored when the signal
+handler returns.
+
+The effects of a call to
+.BR pkey_set ()
+from a signal handler will not persist when control passes out of
+the signal handler.
+This is true both when the handler returns to a normal,
+nonsignal context, and when the signal handler is interrupted
+by another signal handler.
+
+This signal behavior is unusual and is due to the fact that
+the x86 PKRU register (which stores \fIaccess_rights\fP)
+is managed with the same hardware mechanism (XSAVE) that
+manages floating point registers. The signal behavior is
+the same as that of a floating point register.
+.SH VERSIONS
+.BR pkey_get ()
+and
+.BR pkey_set ()
+were added to Linux in kernel <FIXME>;
+library support was added to glibc in version <FIXME>.
+.SH CONFORMING TO
+The
+.BR pkey_get ()
+and
+.BR pkey_set ()
+system calls are Linux-specific.
+.SH SEE ALSO
+.BR pkey_alloc (2),
+.BR pkey_free (2),
+.BR pkey_mprotect (2),
+.BR pkey (7)
diff --git a/man2/sigaction.2 b/man2/sigaction.2
index 3704e74..ed5b874 100644
--- a/man2/sigaction.2
+++ b/man2/sigaction.2
@@ -45,6 +45,7 @@
 .\" 2010-06-11 mtk, improvements to discussion of various siginfo_t fields.
 .\" 2015-01-17, Kees Cook <keescook@xxxxxxxxxxxx>
 .\" Added notes on ptrace SIGTRAP and SYS_SECCOMP.
+.\" 2016-03-10, Dave Hansen, add si_pkey
 .\"
 .TH SIGACTION 2 2015-08-08 "Linux" "Linux Programmer's Manual"
 .SH NAME
@@ -305,6 +306,8 @@ siginfo_t {
                               (since Linux 3.5) */
     unsigned int si_arch;  /* Architecture of attempted system call
                               (since Linux 3.5) */
+    unsigned int si_pkey;  /* Protection key set on si_addr
+                              (since Linux <FIXME>) */
 }
 .fi
 .in
@@ -620,6 +623,12 @@ Address not mapped to object.
 .TP
 .B SEGV_ACCERR
 Invalid permissions for mapped object.
+.TP
+.B SEGV_PKUERR
+Access was denied by memory protection keys. See
+.BR pkeys (7).
+The protection key which applied to this access is available via
+.IR si_pkey .
 .RE
 .PP
 The following values can be placed in
diff --git a/man5/proc.5 b/man5/proc.5
index 768d920..e3132a1 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -47,6 +47,7 @@
 .\" and /proc/[pid]/fdinfo/*.
 .\" 2008-06-19, mtk, Documented /proc/[pid]/status.
 .\" 2008-07-15, mtk, added /proc/config.gz
+.\" 2016-03-10, Dave Hansen, added ProtectionKey to /proc/[pid]/smaps
 .\"
 .\" FIXME . cross check against Documentation/filesystems/proc.txt
 .\" to see what information could be imported from that file
@@ -1471,6 +1472,13 @@ The codes are the following:
 nh - no-huge page advise flag
 mg - mergeable advise flag
 
+The "ProtectionKey" field contains the memory protection key (see
+.BR pkeys (7))
+associated with the virtual memory area. Only present if the
+kernel was built with the
+.B CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+configuration option. (since Linux 4.6)
+
 The
 .IR /proc/[pid]/smaps
 file is present only if the
diff --git a/man7/pkey.7 b/man7/pkey.7
new file mode 100644
index 0000000..815ad7f
--- /dev/null
+++ b/man7/pkey.7
@@ -0,0 +1,192 @@
+.\" Copyright (C) 2016 Intel Corporation
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date. The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein. The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.\" Created 2016-03-03 by Dave Hansen <dave@xxxxxxxx>
+.\"
+.TH PKEYS 7 2016-03-03 "Linux" "Linux Programmer's Manual"
+.SH NAME
+pkeys \- overview of Memory Protection Keys
+.SH DESCRIPTION
+
+Memory Protection Keys (pkeys) are an extension to existing
+page-based memory permissions. Normal page permissions using
+page tables require expensive system calls and TLB invalidations
+when changing permissions. Memory Protection Keys provide a
+mechanism for changing protections without requiring modification
+of the page tables on every permission change.
+
+To use pkeys, software must first "tag" a page in the page tables
+with a pkey. After this tag is in place, an application only has
+to change the contents of a register in order to remove write
+access, or all access, to a tagged page.
+
+pkeys work in conjunction with the existing PROT_READ / PROT_WRITE /
+PROT_EXEC permissions passed to system calls like
+.BR mprotect (2)
+and
+.BR mmap (2),
+but always act to further restrict these traditional permission
+mechanisms.
+
+To use this feature, the processor must support it, and Linux
+must contain support for the feature on a given processor. As of
+early 2016 only future Intel x86 processors are supported, and this
+hardware supports 16 protection keys in each process. However,
+pkey 0 is used as the default key, so a maximum of 15 are available
+for actual application use.
+
+Any application wanting to use protection keys needs to be able
+to function without them. They might be unavailable because the
+hardware that the application runs on does not support them, the
+kernel code does not contain support, the kernel support has been
+disabled, or because the keys have all been allocated, perhaps by
+a library the application is using. It is recommended that
+applications wanting to use protection keys should simply call
+.BR pkey_alloc ()
+instead of attempting to detect support for the
+feature in any other way.
+
+Hardware support for protection keys may be enumerated with
+the cpuid instruction. Details on how to do this can be
+found in the Intel Software Developer's Manual. The kernel
+performs this enumeration and exposes the information in
+/proc/cpuinfo under the "flags" field. "pku" in this field
+indicates hardware support for protection keys and "ospke"
+indicates that the kernel contains and has enabled protection
+keys support.
+.SS Protection Keys system calls
+The Linux kernel implements the following pkey-related system calls:
+.BR pkey_mprotect (2),
+.BR pkey_alloc (2),
+.BR pkey_free (2),
+.BR pkey_set (2),
+and
+.BR pkey_get (2).
+.SH NOTES
+The Linux pkey system calls are available only
+if the kernel was configured and built with the
+.BR CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+option.
+.SH EXAMPLE
+.PP
+The program below allocates a page of memory with read/write
+permissions via PROT_READ|PROT_WRITE. It then writes some
+data to the memory and successfully reads it back. After
+that, it attempts to allocate a protection key and
+disallows access to it by passing
+.BR PKEY_DISABLE_ACCESS
+to
+.BR pkey_set ().
+It then tries to access
+.IR buffer ,
+which we now expect to cause a fatal signal to the application.
+.in +4n
+.nf
+.RB "$" " ./a.out"
+buffer contains: 73
+about to read buffer again...
+Segmentation fault (core dumped)
+.fi
+.in
+.SS Program source
+\&
+.nf
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <stdio.h>
+#include <sys/mman.h>
+#include <assert.h>
+
+int sys_pkey_get(int pkey, unsigned long flags)
+{
+	return syscall(SYS_pkey_get, pkey, flags);
+}
+
+int sys_pkey_set(int pkey, unsigned long rights, unsigned long flags)
+{
+	return syscall(SYS_pkey_set, pkey, rights, flags);
+}
+
+int sys_pkey_mprotect(void *ptr, size_t size, unsigned long orig_prot, unsigned long pkey)
+{
+	return syscall(SYS_pkey_mprotect, ptr, size, orig_prot, pkey);
+}
+
+int pkey_alloc(void)
+{
+	return syscall(SYS_pkey_alloc, 0, 0);
+}
+
+int pkey_free(unsigned long pkey)
+{
+	return syscall(SYS_pkey_free, pkey);
+}
+
+int main(void)
+{
+	int err;
+	int pkey;
+	int *buffer;
+
+	/* Allocate one page of memory: */
+	buffer = mmap(NULL, getpagesize(), PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+	assert(buffer != MAP_FAILED);
+
+	/* Put some random data into the page (still OK to touch): */
+	(*buffer) = __LINE__;
+	printf("buffer contains: %d\\n", *buffer);
+
+	/* Allocate a protection key: */
+	pkey = pkey_alloc();
+	assert(pkey > 0);
+
+	/* Disable access to any memory with "pkey" set,
+	 * even though there is none right now. */
+	err = sys_pkey_set(pkey, PKEY_DISABLE_ACCESS, 0);
+
+	/*
+	 * Set the protection key on "buffer".
+	 * Note that it is still read/write as far as mprotect() is
+	 * concerned, and the previous pkey_set() overrides it.
+	 */
+	err = sys_pkey_mprotect(buffer, getpagesize(), PROT_READ|PROT_WRITE, pkey);
+	assert(!err);
+
+	printf("about to read buffer again...\\n");
+	/* This will crash, because we have disallowed access: */
+	printf("buffer contains: %d\\n", *buffer);
+
+	err = pkey_free(pkey);
+	assert(!err);
+
+	return 0;
+}
+.fi
+.SH SEE ALSO
+.BR pkey_alloc (2),
+.BR pkey_free (2),
+.BR pkey_get (2),
+.BR pkey_mprotect (2),
+.BR pkey_set (2)
-- 
1.9.1
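
Not part of the patch: a minimal sketch of the fallback pattern that the pkey(7) text recommends, namely to simply try allocating a key and to keep working without one. The helper name make_readonly() is hypothetical. The sketch assumes that SYS_pkey_alloc, SYS_pkey_mprotect and SYS_pkey_free are provided by <sys/syscall.h> on a kernel carrying these calls (it compiles to the plain mprotect() path otherwise), and that PKEY_DISABLE_WRITE has the value 0x2 when the headers do not define it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed fallback value for headers that predate protection keys. */
#ifndef PKEY_DISABLE_WRITE
#define PKEY_DISABLE_WRITE 0x2
#endif

/*
 * Hypothetical helper: make [addr, addr+len) read-only, preferring a
 * protection key and falling back to mprotect() when none is usable.
 * Returns 1 if a pkey was used (stored in *pkey_out), 0 if plain
 * mprotect() was used (*pkey_out is -1), and -1 on error.
 */
int make_readonly(void *addr, size_t len, int *pkey_out)
{
	*pkey_out = -1;
#if defined(SYS_pkey_alloc) && defined(SYS_pkey_mprotect) && defined(SYS_pkey_free)
	/* flags = 0; initial rights = writes disabled for this thread. */
	long pkey = syscall(SYS_pkey_alloc, 0UL,
			    (unsigned long)PKEY_DISABLE_WRITE);
	if (pkey > 0) {
		/* Page tables stay read/write; the key does the restricting. */
		if (syscall(SYS_pkey_mprotect, addr, len,
			    (unsigned long)(PROT_READ | PROT_WRITE), pkey) == 0) {
			*pkey_out = (int)pkey;
			return 1;
		}
		/* Tagging failed: release the key before falling back. */
		syscall(SYS_pkey_free, pkey);
	}
#endif
	/* No hardware support, no kernel support, or no keys left. */
	return mprotect(addr, len, PROT_READ) == 0 ? 0 : -1;
}
```

A caller would treat both non-negative return values as success. In the pkey case the write-disable lives in the calling thread's rights, which the pkey_get(2) text describes as private to each thread, so the two paths are not equivalent for multithreaded programs.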