This set takes all of the feedback on the last version into account and simplifies the ABI. It adds one feature: restrictive 'init_pkru' support. I realize it's during the merge window, but I'm posting so folks who aren't busy with merge window activities can take a look. Barring any new issues, I think this is ready to be applied once 4.9 material is being queued. Folks wishing to run this code can do so on any processor with the new PKU support in qemu >=2.6. Just boot with -cpu qemu64,+pku,+xsave, and make sure to apply this patch[1] to qemu. Changes from v5: * Removed pkey_set/get() system calls to simplify ABI * Added 'init_pkru' support to ensure we have a restrictive PKRU by default. * Requisite changes to selftests, plus some bugfixes around stdio in signal handlers -- Memory Protection Keys for User pages (pkeys) is a CPU feature which will first appear on Skylake Servers, but will also be supported on future non-server parts. It provides a mechanism for enforcing page-based protections, but without requiring modification of the page tables when an application changes wishes to change permissions. Among other things, this feature was designed to help fix a class of bugs in long-running applications where data corruption is detected long after it occurs. Applications today either live with the corruption or eat a huge performance penalty from calling mprotect() frequently. The developers of these applications are already running this code and are very eager to see this feature merged and picked up in future distributions where their customers can use it. Patches to implement execute-only mapping support using pkeys were merged in to 4.6. But, to do anything more useful with pkeys, an application needs to be able to set the pkey field in the PTE (obviously has to be done in-kernel) and make changes to the "rights" register (using unprivileged instructions). An application also needs to have an an allocator for the keys themselves. If two different parts of an application both want to protect their data with pkeys, they first need to know which key to use for their individual purposes. This set introduces 3 system calls: sys_pkey_mprotect(): apply PTE to memory (patches #1-3) sys_pkey_alloc(): ask the kernel for a free pkey (patch #4) sys_pkey_free(): the reverse of alloc (patch #4) I have manpages written for these syscalls, and have had multiple rounds of reviews on the manpages list. I have not revised them to remove pkey_get/set(), but will once this is merged in -tip. This set is also available here: git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-pkeys.git pkeys-v040 I've written a set of unit tests for these interfaces, which is available as the last patch in the series and integrated in to kselftests. Folks wishing to run this code can do so with the new PKU support in qemu >=2.6. Just boot with -cpu qemu64,+pku,+xsave, and make sure to apply this patch[1] to qemu. === diffstat === Dave Hansen (10): x86, pkeys: add fault handling for PF_PK page fault bit mm: implement new pkey_mprotect() system call x86, pkeys: make mprotect_key() mask off additional vm_flags x86, pkeys: allocation/free syscalls x86: wire up protection keys system calls generic syscalls: wire up memory protection keys syscalls pkeys: add details of system call use to Documentation/ x86, pkeys: default to a restrictive init PKRU x86, pkeys: allow configuration of init_pkru x86, pkeys: add self-tests Documentation/kernel-parameters.txt | 5 + Documentation/x86/protection-keys.txt | 63 + arch/alpha/include/uapi/asm/mman.h | 5 + arch/mips/include/uapi/asm/mman.h | 5 + arch/parisc/include/uapi/asm/mman.h | 5 + arch/x86/entry/syscalls/syscall_32.tbl | 5 + arch/x86/entry/syscalls/syscall_64.tbl | 5 + arch/x86/include/asm/mmu.h | 8 + arch/x86/include/asm/mmu_context.h | 25 +- arch/x86/include/asm/pkeys.h | 73 +- arch/x86/kernel/fpu/core.c | 4 + arch/x86/kernel/fpu/xstate.c | 5 +- arch/x86/mm/fault.c | 9 + arch/x86/mm/pkeys.c | 143 +- arch/xtensa/include/uapi/asm/mman.h | 5 + include/linux/pkeys.h | 41 +- include/linux/syscalls.h | 8 + include/uapi/asm-generic/mman-common.h | 5 + include/uapi/asm-generic/unistd.h | 12 +- mm/mprotect.c | 90 +- tools/testing/selftests/x86/Makefile | 3 +- tools/testing/selftests/x86/pkey-helpers.h | 219 +++ tools/testing/selftests/x86/protection_keys.c | 1411 +++++++++++++++++ 23 files changed, 2116 insertions(+), 38 deletions(-) === changelog === Changes from v5: * remove sys_pkey_get/set() to simplify the ABI. There was concern they could not be easily vsyscall-accelerated. * Added 'init_pkru' support to ensure we have a restrictive PKRU by default. * Requisite changes to selftests, plus some bugfixes around stdio in signal handlers Changes from v4: * removed validate_pkey(). It was redundant with the work we do in mm_pkey_alloc() and all of the mm_pkey_is_allocated() checks. * reorder patches to wait to wire up any syscalls until the end. * make allocation map functions explicity use unsigned masks * some tweaks to changelog (and associated manpages) Changes from v3: * added generic syscalls declarations to include/linux/syscalls.h to fix arm64 compile issue. Changes from v2: * selftest updates: * formatting changes like what Ingo asked for with MPX * actually call WRPKRU in __wrpkru() * once __wrpkru() was fixed, revealed a bug in the ptrace test where we were testing against the wrong pointer during the "baseline" test * Man-pages that match this set are here: http://marc.info/?l=linux-man&m=146540723525616&w=2 Changes from v1: * updates to alloc/free patch description calling out that "in-use" pkeys may still be pkey_free()'d successfully. * Fixed a bug in the selftest where the 'flags' argument was not passed to pkey_get(). * Added all syscalls to generic syscalls header * Added extra checking to selftests so it doesn't fall over when 1G pages are made the hugetlbfs default. 1. http://lists.nongnu.org/archive/html/qemu-devel/2016-07/msg04774.html Cc: linux-api@xxxxxxxxxxxxxxx Cc: linux-arch@xxxxxxxxxxxxxxx Cc: linux-mm@xxxxxxxxx Cc: x86@xxxxxxxxxx Cc: torvalds@xxxxxxxxxxxxxxxxxxxx Cc: akpm@xxxxxxxxxxxxxxxxxxxx Cc: Arnd Bergmann <arnd@xxxxxxxx> Cc: mgorman@xxxxxxxxxxxxxxxxxxx Cc: Dave Hansen (Intel) <dave.hansen@xxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html