On 07/08/2016 03:35 AM, Mel Gorman wrote: > On Thu, Jul 07, 2016 at 04:09:22PM -0700, Dave Hansen wrote: >> b/arch/x86/include/asm/pkeys.h | 39 ++++++++++++++++++++++++++++++++++----- >> b/mm/mprotect.c | 4 ---- >> 2 files changed, 34 insertions(+), 9 deletions(-) >> >> diff -puN mm/mprotect.c~pkeys-119-fast-set-get mm/mprotect.c >> --- a/mm/mprotect.c~pkeys-119-fast-set-get 2016-07-07 12:25:49.582075153 -0700 >> +++ b/mm/mprotect.c 2016-07-07 12:42:50.516384977 -0700 >> @@ -542,10 +542,8 @@ SYSCALL_DEFINE2(pkey_get, int, pkey, uns >> if (flags) >> return -EINVAL; >> >> - down_write(¤t->mm->mmap_sem); >> if (!mm_pkey_is_allocated(current->mm, pkey)) >> ret = -EBADF; >> - up_write(¤t->mm->mmap_sem); >> >> if (ret) >> return ret; > > This does allow the possibility of > > thread a thread b > pkey_get enter > pkey_free > pkey_alloc > pkey_get leave > > The kernel can tell if the key is allocated but not if it is the same > allocation userspace expected or not. That's why I thought this may need > to be a sequence counter. Unfortunately, now I realise that even that is > insufficient because the seqcounter would only detect that something > changed, it would have no idea if the pkey of interest was affected or > not. It gets rapidly messy after that. > > Userspace may have no choice other than to serialise itself but the > documentation needs to be clear that the above race is possible. Yeah, I'll clarify the documentation. But, I do think this is one of those races like an stat(). A stat() tells you that a file was once there with so and so properties, but it does not mean that it is there any more or that what _is_ there is the same thing you stat()'d. >> diff -puN arch/x86/include/asm/pkeys.h~pkeys-119-fast-set-get arch/x86/include/asm/pkeys.h >> --- a/arch/x86/include/asm/pkeys.h~pkeys-119-fast-set-get 2016-07-07 12:26:19.265421712 -0700 >> +++ b/arch/x86/include/asm/pkeys.h 2016-07-07 15:18:15.391642423 -0700 >> @@ -35,18 +35,47 @@ extern int __arch_set_user_pkey_access(s >> >> #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | VM_PKEY_BIT3) >> >> +#define PKEY_MAP_SET 1 >> +#define PKEY_MAP_CLEAR 2 >> #define mm_pkey_allocation_map(mm) (mm->context.pkey_allocation_map) >> -#define mm_set_pkey_allocated(mm, pkey) do { \ >> - mm_pkey_allocation_map(mm) |= (1U << pkey); \ >> +static inline >> +void mm_modify_pkey_alloc_map(struct mm_struct *mm, int pkey, int setclear) >> +{ >> + u16 new_map = mm_pkey_allocation_map(mm); >> + if (setclear == PKEY_MAP_SET) >> + new_map |= (1U << pkey); >> + else if (setclear == PKEY_MAP_CLEAR) >> + new_map &= ~(1U << pkey); >> + else >> + BUILD_BUG_ON(1); >> + /* >> + * Make sure that mm_pkey_is_allocated() callers never >> + * see intermediate states by using WRITE_ONCE(). >> + * Concurrent calls to this function are excluded by >> + * down_write(mm->mmap_sem) so we only need to protect >> + * against readers. >> + */ >> + WRITE_ONCE(mm_pkey_allocation_map(mm), new_map); >> +} > > What prevents two pkey_set operations overwriting each others change with > WRITE_ONCE? Does this not need to be a cmpxchg read-modify-write loops? pkey_set() only reads the allocation map and only writes to PKRU which is thread-local. The writers to the allocation map are pkey_alloc()/free() and those are still mmap_sem-protected. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>