On Wed, 12 Dec 2018, Dave Hansen wrote:

> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
> Memory protection key behavior should be the same in a child as it was
> in the parent before a fork.  But, there is a bug that resets the
> state in the child at fork instead of preserving it.
>
> Our creation of new mm's is a bit convoluted.  At fork(), the code
> does:
>
> 	1. memcpy() the parent mm to initialize child
> 	2. mm_init() to initialize some select stuff
> 	3. dup_mmap() to create true copies that memcpy()
> 	   did not do right.
>
> For pkeys, we need to preserve two bits of state across a fork:
> 'execute_only_pkey' and 'pkey_allocation_map'.  Those are preserved by
> the memcpy(), which I thought did the right thing.  But, mm_init()
> calls init_new_context(), which I thought was *only* for execve()-time
> and overwrites 'execute_only_pkey' and 'pkey_allocation_map' with
> "new" values.  But, alas, init_new_context() is used at execve() and
> fork().
>
> The result is that, after a fork(), the child's pkey state ends up
> looking like it does after an execve(), which is totally wrong.  pkeys
> that are already allocated can be allocated again, for instance.
>
> To fix this, add code called by dup_mmap() to copy the pkey state from
> parent to child explicitly.  Also add a comment above init_new_context()
> to make it more clear to the next poor sod what this code is used for.
>
> Fixes: e8c24d3a23a ("x86/pkeys: Allocation/free syscalls")
> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
> Cc: Will Deacon <will.deacon@xxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Joerg Roedel <jroedel@xxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx

Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
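
For anyone following along, the fork-time ordering described in the
changelog can be modeled in userspace roughly as below.  This is only a
toy sketch, not the patch: struct toy_mm and all toy_*() helpers are
hypothetical names, and the "fresh" values written by
toy_init_new_context() (only pkey 0 allocated, no execute-only pkey) are
an assumption about what a new context looks like.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the pkey fields of an mm. */
struct toy_mm {
	uint16_t pkey_allocation_map;	/* bit N set => pkey N allocated */
	int16_t  execute_only_pkey;	/* -1 => none assigned */
};

/*
 * Models init_new_context(): reached via mm_init() at both execve()
 * and fork().  The reset values are assumed for illustration.
 */
void toy_init_new_context(struct toy_mm *mm)
{
	mm->pkey_allocation_map = 0x1;
	mm->execute_only_pkey = -1;
}

/* Models the fix: an explicit parent-to-child copy in the dup_mmap() path. */
void toy_dup_pkeys(const struct toy_mm *oldmm, struct toy_mm *mm)
{
	mm->pkey_allocation_map = oldmm->pkey_allocation_map;
	mm->execute_only_pkey = oldmm->execute_only_pkey;
}

/* Models the three fork() steps; with_fix selects the patched behavior. */
struct toy_mm toy_fork(const struct toy_mm *parent, int with_fix)
{
	struct toy_mm child;

	memcpy(&child, parent, sizeof(child));	/* 1. memcpy() the parent mm */
	toy_init_new_context(&child);		/* 2. mm_init() clobbers pkey state */
	if (with_fix)
		toy_dup_pkeys(parent, &child);	/* 3. dup_mmap() restores it */
	return child;
}

/* Self-check: the unpatched path loses the parent's state, the patched one keeps it. */
void toy_check(void)
{
	struct toy_mm parent = { .pkey_allocation_map = 0x7, .execute_only_pkey = 2 };
	struct toy_mm buggy = toy_fork(&parent, 0);
	struct toy_mm fixed = toy_fork(&parent, 1);

	assert(buggy.pkey_allocation_map != parent.pkey_allocation_map);
	assert(fixed.pkey_allocation_map == parent.pkey_allocation_map);
	assert(fixed.execute_only_pkey == parent.execute_only_pkey);
}
```

Calling toy_check() from a test harness exercises both paths.  The point
of the model is just the ordering: because step 2 runs after step 1, the
memcpy() alone cannot preserve the state, so the copy has to happen
again afterwards.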