On Wed, Aug 3, 2022 at 2:03 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
>
> * Kyle Huey <me@xxxxxxxxxxxx> wrote:
>
> > From: Kyle Huey <me@xxxxxxxxxxxx>
> >
> > When management of the PKRU register was moved away from XSTATE, emulation
> > of PKRU's existence in XSTATE was added for APIs that read XSTATE, but not
> > for APIs that write XSTATE. This can be seen by running gdb and executing
> > `p $pkru`, `set $pkru = 42`, and `p $pkru`. On affected kernels (5.14+) the
> > write to the PKRU register (which gdb performs through ptrace) is ignored.
> >
> > There are three relevant APIs: PTRACE_SETREGSET with NT_X86_XSTATE,
> > sigreturn, and KVM_SET_XSAVE. KVM_SET_XSAVE has its own special handling to
> > make PKRU writes take effect (in fpu_copy_uabi_to_guest_fpstate). Push that
> > down into copy_uabi_to_xstate and have PTRACE_SETREGSET with NT_X86_XSTATE
> > and sigreturn pass in pointers to the appropriate PKRU value.
> >
> > This also adds code to initialize the PKRU value to the hardware init value
> > (namely 0) if the PKRU bit is not set in the XSTATE header to match XRSTOR.
> > This is a change to the current KVM_SET_XSAVE behavior.
> >
> > Signed-off-by: Kyle Huey <me@xxxxxxxxxxxx>
> > Cc: kvm@xxxxxxxxxxxxxxx # For edge case behavior of KVM_SET_XSAVE
> > Cc: stable@xxxxxxxxxxxxxxx # 5.14+
> > Fixes: e84ba47e313dbc097bf859bb6e4f9219883d5f78
> > ---
> >  arch/x86/kernel/fpu/core.c   | 11 +----------
> >  arch/x86/kernel/fpu/regset.c |  2 +-
> >  arch/x86/kernel/fpu/signal.c |  2 +-
> >  arch/x86/kernel/fpu/xstate.c | 26 +++++++++++++++++++++-----
> >  arch/x86/kernel/fpu/xstate.h |  4 ++--
> >  5 files changed, 26 insertions(+), 19 deletions(-)
> >
> > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> > index 0531d6a06df5..dfb79e2ee81f 100644
> > --- a/arch/x86/kernel/fpu/core.c
> > +++ b/arch/x86/kernel/fpu/core.c
> > @@ -406,16 +406,7 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
> >  	if (ustate->xsave.header.xfeatures & ~xcr0)
> >  		return -EINVAL;
> >
> > -	ret = copy_uabi_from_kernel_to_xstate(kstate, ustate);
> > -	if (ret)
> > -		return ret;
> > -
> > -	/* Retrieve PKRU if not in init state */
> > -	if (kstate->regs.xsave.header.xfeatures & XFEATURE_MASK_PKRU) {
> > -		xpkru = get_xsave_addr(&kstate->regs.xsave, XFEATURE_PKRU);
> > -		*vpkru = xpkru->pkru;
> > -	}
> > -	return 0;
> > +	return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru);
> >  }
> >  EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate);
> >  #endif /* CONFIG_KVM */
> > diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
> > index 75ffaef8c299..6d056b68f4ed 100644
> > --- a/arch/x86/kernel/fpu/regset.c
> > +++ b/arch/x86/kernel/fpu/regset.c
> > @@ -167,7 +167,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
> >  	}
> >
> >  	fpu_force_restore(fpu);
> > -	ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf);
> > +	ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf, &target->thread.pkru);
> >
> >  out:
> >  	vfree(tmpbuf);
> > diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> > index 91d4b6de58ab..558076dbde5b 100644
> > --- a/arch/x86/kernel/fpu/signal.c
> > +++ b/arch/x86/kernel/fpu/signal.c
> > @@ -396,7 +396,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
> >
> >  	fpregs = &fpu->fpstate->regs;
> >  	if (use_xsave() && !fx_only) {
> > -		if (copy_sigframe_from_user_to_xstate(fpu->fpstate, buf_fx))
> > +		if (copy_sigframe_from_user_to_xstate(tsk, buf_fx))
> >  			return false;
> >  	} else {
> >  		if (__copy_from_user(&fpregs->fxsave, buf_fx,
> > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> > index c8340156bfd2..1eea7af4afd9 100644
> > --- a/arch/x86/kernel/fpu/xstate.c
> > +++ b/arch/x86/kernel/fpu/xstate.c
> > @@ -1197,7 +1197,7 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
> >
> >
> >  static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
> > -			       const void __user *ubuf)
> > +			       const void __user *ubuf, u32 *pkru)
> >  {
> >  	struct xregs_state *xsave = &fpstate->regs.xsave;
> >  	unsigned int offset, size;
> > @@ -1235,6 +1235,22 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
> >  	for (i = 0; i < XFEATURE_MAX; i++) {
> >  		mask = BIT_ULL(i);
> >
> > +		if (i == XFEATURE_PKRU) {
> > +			/*
> > +			 * Retrieve PKRU if not in init state, otherwise
> > +			 * initialize it.
> > +			 */
> > +			if (hdr.xfeatures & mask) {
> > +				struct pkru_state xpkru = {0};
> > +
> > +				copy_from_buffer(&xpkru, xstate_offsets[i],
> > +						 sizeof(xpkru), kbuf, ubuf);
>
> Shouldn't the failure case of copy_from_buffer() be handled?

Yes, it should be. The sigreturn case could hit it.

> Also, what's the security model for this register, do we trust all input
> values user-space provides for the PKRU field in the XSTATE? I realize that
> WRPKRU already gives user-space write access to the register - but does the
> CPU write it all into the XSTATE, with no restrictions on content
> whatsoever?

There is no security model for this register.
The CPU does write whatever is given to WRPKRU (or XRSTOR) into the PKRU
register. The pkeys(7) man page notes:

    Protection keys have the potential to add a layer of security and
    reliability to applications. But they have not been primarily designed
    as a security feature. For instance, WRPKRU is a completely
    unprivileged instruction, so pkeys are useless in any case that an
    attacker controls the PKRU register or can execute arbitrary
    instructions.

And the ERIM paper
(https://www.usenix.org/system/files/sec19-vahldiek-oberwagner_0.pdf)
explicitly contemplates the need to protect against the less privileged
code containing WRPKRU and XRSTOR instructions (though they do seem to
have missed the implicit XRSTOR in sigreturn).

> Thanks,
>
> 	Ingo

- Kyle
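
For illustration only, here is a userspace sketch of what the PKRU branch
could look like once the copy_from_buffer() failure is propagated, as
agreed above. It is not the kernel patch and not a proposed v2. Every
sketch_* name, the SKETCH_* constants, and the bounds-checked memcpy that
stands in for the kernel's kbuf/ubuf copy are assumptions made so the
logic can be compiled and exercised outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel definitions (assumptions). */
#define SKETCH_XFEATURE_PKRU	9
#define SKETCH_PKRU_MASK	(UINT64_C(1) << SKETCH_XFEATURE_PKRU)

struct sketch_pkru_state {
	uint32_t pkru;
	uint32_t pad;
};

/*
 * Mock of copy_from_buffer(): a bounds-checked memcpy that can fail,
 * standing in for a copy from user memory that may fault.
 */
static int sketch_copy_from_buffer(void *dst, size_t offset, size_t size,
				   const void *buf, size_t buf_len)
{
	if (offset > buf_len || size > buf_len - offset)
		return -EFAULT;

	memcpy(dst, (const unsigned char *)buf + offset, size);
	return 0;
}

/*
 * Update *pkru from a UABI-style buffer:
 *  - PKRU bit set in xfeatures: copy the value out of the buffer and
 *    return the error (leaving *pkru untouched) if the copy fails.
 *  - PKRU bit clear: reset *pkru to the hardware init value, 0.
 */
int sketch_update_pkru(uint64_t xfeatures, const void *buf, size_t buf_len,
		       size_t pkru_offset, uint32_t *pkru)
{
	assert(pkru != NULL);

	if (xfeatures & SKETCH_PKRU_MASK) {
		struct sketch_pkru_state xpkru = {0};
		int ret;

		ret = sketch_copy_from_buffer(&xpkru, pkru_offset,
					      sizeof(xpkru), buf, buf_len);
		if (ret)
			return ret;

		*pkru = xpkru.pkru;
	} else {
		*pkru = 0;
	}

	return 0;
}
```

The point of the sketch is only the control flow: a failed copy returns
early instead of storing a zeroed xpkru into *pkru, so a faulting
sigreturn frame cannot silently reset the task's PKRU value.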