On 10/9/20 12:42 PM, ira.weiny@xxxxxxxxx wrote:
> From: Ira Weiny <ira.weiny@xxxxxxxxx>
>
> The PKRS MSR is defined as a per-logical-processor register.  This
> isolates memory access by logical CPU.  Unfortunately, the MSR is not
> managed by XSAVE.  Therefore, tasks must save/restore the MSR value on
> context switch.
>
> Define a saved PKRS value in the task struct, as well as a cached
> per-logical-processor MSR value which mirrors the MSR value of the
> current CPU.  Initialize all tasks with the default MSR value.  Then,
> on schedule in, check the saved task MSR vs the per-cpu value.  If
> different, proceed to write the MSR.  If not, avoid the overhead of
> the MSR write and continue.

It would probably be nice to note how the WRMSR is special here, in
addition to the comments below.

> #endif /*_ASM_X86_PKEYS_INTERNAL_H */
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 97143d87994c..da2381136b2d 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -18,6 +18,7 @@ struct vm86;
>  #include <asm/cpufeatures.h>
>  #include <asm/page.h>
>  #include <asm/pgtable_types.h>
> +#include <asm/pkeys_common.h>
>  #include <asm/percpu.h>
>  #include <asm/msr.h>
>  #include <asm/desc_defs.h>
> @@ -542,6 +543,11 @@ struct thread_struct {
>
>  	unsigned int		sig_on_uaccess_err:1;
>
> +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
> +	/* Saved Protection key register for supervisor mappings */
> +	u32			saved_pkrs;
> +#endif

Could you take a look around thread_struct and see if there are some
other MSRs near which you can stash this?  This seems like a bit of a
lonely place.

...
>  void flush_thread(void)
>  {
>  	struct task_struct *tsk = current;
> @@ -195,6 +212,8 @@ void flush_thread(void)
>  	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
>
>  	fpu__clear_all(&tsk->thread.fpu);
> +
> +	pks_init_task(tsk);
>  }
>
>  void disable_TSC(void)
> @@ -644,6 +663,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
>
>  	if ((tifp ^ tifn) & _TIF_SLD)
>  		switch_to_sld(tifn);
> +
> +	pks_sched_in();
>  }
>
>  /*
> diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
> index 3cf8f775f36d..30f65dd3d0c5 100644
> --- a/arch/x86/mm/pkeys.c
> +++ b/arch/x86/mm/pkeys.c
> @@ -229,3 +229,31 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
>
>  	return pk_reg;
>  }
> +
> +DEFINE_PER_CPU(u32, pkrs_cache);
> +
> +/**
> + * It should also be noted that the underlying WRMSR(MSR_IA32_PKRS) is not
> + * serializing but still maintains ordering properties similar to WRPKRU.
> + * The current SDM section on PKRS needs updating but should be the same as
> + * that of WRPKRU.  So to quote from the WRPKRU text:
> + *
> + *	WRPKRU will never execute transiently.  Memory accesses
> + *	affected by PKRU register will not execute (even transiently)
> + *	until all prior executions of WRPKRU have completed execution
> + *	and updated the PKRU register.
> + */
> +void write_pkrs(u32 new_pkrs)
> +{
> +	u32 *pkrs;
> +
> +	if (!static_cpu_has(X86_FEATURE_PKS))
> +		return;
> +
> +	pkrs = get_cpu_ptr(&pkrs_cache);
> +	if (*pkrs != new_pkrs) {
> +		*pkrs = new_pkrs;
> +		wrmsrl(MSR_IA32_PKRS, new_pkrs);
> +	}
> +	put_cpu_ptr(pkrs);
> +}
>

It bugs me a *bit* that this is being called in a preempt-disabled
region, but we still bother with the get/put_cpu jazz.  Are there
other future call-sites for this that aren't in preempt-disabled
regions?
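
To make the question concrete, here is a rough sketch of the sort of
preemptible call-site I could imagine showing up later.  This is purely
illustrative: pks_update_protection() is a name I made up, and only
update_pkey_val(), write_pkrs() and thread.saved_pkrs come from the
hunks quoted above.

	/*
	 * Hypothetical: change the PKS protections for one pkey from a
	 * context that may be preempted.  Here the get_cpu_ptr()/
	 * put_cpu_ptr() pair inside write_pkrs() actually earns its
	 * keep: it prevents the task from migrating between reading the
	 * per-cpu pkrs_cache and executing the WRMSR, which could
	 * otherwise leave one CPU's cached value out of sync with its
	 * MSR.
	 */
	void pks_update_protection(int pkey, unsigned int flags)
	{
		u32 pkrs = update_pkey_val(current->thread.saved_pkrs,
					   pkey, flags);

		current->thread.saved_pkrs = pkrs;
		write_pkrs(pkrs);
	}

If, on the other hand, every call-site ends up preempt-disabled like
the __switch_to_xtra() path, the get/put pair could be demoted to a
plain this_cpu_ptr() access, with a comment (or an assertion)
documenting that callers must have preemption disabled.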