On Tue, May 11, 2021 at 01:05:02PM -0400, Jon Kohler wrote: > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h > index 8d33ad80704f..5bc4df3a4c27 100644 > --- a/arch/x86/include/asm/fpu/internal.h > +++ b/arch/x86/include/asm/fpu/internal.h > @@ -583,7 +583,13 @@ static inline void switch_fpu_finish(struct fpu *new_fpu) > if (pk) > pkru_val = pk->pkru; > } > - __write_pkru(pkru_val); > + > + /* > + * WRPKRU is relatively expensive compared to RDPKRU. > + * Avoid WRPKRU when it would not change the value. > + */ > + if (pkru_val != rdpkru()) > + wrpkru(pkru_val); Just wondering; why aren't we having that in a per-cpu variable? The usual per-cpu MSR shadow approach avoids issuing any 'special' ops entirely.