* Kyle Huey <me@xxxxxxxxxxxx> wrote:

> Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
> When enabled, the processor will fault on attempts to execute the CPUID
> instruction with CPL>0. Exposing this feature to userspace will allow a
> ptracer to trap and emulate the CPUID instruction.
>
> When supported, this feature is controlled by toggling bit 0 of
> MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
> https://bugzilla.kernel.org/attachment.cgi?id=243991
>
> Implement a new pair of arch_prctls, available on both x86-32 and x86-64.
>
> ARCH_GET_CPUID: Returns the current CPUID faulting state, either
> ARCH_CPUID_ENABLE or ARCH_CPUID_SIGSEGV. arg2 must be 0.
>
> ARCH_SET_CPUID: Set the CPUID faulting state to arg2, which must be either
> ARCH_CPUID_ENABLE or ARCH_CPUID_SIGSEGV. Returns EINVAL if arg2 is
> another value or CPUID faulting is not supported on this system.

So the interface is:

> +#define ARCH_GET_CPUID 0x1005
> +#define ARCH_SET_CPUID 0x1006
> +#define ARCH_CPUID_ENABLE 1
> +#define ARCH_CPUID_SIGSEGV 2

Which maps to:

	prctl(ARCH_SET_CPUID, 0);	/* -EINVAL */
	prctl(ARCH_SET_CPUID, 1);	/* enable CPUID [i.e. make it work without faulting] */
	prctl(ARCH_SET_CPUID, 2);	/* disable CPUID [i.e. make it fault] */

	ret = prctl(ARCH_GET_CPUID, 0);	/* return current state: 1==on, 2==off */

This is a very broken interface that makes very little sense. It would
be much better to use a more natural interface, where 1/0 means on/off
and where ARCH_GET_CPUID returns the current state in the same natural
form:

	prctl(ARCH_SET_CPUID, 0);	/* disable CPUID [i.e. make it fault] */
	prctl(ARCH_SET_CPUID, 1);	/* enable CPUID [i.e. make it work without faulting] */

	ret = prctl(ARCH_GET_CPUID);	/* 1==enabled, 0==disabled */

See how natural it is? The ARCH_CPUID_ENABLE/ARCH_CPUID_SIGSEGV symbols
can be avoided altogether.
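To make the 0/1 semantics concrete, here's a rough sketch of the argument
handling they imply, modelled in plain userspace C. struct
cpuid_task_model, model_set_cpuid() and model_get_cpuid() are made-up
stand-ins, not kernel code: the nocpuid field plays the role of
TIF_NOCPUID, and the -EINVAL cases simply follow the patch description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the per-task state, not kernel code: nocpuid plays
 * the role of TIF_NOCPUID, cpuid_fault_supported that of
 * X86_FEATURE_CPUID_FAULT.
 */
struct cpuid_task_model {
	bool cpuid_fault_supported;
	bool nocpuid;
};

/* ARCH_SET_CPUID with natural semantics: 1 == CPUID works, 0 == it faults. */
int model_set_cpuid(struct cpuid_task_model *t, unsigned long cpuid_enabled)
{
	assert(t != NULL);

	/* Only 0 and 1 are meaningful; reject everything else. */
	if (cpuid_enabled > 1)
		return -EINVAL;
	/* As in the patch description: EINVAL if faulting is unsupported. */
	if (!t->cpuid_fault_supported)
		return -EINVAL;

	t->nocpuid = !cpuid_enabled;
	return 0;
}

/* ARCH_GET_CPUID returns the natural state: 1 == enabled, 0 == disabled. */
int model_get_cpuid(const struct cpuid_task_model *t)
{
	assert(t != NULL);
	return t->nocpuid ? 0 : 1;
}
```

The boolean maps directly onto the thread flag (nocpuid = !cpuid_enabled),
so no symbolic constants are needed on either side.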
Using plain 0/1 values will also cut down on some of the ugliness in the
kernel code, and it allows a cleaner argument name: instead of 'int arg2'
it can be named the more natural 'int cpuid_enabled'.

> The state of the CPUID faulting flag is propagated across forks, but reset
> upon exec.

I don't think this is the natural API for propagating settings across
exec(). We should reset the flag on exec() only if security
considerations require it, the way perf events are cleared. If binaries
that assume a working CPUID are exec()-ed, then CPUID can be enabled
explicitly. Clearing the flag automatically would take away the ability
of a pure no-CPUID environment to exec() a CPUID-safe binary.

> Signed-off-by: Kyle Huey <khuey@xxxxxxxxxxxx>
> ---
>  arch/x86/include/asm/msr-index.h          |   3 +
>  arch/x86/include/asm/processor.h          |   2 +
>  arch/x86/include/asm/thread_info.h        |   6 +-
>  arch/x86/include/uapi/asm/prctl.h         |   6 +
>  arch/x86/kernel/cpu/intel.c               |   7 +
>  arch/x86/kernel/process.c                 |  84 ++++++++++
>  fs/exec.c                                 |   1 +
>  include/linux/thread_info.h               |   4 +
>  tools/testing/selftests/x86/Makefile      |   2 +-
>  tools/testing/selftests/x86/cpuid-fault.c | 254 ++++++++++++++++++++++++++++++
>  10 files changed, 367 insertions(+), 2 deletions(-)
>  create mode 100644 tools/testing/selftests/x86/cpuid-fault.c

Please put the self-test into a separate patch.

>  static void init_intel_misc_features_enables(struct cpuinfo_x86 *c)
>  {
>  	u64 msr;
>  
> +	if (rdmsrl_safe(MSR_MISC_FEATURES_ENABLES, &msr))
> +		return;
> +
> +	msr = 0;
> +	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
> +	this_cpu_write(msr_misc_features_enables_shadow, msr);
> +
>  	if (!rdmsrl_safe(MSR_PLATFORM_INFO, &msr)) {
>  		if (msr & MSR_PLATFORM_INFO_CPUID_FAULT)
>  			set_cpu_cap(c, X86_FEATURE_CPUID_FAULT);
>  	}
>  }

Sigh, so the Intel MSR index itself is grossly misnamed: in
MSR_MISC_FEATURES_ENABLES a plain reading of 'enables' suggests a verb,
but it wants to be a noun. A better name would be MSR_MISC_FEATURES or
so.
So while we want to keep the Intel name for the MSR index, please drop
the _enables suffix from the kernel C function names such as this one,
and from the shadow value name as well.

> +DEFINE_PER_CPU(u64, msr_misc_features_enables_shadow);
> +
> +static void set_cpuid_faulting(bool on)
> +{
> +	u64 msrval;
> +
> +	DEBUG_LOCKS_WARN_ON(!irqs_disabled());
> +
> +	msrval = this_cpu_read(msr_misc_features_enables_shadow);
> +	msrval &= ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT;
> +	msrval |= (on << MSR_MISC_FEATURES_ENABLES_CPUID_FAULT_BIT);
> +	this_cpu_write(msr_misc_features_enables_shadow, msrval);
> +	wrmsrl(MSR_MISC_FEATURES_ENABLES, msrval);

This gets called from the context switch path, and it looks pretty
suboptimal there, especially when combined with the TIF flag check:

>  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
>  		      struct tss_struct *tss)
>  {
>  	struct thread_struct *prev, *next;
>  
>  	prev = &prev_p->thread;
>  	next = &next_p->thread;
>  
> @@ -206,16 +278,21 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
>  
>  		debugctl &= ~DEBUGCTLMSR_BTF;
>  		if (test_tsk_thread_flag(next_p, TIF_BLOCKSTEP))
>  			debugctl |= DEBUGCTLMSR_BTF;
>  
>  		update_debugctlmsr(debugctl);
>  	}
>  
> +	if (test_tsk_thread_flag(prev_p, TIF_NOCPUID) ^
> +	    test_tsk_thread_flag(next_p, TIF_NOCPUID)) {
> +		set_cpuid_faulting(test_tsk_thread_flag(next_p, TIF_NOCPUID));
> +	}
> +

Why not cache the required MSR value in the task struct instead? That
would allow something much more obvious and much faster, like:

	if (prev_p->thread.misc_features_val != next_p->thread.misc_features_val)
		wrmsrl(MSR_MISC_FEATURES_ENABLES, next_p->thread.misc_features_val);

(The TIF flag maintenance is still required to get into
__switch_to_xtra().)

It would also be easy to extend without extra overhead, should any other
feature bit be added to the MSR in the future.
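A rough userspace model of the suggested switch path, for illustration
only: struct model_thread, struct model_cpu and the model_*() helpers are
hypothetical, model_wrmsrl() merely stores the value and counts writes in
place of wrmsrl(), and bit 0 is used for CPUID faulting per the patch
description. The MSR is written only when the cached values of the
outgoing and incoming tasks differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of MSR_MISC_FEATURES_ENABLES controls CPUID faulting (per the patch). */
#define MODEL_CPUID_FAULT_BIT	0
#define MODEL_CPUID_FAULT	(UINT64_C(1) << MODEL_CPUID_FAULT_BIT)

/* Hypothetical per-task cache of the MSR value this task wants. */
struct model_thread {
	uint64_t misc_features_val;
};

/* Stand-in for the per-CPU MSR, plus a counter of MSR writes. */
struct model_cpu {
	uint64_t msr_misc_features;
	unsigned int msr_writes;
};

/* Models wrmsrl(): store the value and count the (expensive) write. */
static void model_wrmsrl(struct model_cpu *cpu, uint64_t val)
{
	cpu->msr_misc_features = val;
	cpu->msr_writes++;
}

/*
 * Toggle CPUID faulting for t, which is assumed to be the task currently
 * running on cpu: update its cached value and write the MSR right away.
 */
void model_set_cpuid_faulting(struct model_cpu *cpu, struct model_thread *t,
			      int on)
{
	assert(cpu != NULL && t != NULL);

	t->misc_features_val &= ~MODEL_CPUID_FAULT;
	if (on)
		t->misc_features_val |= MODEL_CPUID_FAULT;
	model_wrmsrl(cpu, t->misc_features_val);
}

/* The suggested switch path: one compare, and an MSR write only on change. */
void model_switch_misc_features(struct model_cpu *cpu,
				const struct model_thread *prev,
				const struct model_thread *next)
{
	if (prev->misc_features_val != next->misc_features_val)
		model_wrmsrl(cpu, next->misc_features_val);
}
```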
Thanks,

	Ingo