On Sat, Mar 12, 2016 at 10:08:49AM -0800, Andy Lutomirski wrote:
> This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> access to a WARN_ONCE and, for RDMSR, a return value of zero. If
> panic_on_oops is set, then failed unsafe MSR accesses will still
> oops and panic.
>
> To be clear, this type of failure should *not* happen. This patch
> exists to minimize the chance of nasty undebuggable failures due on
> systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.
>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> ---
>  arch/x86/include/asm/msr.h | 10 ++++++++--
>  arch/x86/mm/extable.c      | 33 +++++++++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index 93fb7c1cffda..1487054a1a70 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -92,7 +92,10 @@ static inline unsigned long long native_read_msr(unsigned int msr)
>  {
>  	DECLARE_ARGS(val, low, high);
>
> -	asm volatile("rdmsr" : EAX_EDX_RET(val, low, high) : "c" (msr));
> +	asm volatile("1: rdmsr\n"
> +		     "2:\n"
> +		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_unsafe)
> +		     : EAX_EDX_RET(val, low, high) : "c" (msr));
>  	if (msr_tracepoint_active(__tracepoint_read_msr))
>  		do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), 0);
>  	return EAX_EDX_VAL(val, low, high);
> @@ -119,7 +122,10 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
>  static inline void native_write_msr(unsigned int msr,
>  				    unsigned low, unsigned high)
>  {
> -	asm volatile("wrmsr" : : "c" (msr), "a"(low), "d" (high) : "memory");
> +	asm volatile("1: wrmsr\n"
> +		     "2:\n"
> +		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe)

This might be a good idea:

[    0.220066] cpuidle: using governor menu
[    0.224000] ------------[ cut here ]------------
[    0.224000] WARNING: CPU: 0 PID: 1 at arch/x86/mm/extable.c:74 ex_handler_wrmsr_unsafe+0x73/0x80()
[    0.224000] unchecked MSR access error: WRMSR to 0xdeadbeef (tried to write 0x000000000000caca)
[    0.224000] Modules linked in:
[    0.224000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.5.0-rc7+ #7
[    0.224000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[    0.224000]  0000000000000000 ffff88007c0d7c08 ffffffff812f13a3 ffff88007c0d7c50
[    0.224000]  ffffffff81a40ffe ffff88007c0d7c40 ffffffff8105c3b1 ffffffff81717710
[    0.224000]  ffff88007c0d7d18 0000000000000000 ffffffff816207d0 0000000000000000
[    0.224000] Call Trace:
[    0.224000]  [<ffffffff812f13a3>] dump_stack+0x67/0x94
[    0.224000]  [<ffffffff8105c3b1>] warn_slowpath_common+0x91/0xd0
[    0.224000]  [<ffffffff816207d0>] ? amd_cpu_notify+0x40/0x40
[    0.224000]  [<ffffffff8105c43c>] warn_slowpath_fmt+0x4c/0x50
[    0.224000]  [<ffffffff816207d0>] ? amd_cpu_notify+0x40/0x40
[    0.224000]  [<ffffffff8131de53>] ? __this_cpu_preempt_check+0x13/0x20
[    0.224000]  [<ffffffff8104efe3>] ex_handler_wrmsr_unsafe+0x73/0x80

and it looks helpful and all, but when you do it pretty early (for
example, I added a

	wrmsrl(0xdeadbeef, 0xcafe);

at the end of pat_bsp_init()), the machine explodes with an early
panic.

I'm wondering what is better - an early panic or an early #GP from a
missing MSR.

And more specifically, can we do better to handle the early case
gracefully too?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
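
For reference, the behaviour the quoted commit message describes (warn
once on a failed unsafe access, and return zero for a failed RDMSR) can
be modelled in plain userspace C. This is only a rough sketch and not
the kernel code: fake_msr_exists(), warn_once(), sketch_read_msr(),
sketch_write_msr() and warn_count are all hypothetical names, the
"existing" MSR 0x10 and its value 0x1234 are made up, and a single
shared flag stands in for WARN_ONCE as a simplification.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the hardware: only MSR 0x10 "exists". */
static bool fake_msr_exists(uint32_t msr)
{
	return msr == 0x10;
}

/* Number of warnings actually printed (exposed for illustration). */
static int warn_count;

/* Print on the first failure only; one shared flag, a simplification. */
static void warn_once(const char *op, uint32_t msr)
{
	static bool warned;

	if (warned)
		return;
	warned = true;
	warn_count++;
	fprintf(stderr, "unchecked MSR access error: %s 0x%" PRIx32 "\n",
		op, msr);
}

/* Failed read: warn once and hand back zero instead of crashing. */
static uint64_t sketch_read_msr(uint32_t msr)
{
	if (!fake_msr_exists(msr)) {
		warn_once("RDMSR from", msr);
		return 0;
	}
	return 0x1234;	/* made-up value for the one MSR that exists */
}

/* Failed write: warn once and carry on; the value is dropped. */
static void sketch_write_msr(uint32_t msr, uint64_t val)
{
	(void)val;
	if (!fake_msr_exists(msr))
		warn_once("WRMSR to", msr);
}
```

With this model, sketch_read_msr(0xdeadbeef) returns 0 and prints one
warning, and a later sketch_write_msr(0xdeadbeef, 0xcafe) stays silent.
It does not model the early-boot case raised above, where the failing
access happens before the warning path is usable.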