On Mon, Mar 14, 2016 at 5:02 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Sat, Mar 12, 2016 at 10:08:49AM -0800, Andy Lutomirski wrote:
>> This demotes an OOPS and likely panic due to a failed non-"safe" MSR
>> access to a WARN_ONCE and, for RDMSR, a return value of zero.  If
>> panic_on_oops is set, then failed unsafe MSR accesses will still
>> oops and panic.
>>
>> To be clear, this type of failure should *not* happen.  This patch
>> exists to minimize the chance of nasty undebuggable failures due on
>> systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.
>>
>> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
>> ---
>>  arch/x86/include/asm/msr.h | 10 ++++++++--
>>  arch/x86/mm/extable.c      | 33 +++++++++++++++++++++++++++++++++
>>  2 files changed, 41 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
>> index 93fb7c1cffda..1487054a1a70 100644
>> --- a/arch/x86/include/asm/msr.h
>> +++ b/arch/x86/include/asm/msr.h
>> @@ -92,7 +92,10 @@ static inline unsigned long long native_read_msr(unsigned int msr)
>>  {
>>  	DECLARE_ARGS(val, low, high);
>>
>> -	asm volatile("rdmsr" : EAX_EDX_RET(val, low, high) : "c" (msr));
>> +	asm volatile("1: rdmsr\n"
>> +		     "2:\n"
>> +		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_unsafe)
>> +		     : EAX_EDX_RET(val, low, high) : "c" (msr));
>>  	if (msr_tracepoint_active(__tracepoint_read_msr))
>>  		do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), 0);
>>  	return EAX_EDX_VAL(val, low, high);
>> @@ -119,7 +122,10 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
>>  static inline void native_write_msr(unsigned int msr,
>>  				    unsigned low, unsigned high)
>>  {
>> -	asm volatile("wrmsr" : : "c" (msr), "a"(low), "d" (high) : "memory");
>> +	asm volatile("1: wrmsr\n"
>> +		     "2:\n"
>> +		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe)
>
> This might be a good idea:
>
> [    0.220066] cpuidle: using governor menu
> [    0.224000] ------------[ cut here ]------------
> [    0.224000] WARNING: CPU: 0 PID: 1 at arch/x86/mm/extable.c:74 ex_handler_wrmsr_unsafe+0x73/0x80()
> [    0.224000] unchecked MSR access error: WRMSR to 0xdeadbeef (tried to write 0x000000000000caca)
> [    0.224000] Modules linked in:
> [    0.224000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.5.0-rc7+ #7
> [    0.224000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [    0.224000]  0000000000000000 ffff88007c0d7c08 ffffffff812f13a3 ffff88007c0d7c50
> [    0.224000]  ffffffff81a40ffe ffff88007c0d7c40 ffffffff8105c3b1 ffffffff81717710
> [    0.224000]  ffff88007c0d7d18 0000000000000000 ffffffff816207d0 0000000000000000
> [    0.224000] Call Trace:
> [    0.224000]  [<ffffffff812f13a3>] dump_stack+0x67/0x94
> [    0.224000]  [<ffffffff8105c3b1>] warn_slowpath_common+0x91/0xd0
> [    0.224000]  [<ffffffff816207d0>] ? amd_cpu_notify+0x40/0x40
> [    0.224000]  [<ffffffff8105c43c>] warn_slowpath_fmt+0x4c/0x50
> [    0.224000]  [<ffffffff816207d0>] ? amd_cpu_notify+0x40/0x40
> [    0.224000]  [<ffffffff8131de53>] ? __this_cpu_preempt_check+0x13/0x20
> [    0.224000]  [<ffffffff8104efe3>] ex_handler_wrmsr_unsafe+0x73/0x80
>
> and it looks helpful and all but when you do it pretty early - for
> example I added a
>
> 	wrmsrl(0xdeadbeef, 0xcafe);
>
> at the end of pat_bsp_init() and the machine explodes with an early
> panic. I'm wondering what is better - early panic or an early #GP from a
> missing MSR.

You're hitting:

	/* special handling not supported during early boot */
	if (handler != ex_handler_default)
		return 0;

which means that the behavior with and without my series applied is
identical, for better or for worse.

>
> And more specifically, can we do better to handle the early case
> gracefully too?

We could probably remove that check and let custom fixups run early.
I don't see any compelling reason to keep them disabled.  That should
probably be a separate change, though.
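
To make the early-boot behavior concrete, here is a rough user-space
model of the dispatch.  It is only a sketch under stated assumptions:
the struct layout, the model_* names, the made-up addresses and the
"early" flag are all invented for illustration and are not the kernel's
actual code.  It just shows why a fault at an entry with a custom
handler is reported as "not fixed up" when it happens early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: hypothetical names, not the kernel's data structures. */
typedef bool (*model_handler_t)(unsigned long *ip, unsigned long fixup);

struct model_extable_entry {
	unsigned long insn;      /* address of the instruction that may fault */
	unsigned long fixup;     /* address to resume at after the fault */
	model_handler_t handler; /* how to recover */
};

/* Default recovery: just resume at the fixup address. */
static bool model_handler_default(unsigned long *ip, unsigned long fixup)
{
	*ip = fixup;
	return true;
}

/* Custom recovery: warn once, then resume at the fixup address. */
static bool model_handler_wrmsr_unsafe(unsigned long *ip, unsigned long fixup)
{
	static bool warned;

	if (!warned) {
		warned = true;
		fprintf(stderr, "unchecked MSR access error (model)\n");
	}
	*ip = fixup;
	return true;
}

/* Made-up addresses, purely for the example. */
static const struct model_extable_entry model_table[] = {
	{ 0x1000, 0x1002, model_handler_wrmsr_unsafe },
	{ 0x2000, 0x2010, model_handler_default },
};

/*
 * Returns 1 if the fault at *ip was fixed up, 0 if the caller has to
 * treat it as fatal.  With early == true, entries that use anything
 * other than the default handler are refused, mirroring the check
 * quoted above.
 */
int model_fixup_exception(unsigned long *ip, bool early)
{
	size_t i;

	assert(ip != NULL);

	for (i = 0; i < sizeof(model_table) / sizeof(model_table[0]); i++) {
		const struct model_extable_entry *e = &model_table[i];

		if (e->insn != *ip)
			continue;

		/* special handling not supported during early boot */
		if (early && e->handler != model_handler_default)
			return 0;

		return e->handler(ip, e->fixup) ? 1 : 0;
	}

	return 0; /* no entry for this address */
}
```

In this model, calling model_fixup_exception() on the 0x1000 entry with
early set returns 0 and leaves the instruction pointer alone, which
corresponds to the early panic you saw.  The same call with early clear
warns once and resumes at the fixup address.  Dropping the early check
in the model makes both cases behave the same.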
--Andy