Hi,

please add the upstream commit 338b522ca43cfd32d11a370f4203bcd089c6c877
("perf/x86/intel: Protect LBR and extra_regs against KVM lying") to
-stable. (mainly 3.14, but it affects any kernel from 3.12 to 3.15)

This commit fixes a kernel crash that happens very reliably inside a
QEMU guest when the host has an Intel CPU and "-cpu host" is given on
the command line.

The relevant stack trace is as follows (originally reported by my
colleague Mohammed Gamal):

====
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.12.26-1-pserver #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88013aa78000 ti: ffff88013aa54000 task.ti: ffff88013aa54000
RIP: 0010:[<ffffffff81ce444d>]  [<ffffffff81ce444d>] intel_pmu_init+0x2f1/0x921
RSP: 0000:ffff88013aa55e28  EFLAGS: 00000202
RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000345
RDX: 0000000000000003 RSI: 0000000000000730 RDI: 0000ffffffffffff
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000007
R10: 0000000000000001 R11: ffffffff81cbb160 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88013ffff000 CR3: 0000000001c0b000 CR4: 00000000001406f0
Stack:
 ffffffff81cbdd60 ffffffff81ce3437 0000000000000001 ffffffff81ce3466
 0000000000000000 ffffffff81cdf291 ffffffff81ce3437 0000000000000001
 0000000000000000 0000000000000000 0000000000000000 ffffffff8100021a
Call Trace:
 [<ffffffff81ce3437>] ? check_bugs+0x2e/0x2e
 [<ffffffff81ce3466>] ? init_hw_perf_events+0x2f/0x4e1
 [<ffffffff81cdf291>] ? set_real_mode_permissions+0x93/0x9e
 [<ffffffff81ce3437>] ? check_bugs+0x2e/0x2e
 [<ffffffff8100021a>] ? do_one_initcall+0x4a/0x170
 [<ffffffff8109f46f>] ? clockevents_register_device+0xdf/0x170
 [<ffffffff81ce90a9>] ? native_smp_prepare_cpus+0x35d/0x389
 [<ffffffff81cdc8a3>] ? kernel_init_freeable+0x95/0x1c6
 [<ffffffff81709920>] ? rest_init+0x80/0x80
 [<ffffffff81709929>] ? kernel_init+0x9/0xf0
 [<ffffffff8171d27c>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff81709920>] ? rest_init+0x80/0x80
Code: 6d fd ff 44 89 0d bc 6d fd ff 89 0d 76 6e fd ff 7e 2b 83 e2 1f b8 03 00 00 00 b9 45 03 00 00 83 fa 02 0f 4f c2 89 05 83 6d fd ff <0f> 32 48 c1 e2 20 89 c0 48 09 c2 48 89 15 21 6e fd ff e8 1c 67
RIP  [<ffffffff81ce444d>] intel_pmu_init+0x2f1/0x921
 RSP <ffff88013aa55e28>
---[ end trace caccfda5c953b0c5 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
====

I'm aware that this commit is longer than 100 lines, which is not ideal
for -stable. However, without this commit, a guest kernel crashes every
time. Given that this is a serious issue, please consider taking this
patch. I'm not aware of any alternative, such as a simpler commit that
fixes the bug.

Sidenote: I also tried to backport this commit to 3.10, but with no
luck. The 3.10 kernel crashes whether or not this fix is included.

Thanks,
Dongsu

====
From 338b522ca43cfd32d11a370f4203bcd089c6c877 Mon Sep 17 00:00:00 2001
From: Kan Liang <kan.liang@xxxxxxxxx>
Date: Mon, 14 Jul 2014 12:25:56 -0700
Subject: [PATCH] perf/x86/intel: Protect LBR and extra_regs against KVM lying

With -cpu host, KVM reports LBR and extra_regs support, if the host has
support.

When the guest perf driver tries to access LBR or extra_regs MSR,
it #GPs all MSR accesses,since KVM doesn't handle LBR and extra_regs support.
So check the related MSRs access right once at initialization time to avoid
the error access at runtime.

For reproducing the issue, please build the kernel with CONFIG_KVM_INTEL = y
(for host kernel).
And CONFIG_PARAVIRT = n and CONFIG_KVM_GUEST = n (for guest kernel).
Start the guest with -cpu host.
Run perf record with --branch-any or --branch-filter in guest to trigger LBR
Run perf stat offcore events (E.g. LLC-loads/LLC-load-misses ...) in guest to
trigger offcore_rsp #GP

Signed-off-by: Kan Liang <kan.liang@xxxxxxxxx>
Signed-off-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@xxxxxxxxx>
Cc: Mark Davies <junk@xxxxxxxxxxx>
Cc: Paul Mackerras <paulus@xxxxxxxxx>
Cc: Stephane Eranian <eranian@xxxxxxxxxx>
Cc: Yan, Zheng <zheng.z.yan@xxxxxxxxx>
Link: http://lkml.kernel.org/r/1405365957-20202-1-git-send-email-kan.liang@xxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
 arch/x86/kernel/cpu/perf_event.c       |  3 ++
 arch/x86/kernel/cpu/perf_event.h       | 12 ++++---
 arch/x86/kernel/cpu/perf_event_intel.c | 66 +++++++++++++++++++++++++++++++++-
 3 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 2bdfbff..2879ecd 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -118,6 +118,9 @@ static int x86_pmu_extra_regs(u64 config, struct perf_event *event)
 			continue;
 		if (event->attr.config1 & ~er->valid_mask)
 			return -EINVAL;
+		/* Check if the extra msrs can be safely accessed*/
+		if (!er->extra_msr_access)
+			return -ENXIO;
 
 		reg->idx = er->idx;
 		reg->config = event->attr.config1;
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3b2f9bd..8ade931 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -295,14 +295,16 @@ struct extra_reg {
 	u64			config_mask;
 	u64			valid_mask;
 	int			idx;  /* per_xxx->regs[] reg index */
+	bool			extra_msr_access;
 };
 
 #define EVENT_EXTRA_REG(e, ms, m, vm, i) {	\
-	.event = (e),		\
-	.msr = (ms),		\
-	.config_mask = (m),	\
-	.valid_mask = (vm),	\
-	.idx = EXTRA_REG_##i,	\
+	.event = (e),			\
+	.msr = (ms),			\
+	.config_mask = (m),		\
+	.valid_mask = (vm),		\
+	.idx = EXTRA_REG_##i,		\
+	.extra_msr_access = true,	\
 	}
 
 #define INTEL_EVENT_EXTRA_REG(event, msr, vm, idx)	\
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index c206815..2502d0d 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2182,6 +2182,41 @@ static void intel_snb_check_microcode(void)
 	}
 }
 
+/*
+ * Under certain circumstances, access certain MSR may cause #GP.
+ * The function tests if the input MSR can be safely accessed.
+ */
+static bool check_msr(unsigned long msr, u64 mask)
+{
+	u64 val_old, val_new, val_tmp;
+
+	/*
+	 * Read the current value, change it and read it back to see if it
+	 * matches, this is needed to detect certain hardware emulators
+	 * (qemu/kvm) that don't trap on the MSR access and always return 0s.
+	 */
+	if (rdmsrl_safe(msr, &val_old))
+		return false;
+
+	/*
+	 * Only change the bits which can be updated by wrmsrl.
+	 */
+	val_tmp = val_old ^ mask;
+	if (wrmsrl_safe(msr, val_tmp) ||
+	    rdmsrl_safe(msr, &val_new))
+		return false;
+
+	if (val_new != val_tmp)
+		return false;
+
+	/* Here it's sure that the MSR can be safely accessed.
+	 * Restore the old value and return.
+	 */
+	wrmsrl(msr, val_old);
+
+	return true;
+}
+
 static __init void intel_sandybridge_quirk(void)
 {
 	x86_pmu.check_microcode = intel_snb_check_microcode;
@@ -2271,7 +2306,8 @@ __init int intel_pmu_init(void)
 	union cpuid10_ebx ebx;
 	struct event_constraint *c;
 	unsigned int unused;
-	int version;
+	struct extra_reg *er;
+	int version, i;
 
 	if (!cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
 		switch (boot_cpu_data.x86) {
@@ -2577,5 +2613,33 @@ __init int intel_pmu_init(void)
 		}
 	}
 
+	/*
+	 * Access LBR MSR may cause #GP under certain circumstances.
+	 * E.g. KVM doesn't support LBR MSR
+	 * Check all LBT MSR here.
+	 * Disable LBR access if any LBR MSRs can not be accessed.
+	 */
+	if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
+		x86_pmu.lbr_nr = 0;
+	for (i = 0; i < x86_pmu.lbr_nr; i++) {
+		if (!(check_msr(x86_pmu.lbr_from + i, 0xffffUL) &&
+		      check_msr(x86_pmu.lbr_to + i, 0xffffUL)))
+			x86_pmu.lbr_nr = 0;
+	}
+
+	/*
+	 * Access extra MSR may cause #GP under certain circumstances.
+	 * E.g. KVM doesn't support offcore event
+	 * Check all extra_regs here.
+	 */
+	if (x86_pmu.extra_regs) {
+		for (er = x86_pmu.extra_regs; er->msr; er++) {
+			er->extra_msr_access = check_msr(er->msr, 0x1ffUL);
+			/* Disable LBR select mapping */
+			if ((er->idx == EXTRA_REG_LBR) && !er->extra_msr_access)
+				x86_pmu.lbr_sel_map = NULL;
+		}
+	}
+
 	return 0;
 }
-- 
1.9.3
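
P.S. For anyone reviewing the backport without a kernel tree at hand,
here is a rough userspace sketch of the read/flip/read-back probe that
check_msr() in the patch performs. It is only an illustration, not
kernel code: struct fake_msr, fake_rdmsrl_safe(), fake_wrmsrl_safe(),
check_msr_sketch() and probe_demo() are hypothetical names that stand in
for real MSR access, and the three simulated backends (working, faulting,
always-zero) are assumptions about how an MSR might behave under an
emulator. The restore step also goes through the simulated "safe" writer,
whereas the patch uses plain wrmsrl().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one MSR; not a kernel data structure. */
struct fake_msr {
	uint64_t value;
	bool faults;        /* every access "#GPs" (safe helpers return nonzero) */
	bool ignores_write; /* emulator drops writes and always reads back 0 */
};

/* Simulated rdmsrl_safe(): returns 0 on success, nonzero on a fault. */
static int fake_rdmsrl_safe(struct fake_msr *m, uint64_t *val)
{
	if (m->faults)
		return -1;
	*val = m->ignores_write ? 0 : m->value;
	return 0;
}

/* Simulated wrmsrl_safe(): returns 0 on success, nonzero on a fault. */
static int fake_wrmsrl_safe(struct fake_msr *m, uint64_t val)
{
	if (m->faults)
		return -1;
	if (!m->ignores_write)
		m->value = val;
	return 0;
}

/* Same sequence as check_msr() in the patch, against the fake backend. */
static bool check_msr_sketch(struct fake_msr *m, uint64_t mask)
{
	uint64_t val_old, val_new, val_tmp;

	/* A faulting read means the MSR is not accessible at all. */
	if (fake_rdmsrl_safe(m, &val_old))
		return false;

	/* Flip only the bits named in mask, write, then read back. */
	val_tmp = val_old ^ mask;
	if (fake_wrmsrl_safe(m, val_tmp) || fake_rdmsrl_safe(m, &val_new))
		return false;

	/* An emulator that swallows the write is caught by the mismatch. */
	if (val_new != val_tmp)
		return false;

	/* The MSR behaves like real hardware: restore the old value. */
	fake_wrmsrl_safe(m, val_old);
	return true;
}

/* Exercise the probe against the three simulated backends. */
static void probe_demo(void)
{
	struct fake_msr working = { .value = 0x10 };
	struct fake_msr faulting = { .faults = true };
	struct fake_msr always_zero = { .ignores_write = true };

	assert(check_msr_sketch(&working, 0x3));
	assert(working.value == 0x10); /* original value was restored */
	assert(!check_msr_sketch(&faulting, 0x3));
	assert(!check_msr_sketch(&always_zero, 0x3));
}
```

Calling probe_demo() from a small main() runs all three cases. The
always-zero case is the one the comment in check_msr() describes: the
read and the write both succeed, and only the read-back comparison shows
that the MSR is not really there.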