On 2015-02-16 21:40, Kashyap Chamarthy wrote:
> I can observe this on only one of the Intel Xeon machines (which has 48
> CPUs and 1TB memory), but it is very reliably reproducible.
>
>
> Reproducer:
>
>   - Just ensure the physical host (L0) and the guest hypervisor (L1) are
>     running a 3.20.0-0.rc0.git5.1 kernel (I used the one from Fedora's
>     Rawhide). Preferably on an Intel Xeon machine - as that's where I could
>     reproduce this issue, not on a Haswell machine.
>   - Boot an L2 guest: run `qemu-sanity-check --accel=kvm` in L1 (or use
>     your own preferred method to boot an L2 KVM guest).
>   - On a different terminal, which has the serial console for L1: observe
>     L1 reboot.
>
>
> The only thing I notice in `dmesg` (on L0) is the trace below. _However_,
> this trace does not occur when an L1 reboot is triggered while you watch
> `dmesg -w` (to wait for new messages) as I boot an L2 guest -- which
> means the trace below is not the root cause of L1 being rebooted. When
> the L2 gets rebooted, all you observe is one of these
> "vcpu0 unhandled rdmsr: 0x1a6" messages shown below.
>
> . . .
> [Feb16 13:44] ------------[ cut here ]------------
> [ +0.004632] WARNING: CPU: 4 PID: 1837 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel]()
> [ +0.009835] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ip6table_filter ip6_tables cfg80211 rfkill iTCO_wdt iTCO_vendor_support ipmi_devintf gpio_ich dcdbas coretemp kvm_intel kvm crc32c_intel ipmi_ssif serio_raw acpi_power_meter ipmi_si tpm_tis ipmi_msghandler tpm lpc_ich i7core_edac mfd_core edac_core acpi_cpufreq shpchp wmi mgag200 i2c_algo_bit drm_kms_helper ttm ata_generic drm pata_acpi megaraid_sas bnx2
> [ +0.050289] CPU: 4 PID: 1837 Comm: qemu-system-x86 Not tainted 3.20.0-0.rc0.git5.1.fc23.x86_64 #1
> [ +0.008902] Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.8.2 10/25/2012
> [ +0.007469]  0000000000000000 00000000ee6c0c54 ffff88bf60bf7c18 ffffffff818760f7
> [ +0.007542]  0000000000000000 0000000000000000 ffff88bf60bf7c58 ffffffff810ab80a
> [ +0.007519]  ffff88ff625b8000 ffff883f55f9b000 0000000000000000 0000000000000014
> [ +0.007489] Call Trace:
> [ +0.002471]  [<ffffffff818760f7>] dump_stack+0x4c/0x65
> [ +0.005152]  [<ffffffff810ab80a>] warn_slowpath_common+0x8a/0xc0
> [ +0.006020]  [<ffffffff810ab93a>] warn_slowpath_null+0x1a/0x20
> [ +0.005851]  [<ffffffffa130957e>] nested_vmx_vmexit+0x96e/0xb00 [kvm_intel]
> [ +0.006974]  [<ffffffffa130c5f7>] ? vmx_handle_exit+0x1e7/0xcb2 [kvm_intel]
> [ +0.006999]  [<ffffffffa02ca972>] ? kvm_arch_vcpu_ioctl_run+0x6d2/0x1b50 [kvm]
> [ +0.007239]  [<ffffffffa130992a>] vmx_queue_exception+0x10a/0x150 [kvm_intel]
> [ +0.007136]  [<ffffffffa02cb30b>] kvm_arch_vcpu_ioctl_run+0x106b/0x1b50 [kvm]
> [ +0.007162]  [<ffffffffa02ca972>] ? kvm_arch_vcpu_ioctl_run+0x6d2/0x1b50 [kvm]
> [ +0.007241]  [<ffffffff8110760d>] ? trace_hardirqs_on+0xd/0x10
> [ +0.005864]  [<ffffffffa02b2df6>] ? vcpu_load+0x26/0x70 [kvm]
> [ +0.005761]  [<ffffffff81103c0f>] ? lock_release_holdtime.part.29+0xf/0x200
> [ +0.006979]  [<ffffffffa02c5f88>] ? kvm_arch_vcpu_load+0x58/0x210 [kvm]
> [ +0.006634]  [<ffffffffa02b3203>] kvm_vcpu_ioctl+0x383/0x7e0 [kvm]
> [ +0.006197]  [<ffffffff81027b9d>] ? native_sched_clock+0x2d/0xa0
> [ +0.006026]  [<ffffffff810d5fc6>] ? creds_are_invalid.part.1+0x16/0x50
> [ +0.006537]  [<ffffffff810d6021>] ? creds_are_invalid+0x21/0x30
> [ +0.005930]  [<ffffffff813a61da>] ? inode_has_perm.isra.48+0x2a/0xa0
> [ +0.006365]  [<ffffffff8128c7b8>] do_vfs_ioctl+0x2e8/0x530
> [ +0.005496]  [<ffffffff8128ca81>] SyS_ioctl+0x81/0xa0
> [ +0.005065]  [<ffffffff8187f8e9>] system_call_fastpath+0x12/0x17
> [ +0.006014] ---[ end trace 2f24e0820b44f686 ]---
> [ +5.870886] kvm [1783]: vcpu0 unhandled rdmsr: 0x1c9
> [ +0.004991] kvm [1783]: vcpu0 unhandled rdmsr: 0x1a6
> [ +0.005020] kvm [1783]: vcpu0 unhandled rdmsr: 0x3f6
> [Feb16 14:18] kvm [1783]: vcpu0 unhandled rdmsr: 0x1c9
> [ +0.005020] kvm [1783]: vcpu0 unhandled rdmsr: 0x1a6
> [ +0.004998] kvm [1783]: vcpu0 unhandled rdmsr: 0x3f6
> . . .
>
>
> Version
> -------
>
> The exact versions below were used on both L0 and L1:
>
>     $ uname -r; rpm -q qemu-system-x86
>     3.20.0-0.rc0.git5.1.fc23.x86_64
>     qemu-system-x86-2.2.0-5.fc22.x86_64
>
>
>
> Other info
> ----------
>
>   - Unpacking the kernel-3.20.0-0.rc0.git5.1.fc23.src.rpm and looking at
>     the file arch/x86/kvm/vmx.c, line 9190 is below, with contextual
>     code:
>
>     [. . .]
>     9178  * Emulate an exit from nested guest (L2) to L1, i.e., prepare to run L1
>     9179  * and modify vmcs12 to make it see what it would expect to see there if
>     9180  * L2 was its real guest. Must only be called when in L2 (is_guest_mode())
>     9181  */
>     9182 static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>     9183                               u32 exit_intr_info,
>     9184                               unsigned long exit_qualification)
>     9185 {
>     9186         struct vcpu_vmx *vmx = to_vmx(vcpu);
>     9187         struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>     9188
>     9189         /* trying to cancel vmlaunch/vmresume is a bug */
>     9190         WARN_ON_ONCE(vmx->nested.nested_run_pending);
>     9191
>     9192         leave_guest_mode(vcpu);
>     9193         prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
>     9194                        exit_qualification);
>     9195
>     9196         vmx_load_vmcs01(vcpu);
>     9197
>     9198         if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
>     9199             && nested_exit_intr_ack_set(vcpu)) {
>     9200                 int irq = kvm_cpu_get_interrupt(vcpu);
>     9201                 WARN_ON(irq < 0);
>     9202                 vmcs12->vm_exit_intr_info = irq |
>     9203                         INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
>     9204         }
>
>   - The above line 9190 was introduced in this commit:
>
>     $ git log -S'WARN_ON_ONCE(vmx->nested.nested_run_pending)' \
>           -- ./arch/x86/kvm/vmx.c
>
>     commit 5f3d5799974b89100268ba813cec8db7bd0693fb
>     Author: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
>     Date:   Sun Apr 14 12:12:46 2013 +0200
>
>         KVM: nVMX: Rework event injection and recovery
>
>         The basic idea is to always transfer the pending event injection on
>         vmexit into the architectural state of the VCPU and then drop it from
>         there if it turns out that we left L2 to enter L1, i.e. if we enter
>         prepare_vmcs12.
>
>         vmcs12_save_pending_events takes care to transfer pending L0 events into
>         the queue of L1. That is mandatory as L1 may decide to switch the guest
>         state completely, invalidating or preserving the pending events for
>         later injection (including on a different node, once we support
>         migration).
>
>         This concept is based on the rule that a pending vmlaunch/vmresume is
>         not canceled. Otherwise, we would risk to lose injected events or leak
>         them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the
>         entry of nested_vmx_vmexit.
>
>         Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
>         Signed-off-by: Gleb Natapov <gleb@xxxxxxxxxx>
>
>
>   - `dmesg`, `dmidecode`, and `x86info -a` details of L0 and L1 are here:
>
>     https://kashyapc.fedorapeople.org/virt/Info-L0-Intel-Xeon-and-L1-nVMX-test/
>

Does enable_apicv make a difference?
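For reference, one way to check and flip that on L0 -- this assumes the
stock kvm_intel module parameter, so treat it as a sketch rather than a
verified recipe for this particular box:

    # current APICv setting (prints Y or N)
    $ cat /sys/module/kvm_intel/parameters/enable_apicv

    # to retest with it toggled, shut down all guests on L0 first, then
    # reload kvm_intel with the parameter inverted
    $ sudo rmmod kvm_intel
    $ sudo modprobe kvm_intel enable_apicv=0    # or enable_apicv=1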
Is this a regression caused by the commit, or do you only see it with very
recent kvm.git?

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux