[PATCH 0/2] KVM: VMX: fix race condition in VT-d PI

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



In one of our recent test which transferring data via a passthrough
NIC with VT-d PI enabled, we encountered the following call trace on
KVM commit 5f54c8b2d4fa:

[30521.205623] WARNING: CPU: 30 PID: 5713 at arch/x86/kvm/vmx.c:5094 vmx_deliver_posted_interrupt+0xda/0xf0 [kvm_intel]
[30521.353638] CPU: 30 PID: 5713 Comm: CPU 10/KVM Not tainted 4.13.0-rc6+ #1
[30521.372155] task: ffff92f3a689c000 task.stack: ffffbd3fb1f9c000
[30521.378757] RIP: 0010:vmx_deliver_posted_interrupt+0xda/0xf0 [kvm_intel]
[30521.386228] RSP: 0018:ffffbd3fb1f9f9c8 EFLAGS: 00010202
[30521.392055] RAX: 00006a0000f20002 RBX: ffff92f30fcf8000 RCX: 000000000000001d
[30521.400010] RDX: 0000000000000070 RSI: 00000000000000fd RDI: ffff92f30fcf8000
[30521.407964] RBP: ffffbd3fb1f9f9d0 R08: 0000000000000000 R09: 0000000000000000
[30521.415919] R10: 0000000000000000 R11: 0000000000000008 R12: ffffbd3fb1f9fab4
[30521.423878] R13: ffff92f323d6c258 R14: ffff92f30fcf8000 R15: ffff92f321f80900
[30521.431834] FS:  00007f1834abd700(0000) GS:ffff92f3c8f80000(0000) knlGS:0000000000000000
[30521.440854] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[30521.447261] CR2: 00007fd5e9f20710 CR3: 0000017727f07000 CR4: 00000000007426e0
[30521.455217] PKRU: 55555554
[30521.458237] Call Trace:
[30521.461104]  __apic_accept_irq+0x26e/0x320 [kvm]
[30521.466263]  kvm_irq_delivery_to_apic_fast+0x108/0x3e0 [kvm]
[30521.472589]  ? smp_call_function_single+0xbf/0xf0
[30521.477846]  kvm_irq_delivery_to_apic+0x63/0x2b0 [kvm]
[30521.483577]  ? __update_load_avg_se.isra.28+0x142/0x160
[30521.489397]  ? __update_load_avg_se.isra.28+0x142/0x160
[30521.495224]  kvm_lapic_reg_write+0x11f/0x650 [kvm]
[30521.500576]  kvm_x2apic_msr_write+0x54/0x80 [kvm]
[30521.505833]  kvm_set_msr_common+0x3ce/0xc50 [kvm]
[30521.511080]  ? __pi_post_block+0x13d/0x1c0 [kvm_intel]
[30521.516802]  vmx_set_msr+0xb5/0x860 [kvm_intel]
[30521.521860]  kvm_set_msr+0x71/0x100 [kvm]
[30521.526334]  handle_wrmsr+0x58/0x150 [kvm_intel]
[30521.531485]  vmx_handle_exit+0xa4/0x1490 [kvm_intel]
[30521.537024]  ? vmx_vcpu_run+0x31b/0x4d0 [kvm_intel]
[30521.542474]  kvm_arch_vcpu_ioctl_run+0xa38/0x1610 [kvm]
[30521.548299]  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
[30521.553652]  kvm_vcpu_ioctl+0x33a/0x600 [kvm]
[30521.558511]  ? kvm_vcpu_ioctl+0x33a/0x600 [kvm]
[30521.563568]  ? eventfd_read+0x4b/0x90
[30521.567657]  do_vfs_ioctl+0xa1/0x5d0
[30521.571641]  ? SyS_futex+0x7f/0x180
[30521.575529]  SyS_ioctl+0x79/0x90
[30521.579134]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[30521.584278] RIP: 0033:0x7f1849347787
[30521.588261] RSP: 002b:00007f1834aba5e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[30521.596695] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1849347787
[30521.604643] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001e
[30521.612594] RBP: 0000000000000000 R08: 00005637a53936d0 R09: 00000000ffffffff
[30521.620542] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[30521.628491] R13: 0000000000000000 R14: 00007f186701a000 R15: 00005637a88fc000
[30521.636441] Code: c2 c1 e8 06 83 e2 3f 48 c1 e0 03 48 c1 e2 0a 48 8d ba e0 1c a1 88 48 29 c7 48 8b 05 01 29 e9 c3 ff 90 a8 00 00 00 e9 59 ff ff ff <0f> ff eb c8 0f ff eb 88 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00


After investigation, we found warnings in kvm_vcpu_trigger_posted_interrupt()
(called in vmx_deliver_posted_interrupt())

        if (vcpu->mode == IN_GUEST_MODE) {
                ...
                 * If the vcpu is in guest mode, it means it is
                 * running instead of being scheduled out and
                 * waiting in the run queue, and that's the only
                 * case when 'SN' is set currently, warning if
                 * 'SN' is set.
                 */
                WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc));
                ...
        }

as well as in pi_pre_block()

        do {
                WARN((pi_desc->sn == 1),
                     "Warning: SN field of posted-interrupts "
                     "is set before blocking\n");
                     
                ...
        } while (!pi_try_cmpxchg_control(pi_desc, &old, &new));

assumes that if a vCPU is in guest mode, the SN field of its VT-d PI
descriptor should not be set. However, vmx_update_pi_irte() changes
the SN field without checking the mode of the corresponding vCPU,
which may break the above assumption and trigger the above call trace.

Patch 1 removes the unnecessary changes of the SN field in
vmx_update_pi_irte() to avoid the race condition.

Even though patch 1 removes the raced change, WARN_ON_ONCE() in
vmx_deliver_posted_interrupt() is still too strong and can be
triggered in allowed cases. Patch 2 documents the those cases and
removes that WARN_ON_ONCE().


Haozhong Zhang (2):
  KVM: VMX: do not change SN bit in vmx_update_pi_irte()
  KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt

 arch/x86/kvm/vmx.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

-- 
2.11.0




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]