On 18/09/2017 03:56, Haozhong Zhang wrote: > In one of our recent test which transferring data via a passthrough > NIC with VT-d PI enabled, we encountered the following call trace on > KVM commit 5f54c8b2d4fa: > > [30521.205623] WARNING: CPU: 30 PID: 5713 at arch/x86/kvm/vmx.c:5094 vmx_deliver_posted_interrupt+0xda/0xf0 [kvm_intel] > [30521.353638] CPU: 30 PID: 5713 Comm: CPU 10/KVM Not tainted 4.13.0-rc6+ #1 > [30521.372155] task: ffff92f3a689c000 task.stack: ffffbd3fb1f9c000 > [30521.378757] RIP: 0010:vmx_deliver_posted_interrupt+0xda/0xf0 [kvm_intel] > [30521.386228] RSP: 0018:ffffbd3fb1f9f9c8 EFLAGS: 00010202 > [30521.392055] RAX: 00006a0000f20002 RBX: ffff92f30fcf8000 RCX: 000000000000001d > [30521.400010] RDX: 0000000000000070 RSI: 00000000000000fd RDI: ffff92f30fcf8000 > [30521.407964] RBP: ffffbd3fb1f9f9d0 R08: 0000000000000000 R09: 0000000000000000 > [30521.415919] R10: 0000000000000000 R11: 0000000000000008 R12: ffffbd3fb1f9fab4 > [30521.423878] R13: ffff92f323d6c258 R14: ffff92f30fcf8000 R15: ffff92f321f80900 > [30521.431834] FS: 00007f1834abd700(0000) GS:ffff92f3c8f80000(0000) knlGS:0000000000000000 > [30521.440854] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [30521.447261] CR2: 00007fd5e9f20710 CR3: 0000017727f07000 CR4: 00000000007426e0 > [30521.455217] PKRU: 55555554 > [30521.458237] Call Trace: > [30521.461104] __apic_accept_irq+0x26e/0x320 [kvm] > [30521.466263] kvm_irq_delivery_to_apic_fast+0x108/0x3e0 [kvm] > [30521.472589] ? smp_call_function_single+0xbf/0xf0 > [30521.477846] kvm_irq_delivery_to_apic+0x63/0x2b0 [kvm] > [30521.483577] ? __update_load_avg_se.isra.28+0x142/0x160 > [30521.489397] ? __update_load_avg_se.isra.28+0x142/0x160 > [30521.495224] kvm_lapic_reg_write+0x11f/0x650 [kvm] > [30521.500576] kvm_x2apic_msr_write+0x54/0x80 [kvm] > [30521.505833] kvm_set_msr_common+0x3ce/0xc50 [kvm] > [30521.511080] ? __pi_post_block+0x13d/0x1c0 [kvm_intel] > [30521.516802] vmx_set_msr+0xb5/0x860 [kvm_intel] > [30521.521860] kvm_set_msr+0x71/0x100 [kvm] > [30521.526334] handle_wrmsr+0x58/0x150 [kvm_intel] > [30521.531485] vmx_handle_exit+0xa4/0x1490 [kvm_intel] > [30521.537024] ? vmx_vcpu_run+0x31b/0x4d0 [kvm_intel] > [30521.542474] kvm_arch_vcpu_ioctl_run+0xa38/0x1610 [kvm] > [30521.548299] ? kvm_arch_vcpu_load+0x62/0x230 [kvm] > [30521.553652] kvm_vcpu_ioctl+0x33a/0x600 [kvm] > [30521.558511] ? kvm_vcpu_ioctl+0x33a/0x600 [kvm] > [30521.563568] ? eventfd_read+0x4b/0x90 > [30521.567657] do_vfs_ioctl+0xa1/0x5d0 > [30521.571641] ? SyS_futex+0x7f/0x180 > [30521.575529] SyS_ioctl+0x79/0x90 > [30521.579134] entry_SYSCALL_64_fastpath+0x1a/0xa5 > [30521.584278] RIP: 0033:0x7f1849347787 > [30521.588261] RSP: 002b:00007f1834aba5e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > [30521.596695] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1849347787 > [30521.604643] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001e > [30521.612594] RBP: 0000000000000000 R08: 00005637a53936d0 R09: 00000000ffffffff > [30521.620542] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 > [30521.628491] R13: 0000000000000000 R14: 00007f186701a000 R15: 00005637a88fc000 > [30521.636441] Code: c2 c1 e8 06 83 e2 3f 48 c1 e0 03 48 c1 e2 0a 48 8d ba e0 1c a1 88 48 29 c7 48 8b 05 01 29 e9 c3 ff 90 a8 00 00 00 e9 59 ff ff ff <0f> ff eb c8 0f ff eb 88 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 > > > After investigation, we found warnings in kvm_vcpu_trigger_posted_interrupt() > (called in vmx_deliver_posted_interrupt()) > > if (vcpu->mode == IN_GUEST_MODE) { > ... > * If the vcpu is in guest mode, it means it is > * running instead of being scheduled out and > * waiting in the run queue, and that's the only > * case when 'SN' is set currently, warning if > * 'SN' is set. > */ > WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc)); > ... > } > > as well as in pi_pre_block() > > do { > WARN((pi_desc->sn == 1), > "Warning: SN field of posted-interrupts " > "is set before blocking\n"); > > ... > } while (!pi_try_cmpxchg_control(pi_desc, &old, &new)); > > assumes that if a vCPU is in guest mode, the SN field of its VT-d PI > descriptor should not be set. However, vmx_update_pi_irte() changes > the SN field without checking the mode of the corresponding vCPU, > which may break the above assumption and trigger the above call trace. > > Patch 1 removes the unnecessary changes of the SN field in > vmx_update_pi_irte() to avoid the race condition. > > Even though patch 1 removes the raced change, WARN_ON_ONCE() in > vmx_deliver_posted_interrupt() is still too strong and can be > triggered in allowed cases. Patch 2 documents the those cases and > removes that WARN_ON_ONCE(). > > > Haozhong Zhang (2): > KVM: VMX: do not change SN bit in vmx_update_pi_irte() > KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt > > arch/x86/kvm/vmx.c | 39 ++++++++++++++++++++++----------------- > 1 file changed, 22 insertions(+), 17 deletions(-) > Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>