Hello Sean,

On Tue, 31 Oct 2023 at 17:45, Prasad Pandit <ppandit@xxxxxxxxxx> wrote:
> On Mon, 30 Oct 2023 at 20:41, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>>> -        kvm_make_request(KVM_REQ_NMI, vcpu);
>>> +        if (events->nmi.pending)
>>> +            kvm_make_request(KVM_REQ_NMI, vcpu);
>>
>> This looks sane, but it should be unnecessary as KVM_REQ_NMI with nmi_queued=0
>> should be a (costly) nop.  Hrm, unless the vCPU is in HLT, in which case KVM
>> will treat a spurious KVM_REQ_NMI as a wake event.  When I made this change,
>> my assumption was that userspace would set KVM_VCPUEVENT_VALID_NMI_PENDING iff
>> there was relevant information to process.  But if I'm reading the code
>> correctly, QEMU invokes KVM_SET_VCPU_EVENTS with KVM_VCPUEVENT_VALID_NMI_PENDING
>> at the end of machine creation.

QEMU:
  qemu_thread_start
    kvm_start_vcpu_thread
      kvm_vcpu_thread_fn
        kvm_cpu_exec
          kvm_arch_put_registers
            kvm_put_vcpu_events (cpu=..., level=1)

  qemu_thread_start (args=0x559fdc852110) at ../util/qemu-thread-posix.c:534
  kvm_vcpu_thread_fn (arg=0x559fdc84cdc0) at ../accel/kvm/kvm-accel-ops.c:56
  qemu_wait_io_event (cpu=0x559fdc84cdc0) at ../softmmu/cpus.c:435
  qemu_wait_io_event_common (cpu=0x559fdc84cdc0) at ../softmmu/cpus.c:411
  process_queued_cpu_work (cpu=0x559fdc84cdc0) at ../cpus-common.c:351
  do_kvm_cpu_synchronize_post_reset (cpu=0x559fdc84cdc0, arg=...) at ../accel/kvm/kvm-all.c:2808
  kvm_arch_put_registers (cpu=0x559fdc84cdc0, level=2) at ../target/i386/kvm/kvm.c:4664
  kvm_put_vcpu_events (cpu=0x559fdc84cdc0, level=2) at ../target/i386/kvm/kvm.c:4308

  qemu_thread_start (args=0x559fdc852110) at ../util/qemu-thread-posix.c:534
  kvm_vcpu_thread_fn (arg=0x559fdc84cdc0) at ../accel/kvm/kvm-accel-ops.c:56
  qemu_wait_io_event (cpu=0x559fdc84cdc0) at ../softmmu/cpus.c:435
  qemu_wait_io_event_common (cpu=0x559fdc84cdc0) at ../softmmu/cpus.c:411
  process_queued_cpu_work (cpu=0x559fdc84cdc0) at ../cpus-common.c:351
  do_kvm_cpu_synchronize_post_init (cpu=0x559fdc84cdc0, arg=...) at ../accel/kvm/kvm-all.c:2819
  kvm_arch_put_registers (cpu=0x559fdc84cdc0, level=3) at ../target/i386/kvm/kvm.c:4664
  kvm_put_vcpu_events (cpu=0x559fdc84cdc0, level=3) at ../target/i386/kvm/kvm.c:4308

Kernel:
  kvm_vcpu_ioctl
    mutex_lock_killable(&vcpu->mutex)
    kvm_arch_vcpu_ioctl(, KVM_SET_VCPU_EVENTS, ... )
      -> kvm_vcpu_ioctl_x86_set_vcpu_events()
    mutex_unlock(&vcpu->mutex);

* Above are the three different ways in which the KVM_SET_VCPU_EVENTS ioctl(2)
  gets called.

QEMU/target/i386/kvm/kvm.c: kvm_put_vcpu_events()
    if (level >= KVM_PUT_RESET_STATE) {
        events.flags |= KVM_VCPUEVENT_VALID_NMI_PENDING;

  But KVM_VCPUEVENT_VALID_NMI_PENDING is set only when level >= 2
  (KVM_PUT_RESET_STATE), i.e. in the first (level=1) case _NMI_PENDING is
  not set.

* In the real-time host set-up I have, KVM_SET_VCPU_EVENTS with
  KVM_VCPUEVENT_VALID_NMI_PENDING set is called twice for each vCPU, and after
  that the kernel goes into what looks like a lock-contention loop. Each time
  it is called with 'cpu->env->nmi_injected = 0' and 'cpu->env->nmi_pending = 0',
  i.e. for each vCPU two NMI events are injected via
  kvm_make_request(KVM_REQ_NMI, vcpu) while the vCPU has no NMIs pending.
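On the QEMU side, as I read target/i386/kvm/kvm.c, kvm_put_vcpu_events() fills
in the NMI fields on every call and only gates the VALID flag on 'level' (a
trimmed sketch, not the full function):

    /* kvm_put_vcpu_events(), trimmed: the NMI state is copied on every
     * call, but the kernel acts on it only when the flag below is set. */
    events.nmi.injected = env->nmi_injected;
    events.nmi.pending = env->nmi_pending;
    events.nmi.masked = !!(env->hflags2 & HF2_NMI_MASK);
    ...
    if (level >= KVM_PUT_RESET_STATE) {
        /* set even when env->nmi_pending is 0 */
        events.flags |= KVM_VCPUEVENT_VALID_NMI_PENDING;
    }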
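And on the kernel side, once KVM_VCPUEVENT_VALID_NMI_PENDING is seen the
request is raised unconditionally; a trimmed sketch of the relevant hunk of
kvm_vcpu_ioctl_x86_set_vcpu_events() in arch/x86/kvm/x86.c, as I read it after
the recent NMI rework:

    if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
        vcpu->arch.nmi_pending = 0;
        atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
        /* raised even when events->nmi.pending == 0, which is the
         * case the patch above filters out */
        kvm_make_request(KVM_REQ_NMI, vcpu);
    }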
# perf lock report -t
                Name   acquired   contended     avg wait   total wait     max wait    min wait

           CPU 3/KVM     154017      154017     62.19 us       9.58 s    101.01 us     1.49 us
           CPU 9/KVM     152796      152796     62.67 us       9.58 s     95.92 us     1.49 us
           CPU 7/KVM     151554      151554     63.16 us       9.57 s    102.70 us     1.48 us
           CPU 1/KVM     151273      151273     65.30 us       9.88 s     98.88 us     1.52 us
           CPU 6/KVM     151107      151107     63.34 us       9.57 s    107.64 us     1.50 us
           CPU 8/KVM     151038      151038     63.37 us       9.57 s    102.93 us     1.51 us
           CPU 2/KVM     150701      150701     63.52 us       9.57 s     99.24 us     1.50 us
           CPU 5/KVM     150695      150695     63.56 us       9.58 s    142.15 us     1.50 us
           CPU 4/KVM     150527      150527     63.60 us       9.57 s    102.04 us     1.44 us
     qemu-system-x86        665         665     65.92 us     43.84 ms    100.67 us     1.55 us
           CPU 0/KVM          2           2    210.46 us    420.92 us    411.89 us     9.03 us
     qemu-system-x86          1           1    404.91 us    404.91 us    404.91 us   404.91 us
        TC tc-pc.ram          1           1    414.22 us    414.22 us    414.22 us   414.22 us

=== output for debug ===
bad: 10, total: 13
bad rate: 76.92 %
histogram of events caused bad sequence
    acquire: 0
   acquired: 10
  contended: 0
    release: 0

* The VCPU#0 thread seems to wait indefinitely on qemu_mutex_iothread_lock()
  to make any progress. The proposed patch above, checking 'events->nmi.pending'
  for a non-zero value, helps to fix this issue.  ...wdyt?

Thank you.
---
  - Prasad

PS: The kvm_make_request() routine has the following comment; I wonder if this
is what is happening with the empty NMI events.

    Request that don't require vCPU action should never be logged in
    vcpu->requests.  The vCPU won't clear the request, so it will stay
    logged indefinitely and prevent the vCPU from entering the guest.
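Though looking further, KVM_REQ_NMI itself does appear to be consumed:
vcpu_enter_guest() clears it via kvm_check_request() before calling
process_nmi(), so that comment may only apply to KVM_REQUEST_NO_ACTION
requests. The effect here looks more like the halt path Sean mentioned, where
a raised-but-not-yet-consumed request counts as a wake event; a trimmed sketch
of kvm_vcpu_has_events() in arch/x86/kvm/x86.c as I read it (helper spellings
vary across kernel versions):

    /* reached via kvm_arch_vcpu_runnable() while the vCPU is in HLT */
    if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
        (vcpu->arch.nmi_pending &&
         static_call(kvm_x86_nmi_allowed)(vcpu, false)))
        return true;  /* wakes the vCPU even if events->nmi.pending was 0 */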