2017-07-25 18:55 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>:
> On 25/07/2017 12:40, Wanpeng Li wrote:
>> Commit 4c4a6f790ee862 (KVM: nVMX: track NMI blocking state separately for each VMCS)
>> tracks NMI blocking state separately for vmcs01 and vmcs02. However it is not enough:
>>
>> - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault on IRET, so the
>>   L2 can generate #PF which can be intercepted by L0.
>> - L0 walks L1's guest page table and sees the mapping is invalid, it resumes the L1
>>   guest and injects the #PF into L1.
>> - L1 awares it should set bit 3 (blocking by NMI) in the interruptibility-state field
>>   and fix the shadow page table before resuming L2 guest.
>> - L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exit to L0
>> - L0 emulates VMRESUME which is called from L1, however, it lost the interruptibility
>>   state field which is updated in vmcs12 when prepare vmcs02
>> - .........
>
> The "..." part is not very enlightening. My understanding is:
>
> - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault
>   on IRET, so the L2 can generate #PF which can be intercepted by L0.
> - L0 walks L1's guest page table and sees the mapping is invalid, it
>   resumes the L1 guest and injects the #PF into L1. At this point the
>   vmcs02 has nmi_known_unmasked=true.
> - L1 sets set bit 3 (blocking by NMI) in the interruptibility-state field
>   of vmcs12 (and fixes the shadow page table) before resuming L2 guest.
> - L1 executes VMRESUME to resume L2, causing a vmexit to L0
> - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the
>   interruptibility-state field of vmcs02, but nmi_known_unmasked is
>   still true.
> - on the next L2 exit to L0, nmi_known_unmasked is true so
>   vmx_recover_nmi_blocking does not do anything.

Thanks for that. :)

>
> Can you explain instead what happens if your v1 patch is applied (on top of mine),
> and why it fixes the bug.
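
The sequence above can be sketched as a small user-space toy model. This is
not the kernel code: the toy_* structs and functions are hypothetical
stand-ins that only borrow the names used in this thread, and the "resync"
parameter is an assumption showing one possible way to keep the cached flag
consistent with the field copied from vmcs12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 3 of the interruptibility-state field: blocking by NMI. */
#define GUEST_INTR_STATE_NMI 0x00000008u

/* Toy stand-ins (hypothetical); the real structures are far larger. */
struct toy_vmcs12 {
	uint32_t guest_interruptibility_info;
};

struct toy_vmcs02 {
	uint32_t guest_interruptibility_info;
	bool nmi_known_unmasked;	/* cached "NMIs are not blocked" hint */
};

/*
 * Models the VMRESUME emulation step: the field written by L1 in vmcs12
 * is copied into vmcs02. With resync == false the cached flag is left
 * untouched, as in the sequence described above. With resync == true
 * (an assumption, for illustration) the flag is recomputed from bit 3.
 */
static void toy_prepare_vmcs02(struct toy_vmcs02 *v02,
			       const struct toy_vmcs12 *v12, bool resync)
{
	v02->guest_interruptibility_info = v12->guest_interruptibility_info;
	if (resync)
		v02->nmi_known_unmasked =
			!(v12->guest_interruptibility_info & GUEST_INTR_STATE_NMI);
}

/*
 * Models only the early return on the next L2 exit: when the cached flag
 * is true, recovery of the NMI blocking state is skipped.
 */
static bool toy_recover_skipped(const struct toy_vmcs02 *v02)
{
	return v02->nmi_known_unmasked;
}

/*
 * Replays the sequence and returns true when the cached flag contradicts
 * the field, i.e. NMIs are blocked in vmcs02 but recovery is skipped.
 */
static bool toy_scenario_is_stale(bool resync)
{
	/* After L0 injects the #PF into L1: nmi_known_unmasked=true. */
	struct toy_vmcs02 v02 = { 0, true };
	/* L1 sets bit 3 in vmcs12 before VMRESUME. */
	struct toy_vmcs12 v12 = { GUEST_INTR_STATE_NMI };

	toy_prepare_vmcs02(&v02, &v12, resync);

	bool blocked =
		(v02.guest_interruptibility_info & GUEST_INTR_STATE_NMI) != 0;
	return blocked && toy_recover_skipped(&v02);
}
```

Calling toy_scenario_is_stale(false) replays the sequence with the flag left
alone, and toy_scenario_is_stale(true) replays it with the flag recomputed
from bit 3, so the two outcomes can be compared side by side.
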

We will set the expected guest interruptibility-state field before the
final step: L0 fixes the shadow page table (NGVA -> HPA), then L0 resumes
the guest w/ the expected guest interruptibility-state.

>
> The patch is correct and almost obvious, but I'd like the commit message to be precise.
>
> (Also, does your machine have shadow VMCS support?)

A Haswell desktop w/ shadow vmcs enabled.

Regards,
Wanpeng Li