On Wed, Feb 24, 2021, Krish Sadhukhan wrote: > > On 2/23/21 2:42 PM, Sean Christopherson wrote: > > On Tue, Feb 23, 2021, Krish Sadhukhan wrote: > > > Currently, svm_vcpu_run() advances the RIP following VMRUN completion when > > > control returns to host. This works fine if there is no trap flag set > > > on the VMRUN instruction i.e., if VMRUN is not single-stepped. But if > > > VMRUN is single-stepped, this advancement of the RIP leads to an incorrect > > > RIP in the #DB handler invoked for the single-step trap. Therefore, check Whose #DB handler? L1's? > > > if the VMRUN instruction is single-stepped and if so, do not advance the RIP > > > when the #DB intercept #VMEXIT happens. > > This really needs to clarify which VMRUN, i.e. L0 vs. L1. AFAICT, you're > > talking about both at separate times. Is this an issue with L1 single-stepping > > its VMRUN, L0 single-stepping its VMRUN, L0 single-stepping L1's VMRUN, ??? > > > The issue is seen when L1 single-steps its own VMRUN. I will fix the > wording. ... > > > @@ -3827,7 +3833,11 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu) > > > vcpu->arch.cr2 = svm->vmcb->save.cr2; > > > vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax; > > > vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp; > > > - vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip; > > > + if (single_step_vmrun && svm->vmcb->control.exit_code == > > > + SVM_EXIT_EXCP_BASE + DB_VECTOR) > > > + single_step_vmrun = false; > > Even if you fix the global flag issue, this can't possibly work if userspace > > changes state, if VMRUN fails and leaves a timebomb, and probably any number of > > other conditions. > > > Are you saying that I need to adjust the RIP in cases where VMRUN fails ? If VMRUN fails, the #DB exit will never occur and single_step_vmrun will be left set. Ditto if a higher priority exit occurs, though I'm not sure that can cause problems in practice. Anyways, my point is that setting a flag that must be consumed on an exact instruction is almost always fragile, there are just too many corner cases that pop up. Can you elaborate more on who/what incorrectly advances RIP? The changelog says "svm_vcpu_run() advances the RIP", but it's not advancing anything it's just grabbing RIP from the VMCB, which IIUC is L2's RIP.