On Mon, Aug 09, 2021, Lai Jiangshan wrote: > From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx> > > The commit efdab992813fb ("KVM: x86: fix escape of guest dr6 to the host") > fixed a bug. It did a great job and reset dr6 unconditionally when the > vcpu being scheduled out since the linux kernel only activates breakpoints > in rescheduling (perf events). > > But writing to debug registers is slow, and it can be shown in perf results > sometimes even neither the host nor the guest activate breakpoints. > > It'd be better to reset it conditionally and this patch moves the code of > reseting dr6 to the path of VM-exit where we can check the related > conditions. And the comment is also updated. > > Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx> > --- > And even in the future, the linux kernel might activate breakpoint > in interrupts (rather than in rescheduling only), the host would > not be confused by the stale dr6 after this patch. The possible future > author who would implement it wouldn't need to care about how the kvm > mananges debug registers since it sticks to the host's expectations. Eh, I don't think this is a valid argument. KGBD already manipulates breakpoints in NMI context, activating breakpoints from interrupt context would absolutely require a full audit of the kernel. > Another solution is changing breakpoint.c and make the linux kernel > always reset dr6 in activating breakpoints. So that dr6 is allowed > to be stale when host breakpoints is not enabled and we don't need > to reset dr6 in this case. But it would be no harm to reset it even in > both places and killing stale states is good in programing. Hmm, other than further penalizing guests that enable debug breakpoints. What about adding a arch.switch_db_regs flag to note that DR6 is loaded with a guest value and keeping the reset code in kvm_arch_vcpu_put()? > arch/x86/kvm/x86.c | 29 +++++++++++++++++++++-------- > 1 file changed, 21 insertions(+), 8 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 4116567f3d44..39b5dad2dd19 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -4310,12 +4310,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) > > static_call(kvm_x86_vcpu_put)(vcpu); > vcpu->arch.last_host_tsc = rdtsc(); > - /* > - * If userspace has set any breakpoints or watchpoints, dr6 is restored > - * on every vmexit, but if not, we might have a stale dr6 from the > - * guest. do_debug expects dr6 to be cleared after it runs, do the same. > - */ > - set_debugreg(0, 6); > } > > static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu, > @@ -9375,6 +9369,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > fastpath_t exit_fastpath; > > bool req_immediate_exit = false; > + bool reset_dr6 = false; > > /* Forbid vmenter if vcpu dirty ring is soft-full */ > if (unlikely(vcpu->kvm->dirty_ring_size && > @@ -9601,6 +9596,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > set_debugreg(vcpu->arch.eff_db[3], 3); > set_debugreg(vcpu->arch.dr6, 6); Not directly related to this patch, but why does KVM_DEBUGREG_RELOAD exist? Commit ae561edeb421 ("KVM: x86: DR0-DR3 are not clear on reset") added it to ensure DR0-3 are fresh when they're modified through non-standard paths, but I don't see any reason why the new values _must_ be loaded into hardware. eff_db needs to be updated, but I don't see why hardware DRs need to be updated unless hardware breakpoints are active or DR exiting is disabled, and in those cases updating hardware is handled by KVM_DEBUGREG_WONT_EXIT and KVM_DEBUGREG_BP_ENABLED. Am I missing something? > vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_RELOAD; > + reset_dr6 = true; > } else if (unlikely(hw_breakpoint_active())) { > set_debugreg(0, 7); > } > @@ -9631,17 +9627,34 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) > kvm_update_dr0123(vcpu); > kvm_update_dr7(vcpu); > vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_RELOAD; > + reset_dr6 = true; > } > > /* > * If the guest has used debug registers, at least dr7 > * will be disabled while returning to the host. > + * > + * If we have active breakpoints in the host, restore the old state. > + * > * If we don't have active breakpoints in the host, we don't > - * care about the messed up debug address registers. But if > - * we have some of them active, restore the old state. > + * care about the messed up debug address registers but dr6 > + * which is expected to be cleared normally. Otherwise we might > + * leak a stale dr6 from the guest and confuse the host since > + * neither the host reset dr6 on activating next breakpoints nor > + * the hardware clear it on dilivering #DB. The Intel SDM says: > + * > + * Certain debug exceptions may clear bits 0-3. The remaining > + * contents of the DR6 register are never cleared by the > + * processor. To avoid confusion in identifying debug > + * exceptions, debug handlers should clear the register before > + * returning to the interrupted task. > + * > + * Keep it simple and live up to expectations: clear DR6 immediately. > */ > if (hw_breakpoint_active()) > hw_breakpoint_restore(); > + else if (unlikely(reset_dr6)) > + set_debugreg(DR6_RESERVED, 6); > > vcpu->arch.last_vmentry_cpu = vcpu->cpu; > vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc()); > -- > 2.19.1.6.gb485710b >