Hi Marc, Thanks for having a look! On 3/30/21 10:55 AM, Marc Zyngier wrote: > Hi Alex, > > On Tue, 23 Mar 2021 18:00:57 +0000, > Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote: >> When a VCPU is created, the kvm_vcpu struct is initialized to zero in >> kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time >> vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is >> set to a sensible value in kvm_arm_setup_debug() later in the run loop. The >> result is that KVM executes for a short time with MDCR_EL2 set to zero. >> >> This has several unintended consequences: >> >> * Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM >> DDI 0487G.a, page D13-3820. The behavior specified by the architecture >> in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a >> value less than or equal to PMCR_EL0.N, which means that an unknown >> number of counters are now disabled by MDCR_EL2.HPME, which is zero. >> >> * The host configuration for the other debug features controlled by >> MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux >> doesn't use the other fields, but that might change in the future. >> >> Let's avoid both issues by initializing the VCPU's mdcr_el2 field in >> kvm_vcpu_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register >> has a consistent value after each vcpu_load(). >> >> Signed-off-by: Alexandru Elisei <alexandru.elisei@xxxxxxx> > This looks strangely similar to 4942dc6638b0 ("KVM: arm64: Write > arch.mdcr_el2 changes since last vcpu_load on VHE"), just at a > different point. Probably worth a Fixes tag. This bug is present in the commit you are mentioning, and from what I can tell it's also present in the commit it's fixing (d5a21bcc2995 ("KVM: arm64: Move common VHE/non-VHE trap config in separate functions")) - vcpu->arch.mdcr_el2 is computed in kvm_arm_setup_debug(), which is called after vcpu_load(). My guess is that this bug is from VHE support was added (or soon after). I can dig further, how far back in time should I aim for? > >> --- >> Found by code inspection. Based on v5.12-rc4. >> >> Tested on an odroid-c4 with VHE. vcpu->arch.mdcr_el2 is calculated to be >> 0x4e66. Without this patch, reading MDCR_EL2 after the first vcpu_load() in >> kvm_arch_vcpu_ioctl_run() returns 0; with this patch it returns the correct >> value, 0xe66 (FEAT_SPE is not implemented by the PE). >> >> This patch was initially part of the KVM SPE series [1], but those patches >> haven't seen much activity, so I thought it would be a good idea to send >> this patch separately to draw more attention to it. >> >> Changes in v2: >> * Moved kvm_arm_vcpu_init_debug() earlier in kvm_vcpu_first_run_init() so >> vcpu->arch.mdcr_el2 is calculated even if kvm_vgic_map_resources() fails. >> * Added comment to kvm_arm_setup_mdcr_el2 to explain what testing >> vcpu->guest_debug means. >> >> [1] https://www.spinics.net/lists/kvm-arm/msg42959.html >> >> arch/arm64/include/asm/kvm_host.h | 1 + >> arch/arm64/kvm/arm.c | 3 +- >> arch/arm64/kvm/debug.c | 82 +++++++++++++++++++++---------- >> 3 files changed, 59 insertions(+), 27 deletions(-) >> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h >> index 3d10e6527f7d..858c2fcfc043 100644 >> --- a/arch/arm64/include/asm/kvm_host.h >> +++ b/arch/arm64/include/asm/kvm_host.h >> @@ -713,6 +713,7 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} >> static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {} >> >> void kvm_arm_init_debug(void); >> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu); >> void kvm_arm_setup_debug(struct kvm_vcpu *vcpu); >> void kvm_arm_clear_debug(struct kvm_vcpu *vcpu); >> void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu); >> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c >> index 7f06ba76698d..7088d8fe7186 100644 >> --- a/arch/arm64/kvm/arm.c >> +++ b/arch/arm64/kvm/arm.c >> @@ -580,6 +580,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu) >> >> vcpu->arch.has_run_once = true; >> >> + kvm_arm_vcpu_init_debug(vcpu); >> + >> if (likely(irqchip_in_kernel(kvm))) { >> /* >> * Map the VGIC hardware resources before running a vcpu the >> @@ -791,7 +793,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) >> } >> >> kvm_arm_setup_debug(vcpu); >> - > Spurious change? Definitely, thank you for spotting it. > >> /************************************************************** >> * Enter the guest >> */ >> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c >> index 7a7e425616b5..3626d03354f6 100644 >> --- a/arch/arm64/kvm/debug.c >> +++ b/arch/arm64/kvm/debug.c >> @@ -68,6 +68,60 @@ void kvm_arm_init_debug(void) >> __this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2)); >> } >> >> +/** >> + * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value >> + * >> + * @vcpu: the vcpu pointer >> + * @host_mdcr: host mdcr_el2 value >> + * >> + * This ensures we will trap access to: >> + * - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR) >> + * - Debug ROM Address (MDCR_EL2_TDRA) >> + * - OS related registers (MDCR_EL2_TDOSA) >> + * - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB) >> + */ >> +static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr) >> +{ >> + bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY); >> + >> + /* >> + * This also clears MDCR_EL2_E2PB_MASK to disable guest access >> + * to the profiling buffer. >> + */ >> + vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK; >> + vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | >> + MDCR_EL2_TPMS | >> + MDCR_EL2_TPMCR | >> + MDCR_EL2_TDRA | >> + MDCR_EL2_TDOSA); >> + >> + /* Is the VM being debugged by userspace? */ >> + if (vcpu->guest_debug) { >> + /* Route all software debug exceptions to EL2 */ >> + vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE; >> + if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW) >> + trap_debug = true; >> + } >> + >> + /* Trap debug register access */ >> + if (trap_debug) >> + vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA; >> + >> + trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2); >> +} >> + >> +/** >> + * kvm_arm_vcpu_init_debug - setup vcpu debug traps >> + * >> + * @vcpu: the vcpu pointer >> + * >> + * Set vcpu initial mdcr_el2 value. >> + */ >> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu) >> +{ >> + kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2)); > Given that kvm_arm_setup_mdcr_el2() always takes the current host > value for mdcr_el2, why not moving the read into it and be done with > it? kvm_arm_setup_debug() is called with preemption disabled, and it can use __this_cpu_read(). kvm_arm_vcpu_init_debug() is called with preemption enabled, so it must use this_cpu_read(). I wanted to make the distinction because kvm_arm_setup_debug() is in the run loop. > > Also, do we really need an extra wrapper? I can remove the wrapper and have kvm_arm_setup_mdcr_el2() use this_cpu_read() for the host's mdcr_el2 value at the cost of a preempt disable/enable in the run loop when preemption is disabled. If you think that would make the code easier to follow, I can certainly do that. > >> +} >> + >> /** >> * kvm_arm_reset_debug_ptr - reset the debug ptr to point to the vcpu state >> */ >> @@ -83,12 +137,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) >> * @vcpu: the vcpu pointer >> * >> * This is called before each entry into the hypervisor to setup any >> - * debug related registers. Currently this just ensures we will trap >> - * access to: >> - * - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR) >> - * - Debug ROM Address (MDCR_EL2_TDRA) >> - * - OS related registers (MDCR_EL2_TDOSA) >> - * - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB) >> + * debug related registers. >> * >> * Additionally, KVM only traps guest accesses to the debug registers if >> * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY >> @@ -100,27 +149,14 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) >> >> void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) >> { >> - bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY); >> unsigned long mdscr, orig_mdcr_el2 = vcpu->arch.mdcr_el2; >> >> trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug); >> >> - /* >> - * This also clears MDCR_EL2_E2PB_MASK to disable guest access >> - * to the profiling buffer. >> - */ >> - vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK; >> - vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | >> - MDCR_EL2_TPMS | >> - MDCR_EL2_TPMCR | >> - MDCR_EL2_TDRA | >> - MDCR_EL2_TDOSA); >> + kvm_arm_setup_mdcr_el2(vcpu, __this_cpu_read(mdcr_el2)); >> >> /* Is Guest debugging in effect? */ >> if (vcpu->guest_debug) { >> - /* Route all software debug exceptions to EL2 */ >> - vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE; >> - >> /* Save guest debug state */ >> save_guest_debug_regs(vcpu); >> >> @@ -174,7 +210,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) >> >> vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state; >> vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY; >> - trap_debug = true; > There is something that slightly worries me here: there is now a > disconnect between flagging debug as dirty and setting the > trapping. And actually, you now check for KVM_ARM64_DEBUG_DIRTY and > set the trap bits *before* setting the dirty bit itself. > > Here, I believe you end up with guest/host confusion of breakpoints, > which isn't great. Or did I miss something? I'm sorry, but I don't understand what you mean. This is my understanding of what is happening. Without this patch, trap_debug is set to true and the KVM_ARM64_DEBUG_DIRTY flag is set if vcpu->guest_debug & KVM_GUESTDBG_USE_HW. Further down, trap debug is only used when computing mdcr_el2. With this patch, trap_debug is set to true if vcpu->guest_debug & KVM_GUESTDBG_USE_HW and it's also used for computing mdcr_el2, but this happens in kvm_arm_setup_mdcr_el2(), which is called at the start of kvm_arm_setup_debug(). The KVM_ARM_DEBUG_DIRTY flags is still set in kvm_arm_setup_debug() if vcpu->guest_debug & KVM_GUESTDBG_USE_HW, like before. The guest never runs with the value computed in kvm_vcpu_first_run_init() unless it's identical with the value recomputed in kvm_arm_setup_debug(). The only difference I see is that mdcr_el2 is computed at the start of kvm_arm_setup_debug(). I get the feeling I'm also missing something. Thanks, Alex > >> >> trace_kvm_arm_set_regset("BKPTS", get_num_brps(), >> &vcpu->arch.debug_ptr->dbg_bcr[0], >> @@ -189,10 +224,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) >> BUG_ON(!vcpu->guest_debug && >> vcpu->arch.debug_ptr != &vcpu->arch.vcpu_debug_state); >> >> - /* Trap debug register access */ >> - if (trap_debug) >> - vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA; >> - >> /* If KDE or MDE are set, perform a full save/restore cycle. */ >> if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE)) >> vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY; >> @@ -201,7 +232,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) >> if (has_vhe() && orig_mdcr_el2 != vcpu->arch.mdcr_el2) >> write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2); >> >> - trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2); >> trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1)); >> } > Thanks, > > M. > _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm