Hi Marc,

On 3/30/21 9:07 PM, Marc Zyngier wrote:
> On Tue, 30 Mar 2021 18:13:07 +0100,
> Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote:
>> Hi Marc,
>>
>> Thanks for having a look!
>>
>> On 3/30/21 10:55 AM, Marc Zyngier wrote:
>>> Hi Alex,
>>>
>>> On Tue, 23 Mar 2021 18:00:57 +0000,
>>> Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote:
>>>> When a VCPU is created, the kvm_vcpu struct is initialized to zero in
>>>> kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
>>>> vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
>>>> set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
>>>> result is that KVM executes for a short time with MDCR_EL2 set to zero.
>>>>
>>>> This has several unintended consequences:
>>>>
>>>> * Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM
>>>>   DDI 0487G.a, page D13-3820. The behavior specified by the architecture
>>>>   in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a
>>>>   value less than or equal to PMCR_EL0.N, which means that an unknown
>>>>   number of counters are now disabled by MDCR_EL2.HPME, which is zero.
>>>>
>>>> * The host configuration for the other debug features controlled by
>>>>   MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux
>>>>   doesn't use the other fields, but that might change in the future.
>>>>
>>>> Let's avoid both issues by initializing the VCPU's mdcr_el2 field in
>>>> kvm_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register
>>>> has a consistent value after each vcpu_load().
>>>>
>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@xxxxxxx>
>>> This looks strangely similar to 4942dc6638b0 ("KVM: arm64: Write
>>> arch.mdcr_el2 changes since last vcpu_load on VHE"), just at a
>>> different point. Probably worth a Fixes tag.
>> This bug is present in the commit you are mentioning, and from what
>> I can tell it's also present in the commit it's fixing (d5a21bcc2995
>> ("KVM: arm64: Move common VHE/non-VHE trap config in separate
>> functions")) - vcpu->arch.mdcr_el2 is computed in
>> kvm_arm_setup_debug(), which is called after vcpu_load(). My guess
>> is that this bug dates from when VHE support was added (or soon after).
> Right. Can you please add a Fixes: tag for the same commit? At least
> that'd be consistent.

Yes, I'll do that.

>
>> I can dig further, how far back in time should I aim for?
>>
>>>> ---
>>>> Found by code inspection. Based on v5.12-rc4.
>>>>
>>>> Tested on an odroid-c4 with VHE. vcpu->arch.mdcr_el2 is calculated to be
>>>> 0x4e66. Without this patch, reading MDCR_EL2 after the first vcpu_load() in
>>>> kvm_arch_vcpu_ioctl_run() returns 0; with this patch it returns the correct
>>>> value, 0xe66 (FEAT_SPE is not implemented by the PE).
>>>>
>>>> This patch was initially part of the KVM SPE series [1], but those patches
>>>> haven't seen much activity, so I thought it would be a good idea to send
>>>> this patch separately to draw more attention to it.
>>>>
>>>> Changes in v2:
>>>> * Moved kvm_arm_vcpu_init_debug() earlier in kvm_vcpu_first_run_init() so
>>>>   vcpu->arch.mdcr_el2 is calculated even if kvm_vgic_map_resources() fails.
>>>> * Added comment to kvm_arm_setup_mdcr_el2 to explain what testing
>>>>   vcpu->guest_debug means.
>>>>
>>>> [1] https://www.spinics.net/lists/kvm-arm/msg42959.html
>>>>
>>>>  arch/arm64/include/asm/kvm_host.h |  1 +
>>>>  arch/arm64/kvm/arm.c              |  3 +-
>>>>  arch/arm64/kvm/debug.c            | 82 +++++++++++++++++++++----------
>>>>  3 files changed, 59 insertions(+), 27 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index 3d10e6527f7d..858c2fcfc043 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -713,6 +713,7 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>>
>>>>  void kvm_arm_init_debug(void);
>>>> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>>> index 7f06ba76698d..7088d8fe7186 100644
>>>> --- a/arch/arm64/kvm/arm.c
>>>> +++ b/arch/arm64/kvm/arm.c
>>>> @@ -580,6 +580,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>>>>
>>>>  	vcpu->arch.has_run_once = true;
>>>>
>>>> +	kvm_arm_vcpu_init_debug(vcpu);
>>>> +
>>>>  	if (likely(irqchip_in_kernel(kvm))) {
>>>>  		/*
>>>>  		 * Map the VGIC hardware resources before running a vcpu the
>>>> @@ -791,7 +793,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>>>>  		}
>>>>
>>>>  		kvm_arm_setup_debug(vcpu);
>>>> -
>>> Spurious change?
>> Definitely, thank you for spotting it.
>>
>>>>  		/**************************************************************
>>>>  		 * Enter the guest
>>>>  		 */
>>>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>>>> index 7a7e425616b5..3626d03354f6 100644
>>>> --- a/arch/arm64/kvm/debug.c
>>>> +++ b/arch/arm64/kvm/debug.c
>>>> @@ -68,6 +68,60 @@ void kvm_arm_init_debug(void)
>>>>  	__this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2));
>>>>  }
>>>>
>>>> +/**
>>>> + * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value
>>>> + *
>>>> + * @vcpu:	the vcpu pointer
>>>> + * @host_mdcr:	host mdcr_el2 value
>>>> + *
>>>> + * This ensures we will trap access to:
>>>> + *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
>>>> + *  - Debug ROM Address (MDCR_EL2_TDRA)
>>>> + *  - OS related registers (MDCR_EL2_TDOSA)
>>>> + *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
>>>> + */
>>>> +static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
>>>> +{
>>>> +	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>>>> +
>>>> +	/*
>>>> +	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
>>>> +	 * to the profiling buffer.
>>>> +	 */
>>>> +	vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK;
>>>> +	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
>>>> +				MDCR_EL2_TPMS |
>>>> +				MDCR_EL2_TPMCR |
>>>> +				MDCR_EL2_TDRA |
>>>> +				MDCR_EL2_TDOSA);
>>>> +
>>>> +	/* Is the VM being debugged by userspace? */
>>>> +	if (vcpu->guest_debug) {
>>>> +		/* Route all software debug exceptions to EL2 */
>>>> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
>>>> +		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
>>>> +			trap_debug = true;
>>>> +	}
>>>> +
>>>> +	/* Trap debug register access */
>>>> +	if (trap_debug)
>>>> +		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>>>> +
>>>> +	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_arm_vcpu_init_debug - setup vcpu debug traps
>>>> + *
>>>> + * @vcpu:	the vcpu pointer
>>>> + *
>>>> + * Set vcpu initial mdcr_el2 value.
>>>> + */
>>>> +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
>>> Given that kvm_arm_setup_mdcr_el2() always takes the current host
>>> value for mdcr_el2, why not moving the read into it and be done with
>>> it?
>> kvm_arm_setup_debug() is called with preemption disabled, and it can
>> use __this_cpu_read(). kvm_arm_vcpu_init_debug() is called with
>> preemption enabled, so it must use this_cpu_read(). I wanted to make
>> the distinction because kvm_arm_setup_debug() is in the run loop.
> I think it would be absolutely fine to make the slow path of
> kvm_vcpu_first_run_init() run with preempt disabled. This happens so
> rarely that it isn't worth thinking about it.

It looks to me like it's a bit too heavy-handed to run the entire function
kvm_vcpu_first_run_init() with preemption disabled just for __this_cpu_read()
in kvm_arm_setup_mdcr_el2(). Not because of the performance cost (it's
negligible, as it's called exactly once in the VCPU lifetime), but because
it's not obvious why it is needed.

I tried this:

@@ -580,7 +580,9 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)

 	vcpu->arch.has_run_once = true;

-	kvm_arm_vcpu_init_debug(vcpu);
+	preempt_disable();
+	kvm_arm_setup_mdcr_el2(vcpu);
+	preempt_enable();

 	if (likely(irqchip_in_kernel(kvm))) {
 		/*

and it still looks a bit off to me, because preemption needs to be disabled
only because of an implementation detail in kvm_arm_setup_mdcr_el2(): the
function operates on the VCPU struct, and preemption can be enabled for that.

I was thinking something like this:

@@ -119,7 +119,9 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
  */
 void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
 {
-	kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
+	preempt_disable();
+	kvm_arm_setup_mdcr_el2(vcpu);
+	preempt_enable();
 }

 /**

What do you think?

Thanks,

Alex

>
> Please give it a lockdep run though! ;-)
>
>>> Also, do we really need an extra wrapper?
>> I can remove the wrapper and have kvm_arm_setup_mdcr_el2() use
>> this_cpu_read() for the host's mdcr_el2 value at the cost of a
>> preempt disable/enable in the run loop when preemption is
>> disabled. If you think that would make the code easier to follow, I
>> can certainly do that.
> As explained above, I'd rather you keep the __this_cpu_read() and make
> kvm_vcpu_first_run_init() preemption safe.
>
> Thanks,
>
> M.
>
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
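
Editorial note: the value 0x4e66 quoted in the thread can be reproduced with
a small userspace model of the computation in kvm_arm_setup_mdcr_el2(). This
is only a sketch, not kernel code. The bit positions below are assumed to
match arch/arm64/include/asm/kvm_arm.h, the host HPMN value of 6 is an
assumption chosen because it is consistent with the reported result, and the
model_* names and MODEL_* flags are hypothetical stand-ins for the KVM types
and flags used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed MDCR_EL2 bit layout (believed to match asm/kvm_arm.h). */
#define MDCR_EL2_TPMS       (1u << 14)
#define MDCR_EL2_TDRA       (1u << 11)
#define MDCR_EL2_TDOSA      (1u << 10)
#define MDCR_EL2_TDA        (1u << 9)
#define MDCR_EL2_TDE        (1u << 8)
#define MDCR_EL2_TPM        (1u << 6)
#define MDCR_EL2_TPMCR      (1u << 5)
#define MDCR_EL2_HPMN_MASK  0x1fu

/* Hypothetical stand-in for KVM_GUESTDBG_USE_HW. */
#define MODEL_GUESTDBG_USE_HW  (1u << 1)

/* Hypothetical, minimal stand-in for the fields of struct kvm_vcpu used here. */
struct model_vcpu {
	uint32_t mdcr_el2;
	uint32_t guest_debug;   /* non-zero when userspace debugs the VM */
	bool debug_dirty;       /* models the KVM_ARM64_DEBUG_DIRTY flag */
};

/* Same logic as the quoted kvm_arm_setup_mdcr_el2(), minus tracing. */
static void model_setup_mdcr_el2(struct model_vcpu *vcpu, uint32_t host_mdcr)
{
	bool trap_debug = !vcpu->debug_dirty;

	/* Keep only the host's HPMN; every other host field is dropped. */
	vcpu->mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK;
	vcpu->mdcr_el2 |= MDCR_EL2_TPM | MDCR_EL2_TPMS | MDCR_EL2_TPMCR |
			  MDCR_EL2_TDRA | MDCR_EL2_TDOSA;

	if (vcpu->guest_debug) {
		/* Route software debug exceptions to EL2. */
		vcpu->mdcr_el2 |= MDCR_EL2_TDE;
		if (vcpu->guest_debug & MODEL_GUESTDBG_USE_HW)
			trap_debug = true;
	}

	if (trap_debug)
		vcpu->mdcr_el2 |= MDCR_EL2_TDA;
}

/* Checks the model against the two values reported in the thread. */
static void model_check_example(void)
{
	struct model_vcpu vcpu = { 0 };

	model_setup_mdcr_el2(&vcpu, 6 /* assumed host HPMN */);
	assert(vcpu.mdcr_el2 == 0x4e66);
	/* With TPMS masked off (no FEAT_SPE), the value reads back as 0xe66. */
	assert((vcpu.mdcr_el2 & ~MDCR_EL2_TPMS) == 0xe66);
}
```

Calling model_check_example() from a main() exercises the default case (no
guest debugging, debug registers not dirty). The model says nothing about the
preemption question discussed above, which only concerns how the host value
is read from the per-CPU variable.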