Re: [PATCH v2 02/12] KVM: arm64: nv: Sync nested timer state with FEAT_NV2

Wei-Lin Chang <r09922117@xxxxxxxxxxxxxxx> · Mon, 6 Jan 2025 10:19:10 +0800

Hi Marc and other KVM ARM developers,

I am trying to learn about NV and I have a few questions wile reading
the code and comments:

On Tue, Dec 17, 2024 at 02:23:10PM +0000, Marc Zyngier wrote:
> Emulating the timers with FEAT_NV2 is a bit odd, as the timers
> can be reconfigured behind our back without the hypervisor even
> noticing. In the VHE case, that's an actual regression in the
> architecture...

I'm curious why you implied NV2 isn't a regression in the nVHE case? In
my understanding without NV2 and ECV, an nVHE guest hypervisor would
directly program CNT{P, V}_*_EL0 registers without traps, and with NV2,
those accesses would be turned into memory accesses to the VNCR page,
delaying the set up of the timer until the next exit.
So NV2 also makes it worse for an nVHE guest hypervisor. What have I
missed here?

> 
> Co-developed-by: Christoffer Dall <christoffer.dall@xxxxxxx>
> Signed-off-by: Christoffer Dall <christoffer.dall@xxxxxxx>
> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> ---
>  arch/arm64/kvm/arch_timer.c  | 44 ++++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/arm.c         |  3 +++
>  include/kvm/arm_arch_timer.h |  1 +
>  3 files changed, 48 insertions(+)
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 1215df5904185..ee5f732fbbece 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -905,6 +905,50 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
>  		kvm_timer_blocking(vcpu);
>  }
>  
> +void kvm_timer_sync_nested(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * When NV2 is on, guest hypervisors have their EL1 timer register
> +	 * accesses redirected to the VNCR page. Any guest action taken on
> +	 * the timer is postponed until the next exit, leading to a very
> +	 * poor quality of emulation.
> +	 */
> +	if (!is_hyp_ctxt(vcpu))
> +		return;
> +
> +	if (!vcpu_el2_e2h_is_set(vcpu)) {
> +		/*
> +		 * A non-VHE guest hypervisor doesn't have any direct access
> +		 * to its timers: the EL2 registers trap (and the HW is
> +		 * fully emulated), while the EL0 registers access memory
> +		 * despite the access being notionally direct. Boo.

Reading this part of the comment I understand it as for an nVHE guest
hypervisor, the timer map would look like:

map->emul_vtimer == vcpu_hvtimer(vcpu)  /* EL2 HW is fully emulated */
map->emul_ptimer == vcpu_hptimer(vcpu)  /* EL2 HW is fully emulated */
map->direct_vtimer == vcpu_vtimer(vcpu) /* EL0 notionally direct */
map->direct_ptimer == vcpu_ptimer(vcpu) /* EL0 notionally direct */

But looking at get_timer_map() I see we enter the first nested if
statement, regardless of whether the guest hypervisor is VHE or nVHE,
since it checks like the following:

if (vcpu_has_nv(vcpu)) {
	if (is_hyp_ctxt(vcpu)) {
		map->direct_vtimer == vcpu_hvtimer(vcpu);
		map->direct_ptimer == vcpu_hptimer(vcpu);
		map->emul_vtimer == vcpu_vtimer(vcpu);
		map->emul_ptimer == vcpu_ptimer(vcpu);
	} else {
		[...]
	}
} else if (...) {
	[...]
} else {
	[...]
}

Which contradicts what the above comment says.
What did I misunderstand?

Thank you so much, any guidance is much appreciated.

Wei-Lin Chang

> +		 *
> +		 * We update the hardware timer registers with the
> +		 * latest value written by the guest to the VNCR page
> +		 * and let the hardware take care of the rest.
> +		 */
> +		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CTL_EL0),  SYS_CNTV_CTL);
> +		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CVAL_EL0), SYS_CNTV_CVAL);
> +		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CTL_EL0),  SYS_CNTP_CTL);
> +		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0), SYS_CNTP_CVAL);
> +	} else {
> +		/*
> +		 * For a VHE guest hypervisor, the EL2 state is directly
> +		 * stored in the host EL1 timers, while the emulated EL0
> +		 * state is stored in the VNCR page. The latter could have
> +		 * been updated behind our back, and we must reset the
> +		 * emulation of the timers.
> +		 */
> +		struct timer_map map;
> +		get_timer_map(vcpu, &map);
> +
> +		soft_timer_cancel(&map.emul_vtimer->hrtimer);
> +		soft_timer_cancel(&map.emul_ptimer->hrtimer);
> +		timer_emulate(map.emul_vtimer);
> +		timer_emulate(map.emul_ptimer);
> +	}
> +}
> +
>  /*
>   * With a userspace irqchip we have to check if the guest de-asserted the
>   * timer and if so, unmask the timer irq signal on the host interrupt
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index a102c3aebdbc4..fa3089822f9f3 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1228,6 +1228,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  		if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
>  			kvm_timer_sync_user(vcpu);
>  
> +		if (vcpu_has_nv(vcpu))
> +			kvm_timer_sync_nested(vcpu);
> +
>  		kvm_arch_vcpu_ctxsync_fp(vcpu);
>  
>  		/*
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index fd650a8789b91..6e3f6b7ff2b22 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -98,6 +98,7 @@ int __init kvm_timer_hyp_init(bool has_gic);
>  int kvm_timer_enable(struct kvm_vcpu *vcpu);
>  void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
>  void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
> +void kvm_timer_sync_nested(struct kvm_vcpu *vcpu);
>  void kvm_timer_sync_user(struct kvm_vcpu *vcpu);
>  bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
>  void kvm_timer_update_run(struct kvm_vcpu *vcpu);
> -- 
> 2.39.2
>