Re: [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues

Hi Ganapatrao,

On Tue, 10 Jan 2023 12:17:20 +0000,
Ganapatrao Kulkarni <gankulkarni@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> 
> Hi Marc,
> 
> On 24-08-2022 11:33 am, Ganapatrao Kulkarni wrote:
> > This series contains 3 fixes which were found while testing
> > ARM64 Nested Virtualization patch series.
> > 
> > The first patch avoids restarting the hrtimer when the timer
> > interrupt is fired and forwarded to the Guest-Hypervisor.
> > 
> > The second patch fixes a vtimer interrupt being dropped by the
> > Guest-Hypervisor.
> > 
> > The third patch fixes a NestedVM boot hang seen when the Guest
> > Hypervisor is configured with a 64K page size while the Host
> > Hypervisor uses 4K.
> > 
> > These patches are rebased on Nested Virtualization V6 patchset[1].
> 
> If I boot a Guest Hypervisor with many cores and then boot a NestedVM
> with the same number of cores, or boot multiple NestedVMs
> simultaneously with fewer cores, booting is very slow and sometimes
> results in an RCU soft-lockup in the NestedVM. I debugged this and it
> turned out to be caused by many SGIs being asserted to all vCPUs of
> the Guest-Hypervisor when the Guest-Hypervisor's KVM code prepares
> the NestedVM for WFI wakeup/return.
> 
> When the Guest Hypervisor prepares the NestedVM while it is
> returning/resuming from WFI, it loads the guest context, the vGIC and
> timer contexts, etc. The function gic_poke_irq (called from
> irq_set_irqchip_state with a spinlock held) writes to the
> GICD_ISACTIVER register in the Guest-Hypervisor's KVM code, which
> results in a mem-abort trap to the Host Hypervisor. As part of
> handling the guest mem abort, the Host Hypervisor calls io_mem_abort,
> which in turn reaches vgic_mmio_write_sactive; that function prepares
> every vCPU of the Guest Hypervisor by sending an SGI. The number of
> SGI/IPI calls grows very rapidly as more cores are used to boot the
> Guest Hypervisor.
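The fan-out described above can be sketched as a back-of-envelope model
(hypothetical, not KVM code; it only assumes that each waking vCPU performs
one trapped GICD_ISACTIVER write and that the host kicks every other vCPU
of the Guest-Hypervisor to synchronise vGIC active state):

```python
# Hypothetical cost model of the WFI wakeup path described above.
# Assumption: one trapped GICD_ISACTIVER write per waking vCPU, and
# vgic_access_active_prepare() kicking all remaining vCPUs per write.

def host_kicks_per_wakeup_round(nr_vcpus: int) -> int:
    """Total host-side kicks when all nr_vcpus wake from WFI once.

    Each of the nr_vcpus trapped MMIO writes kicks the other
    nr_vcpus - 1 vCPUs, so the cost grows quadratically with the
    number of cores given to the Guest-Hypervisor.
    """
    return nr_vcpus * (nr_vcpus - 1)

if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, host_kicks_per_wakeup_round(n))
```

Under these assumptions the cost is quadratic rather than literally
exponential, but it climbs fast enough (12 kicks at 4 vCPUs, 4032 at 64)
to explain the observed boot slowdown.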

This really isn't surprising. NV combined with oversubscribing is
bound to be absolutely terrible. The write to GICD_ISACTIVER is
only symptomatic of the interrupt amplification problem that already
exists without NV (any IPI in a guest is likely to result in at least
one IPI in the host).

Short of having direct injection for *all* interrupts, you end up with
this sort of emulation, which relies on being able to synchronise all
CPUs. Is it bad? Yes. Very.
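The amplification point can be put as a small arithmetic sketch (the
per-level factor is an assumption for illustration, not a measurement;
the source only states that a guest IPI likely costs at least one host
IPI, and that nesting adds a level):

```python
# Hypothetical IPI amplification model: an IPI issued `levels`
# virtualization levels above the host costs at least `factor` IPIs
# at each level below it, so the host-visible cost compounds.

def ipis_at_host(guest_ipis: int, factor: int, levels: int) -> int:
    """Lower bound on host IPIs for guest_ipis issued `levels` up."""
    return guest_ipis * factor ** levels
```

With factor 1 (the stated lower bound) the cost is merely linear in the
number of guest IPIs; any factor above 1 compounds once per nesting level,
e.g. ipis_at_host(1, 2, 2) gives 4 host IPIs for a single NestedVM IPI.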

> 
> Code trace:
> At Guest-hypervisor:
> kvm_timer_vcpu_load->kvm_timer_vcpu_load_gic->set_timer_irq_phys_active->
> irq_set_irqchip_state->gic_poke_irq
> 
> At Host-Hypervisor: io_mem_abort->
> kvm_io_bus_write->__kvm_io_bus_write->dispatch_mmio_write->
> vgic_mmio_write_sactive->vgic_access_active_prepare->
> kvm_kick_many_cpus->smp_call_function_many
> 
> I am currently working around this by passing the "nohlt" kernel
> parameter to the NestedVM. Any suggestions on how to handle/fix this
> issue and avoid the slow boot of a NestedVM with more cores?

At the moment, I'm focussing on correctness rather than performance.

Maybe we can restrict the conditions in which we perform this
synchronisation, but that's pretty low on my radar at the moment. Once
things are in a state that can be merged and that works correctly, we
can look into it.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


