Re: [Bug 215459] VM freezes starting with kernel 5.15

On Thu, 2022-01-06 at 18:52 +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215459
> 
> Sean Christopherson (seanjc@xxxxxxxxxx) changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |seanjc@xxxxxxxxxx
> 
> --- Comment #4 from Sean Christopherson (seanjc@xxxxxxxxxx) ---
> The fix Maxim is referring to is commit fdba608f15e2 ("KVM: VMX: Wake vCPU when
> delivering posted IRQ even if vCPU == this vCPU").  But the buggy commit was
> introduced back in v5.8, so it's unlikely that's the issue, or at least that
> it's the only issue.  And assuming the VM in question has multiple vCPUs (which
> I'm pretty sure is true based on the config), that bug is unlikely to cause the
> entire VM to freeze; the expected symptom is that a vCPU isn't awakened when it
> should be, and while it's possible multiple vCPUs could get unlucky, taking
> down the entire VM is highly improbable.  That said, it's worth trying that
> fix, I'm just not very optimistic :-)

Actually, in my experience with both Linux and Windows guests, a single stuck vCPU derails the whole VM.
That is how I found out about the AVIC errata: only one vCPU got stuck and the whole VM froze,
and it was a Windows VM.

On Linux as well, these days things like RCU make everything freeze very quickly.

> 
> Assuming this is something different, the biggest relevant changes in v5.15 are
> that the TDP MMU is enabled by default, and that the APIC access page memslot
> is not deleted when APICv is inhibited.

> 
> Can you try disabling the TDP MMU with APICv still enabled?  KVM allows that to
> be toggled without unloading, e.g. "echo N | sudo tee
> /sys/module/kvm/parameters/tdp_mmu", the VM just needs to be started after the
> param is toggled.

This is a very good idea. I keep forgetting that the TDP MMU is now the default.
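For anyone scripting this, here is a minimal Python sketch of the same toggle as the quoted `echo N | sudo tee` command. It assumes root privileges and a loaded kvm module; the parameter reads back as "Y" or "N". The helper names are hypothetical, and, as Sean notes, only VMs started after the write pick up the change.

```python
import os

# Path from the quoted command; it exists only while the kvm module is loaded.
TDP_MMU_PARAM = "/sys/module/kvm/parameters/tdp_mmu"


def read_bool_param(path: str = TDP_MMU_PARAM) -> bool:
    """Return True if a boolean module parameter file reads 'Y' (or '1')."""
    with open(path) as f:
        value = f.read().strip()
    return value in ("Y", "y", "1")


def set_bool_param(enabled: bool, path: str = TDP_MMU_PARAM) -> None:
    """Write 'Y' or 'N' to a boolean module parameter (root needed for real sysfs).

    Same effect as: echo N | sudo tee /sys/module/kvm/parameters/tdp_mmu
    Start the VM only after this write so it uses the new setting.
    """
    with open(path, "w") as f:
        f.write("Y\n" if enabled else "N\n")


if __name__ == "__main__":
    if os.path.exists(TDP_MMU_PARAM):
        print("tdp_mmu enabled:", read_bool_param())
    else:
        print("kvm module not loaded or tdp_mmu parameter not present")
```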

> 
> Running v5.16 (or v5.16-rc8, as there are no KVM changes expected between rc8
> and the final release) would also be very helpful.  If we get lucky and the
> issue is resolved in v5.16, then it would be nice to "reverse" bisect to
> understand exactly what fixed the problem.

Or just bisect it if it's not fixed. That would be very helpful!

> 
> > Assuming I really do have APICv: is there anything I need to change in my XML
> > to really make use of this feature or does it work "out of the box"?
> 
> APICv works out of the box, though lack of IOMMU support does mean that your
> system can't post interrupts from devices, which is usually the biggest
> performance benefit to APICv on Intel.
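To check on the host whether KVM actually has APICv turned on, one option is to read the vendor module parameter: kvm_intel exposes `enable_apicv`, kvm_amd exposes `avic`. This is a sketch assuming those modules and parameter names; the function name is hypothetical. The value only says whether KVM enabled the feature globally, not whether it is inhibited for a particular VM.

```python
import os

# Vendor module -> parameter reporting whether APIC virtualization is enabled.
# Values read as Y/N or 0/1 depending on module and kernel version.
APICV_PARAMS = {
    "kvm_intel": "enable_apicv",
    "kvm_amd": "avic",
}


def apicv_status(sys_module_dir: str = "/sys/module") -> dict:
    """Map each loaded KVM vendor module to True/False for its APICv/AVIC parameter."""
    status = {}
    for module, param in APICV_PARAMS.items():
        path = os.path.join(sys_module_dir, module, "parameters", param)
        if not os.path.exists(path):
            continue  # module not loaded on this host
        with open(path) as f:
            status[module] = f.read().strip() in ("Y", "y", "1")
    return status


if __name__ == "__main__":
    print(apicv_status() or "no kvm_intel/kvm_amd module loaded")
```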

I haven't measured it formally, but with posted timer interrupts on AMD,
this noticeably reduces the number of VM exits, even without any pass-through
devices.

For passthrough devices, also note that even without IOMMU support,
the device sends a regular interrupt to the host, and the host handler
then uses APICv to deliver it to the guest. So, assuming that interrupt
is not pinned to one of the vCPUs, the VM still doesn't get a VM exit.

I once benchmarked a pass-through NVMe device on an old Xeon that didn't have
IOMMU posted interrupts, and APICv still made quite a difference.

I so wish Intel would not disable this feature on consumer systems.
But then AVIC has bugs... Oh well.

Best regards,
	Maxim Levitsky



