[Bug 218259] High latency in KVM guests

https://bugzilla.kernel.org/show_bug.cgi?id=218259

--- Comment #6 from Joern Heissler (kernelbugs2012@xxxxxxxxxxxxxxxxx) ---
(In reply to Sean Christopherson from comment #5)

> This is likely/hopefully the same thing Yan encountered[1].  If you are able
> to test patches, the proposed fix[2] applies cleanly on v6.6 (note, I need to
> post a refreshed version of the series regardless), any feedback you can
> provide would be much appreciated.
> 
> [1] https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@xxxxxxxxxxxxxxxxxxxxxxxxx
> [2] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@xxxxxxxxxx

I admit that I don't understand most of what's written in those threads.
I applied the two patches from [2] (excluding [3]) to v6.6, and that appears to
solve the problem.

However, I haven't measured whether any of the changes/flags affect
performance, or whether they cause any other problems. After about 1 hour of
uptime, everything appears to be okay.

[3] https://lore.kernel.org/all/ZPtVF5KKxLhMj58n@xxxxxxxxxx/


> KVM changes aside, I highly recommend evaluating whether or not NUMA
> autobalancing is a net positive for your environment.  The interactions
> between autobalancing and KVM are often less than stellar, and disabling
> autobalancing is sometimes a completely legitimate option/solution.

I'll have to evaluate multiple options for my production environment.
Patching and building the kernel myself would only be a last resort, and it
will probably take a while until Debian ships a patch for the issue. So I may
disable NUMA balancing, or perhaps try to pin each VM's memory and CPUs to a
single NUMA node.
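
For reference, a minimal sketch of checking whether automatic NUMA balancing is currently enabled on a host. It assumes the standard `kernel.numa_balancing` sysctl exposed at `/proc/sys/kernel/numa_balancing`; the helper names are made up for illustration and nothing here comes from the patches under discussion.

```python
from pathlib import Path

# Procfs file behind the kernel.numa_balancing sysctl.
NUMA_BALANCING_PATH = "/proc/sys/kernel/numa_balancing"


def parse_numa_balancing(text):
    """Parse the sysctl value: 0 means disabled, non-zero means some mode is on."""
    return int(text.strip())


def read_numa_balancing(path=NUMA_BALANCING_PATH):
    """Return the current setting, or None if the kernel does not expose it."""
    try:
        return parse_numa_balancing(Path(path).read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    value = read_numa_balancing()
    if value is None:
        print("NUMA balancing sysctl not available on this kernel")
    elif value == 0:
        print("NUMA balancing is disabled")
    else:
        print(f"NUMA balancing is enabled (mode {value})")
```

Disabling it at runtime would presumably be done as root with `sysctl kernel.numa_balancing=0`; whether that is a net win depends on the workload and needs measuring.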

> > 3. tdp_mmu was "Y", disabling it seems to make no difference.
> 
> Hrm, that's odd.  The commit blamed by bisection was purely a TDP MMU change.
> Did you relaunch VMs after disabling the module params?  While the module
> param is writable, it's effectively snapshotted by each VM during creation,
> i.e. toggling it won't affect running VMs.

It's quite possible that I did not restart the VM afterwards. I tried again,
this time paying attention. Setting it to "N" *does* seem to eliminate the
issue.
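
A small sketch for reading the current value of the parameter before launching a test VM. It assumes the parameter is exposed at `/sys/module/kvm/parameters/tdp_mmu` and reads as `Y` or `N`; the function names are hypothetical. As explained in the quoted reply, each VM snapshots the value at creation, so this shows only what a newly created VM would use, not what running VMs are using.

```python
from pathlib import Path

# Sysfs file for the kvm module's tdp_mmu parameter (assumed location).
TDP_MMU_PARAM = "/sys/module/kvm/parameters/tdp_mmu"


def parse_bool_param(text):
    """Convert a boolean module parameter value (Y/N, also 1/0) to a bool."""
    value = text.strip()
    if value in ("Y", "y", "1"):
        return True
    if value in ("N", "n", "0"):
        return False
    raise ValueError(f"unexpected module parameter value: {value!r}")


def read_tdp_mmu(path=TDP_MMU_PARAM):
    """Return the current setting, or None if the parameter file is absent."""
    try:
        return parse_bool_param(Path(path).read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    state = read_tdp_mmu()
    if state is None:
        print("kvm module not loaded, or tdp_mmu parameter not present")
    else:
        print("tdp_mmu for newly created VMs:", "Y" if state else "N")
```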


> > The newer one prints "pci_bus 0000:7f: Unknown NUMA node; performance will
> > be reduced" (same with ff again). The older ones don't.
> 
> That was a new message added by commit ad5086108b9f ("PCI: Warn if no host
> bridge NUMA node info"), which was first released in v5.5.

It seems I was looking at systems running older (< v5.5) kernels. On the ones
running v5.10 the message is printed too.


Thanks a lot so far, I believe I've now got enough options to consider for my
production environment.

