On Sun, Jun 30, 2024 at 08:49:48PM +0200, Mikulas Patocka wrote:
> On Sun, 30 Jun 2024, Tejun Heo wrote:
> 
> > Hello,
> > 
> > On Sat, Jun 29, 2024 at 08:15:56PM +0200, Mikulas Patocka wrote:
> > > > With 6.5, we get 3600MiB/s; with 6.6 we get 1400MiB/s.
> > > 
> > > The reason is that virt-manager by default sets up a topology where we
> > > have 16 sockets, 1 core per socket, 1 thread per core. And that workqueue
> > > patch avoids moving work items across sockets, so it processes all
> > > encryption work only on one virtual CPU.
> > > 
> > > The performance degradation may be fixed with "echo 'system'
> > > >/sys/module/workqueue/parameters/default_affinity_scope" - but it is
> > > a regression anyway, as many users don't know about this option.
> > > 
> > > How should we fix it? There are several options:
> > > 1. revert back to 'numa' affinity
> > > 2. revert to 'numa' affinity only if we are in a virtual machine
> > > 3. hack dm-crypt to set the 'numa' affinity for the affected workqueues
> > > 4. any other solution?
> > 
> > Do you happen to know why libvirt is doing that? There are many other
> > implications to configuring the system that way and I don't think we want
> > to design kernel behaviors to suit topology information fed to VMs which
> > can be arbitrary.
> > 
> > Thanks.
> 
> I don't know why. I added users@xxxxxxxxxxxxxxxxx to the CC.
> 
> How should libvirt properly advertise "we have 16 threads that are
> dynamically scheduled by the host kernel, so the latencies between them
> are changing and unpredictable"?

NB, libvirt is just the control plane; the actual virtual hardware exposed
is implemented across QEMU and the KVM kernel module. Guest CPU topology
and/or NUMA cost information is the responsibility of QEMU.

When QEMU's virtual CPUs are floating freely across host CPUs there is no
perfect answer. The host admin needs to make a tradeoff in their
configuration.

They can optimize for density, by allowing guest CPUs to float freely and
allowing CPU overcommit against host CPUs, in which case the guest CPU
topology is essentially a lie.

Or they can optimize for predictable performance, by strictly pinning guest
CPUs 1:1 to host CPUs, minimizing CPU overcommit, and having the guest CPU
topology match the host CPU topology 1:1.

With regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
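
As an illustration of the "predictable performance" option above, here is a
minimal sketch of the relevant libvirt domain XML. The <vcpu>, <cputune> /
<vcpupin> and <cpu> / <topology> elements are standard libvirt domain XML,
but the vCPU count, host CPU numbers and socket/core/thread layout are
assumed values for the example, not configuration taken from this thread;
a complete domain definition also needs the usual <name>, <memory>, <os>,
<devices> and so on.

  <domain type='kvm'>
    <!-- ... <name>, <memory>, <os>, <devices> etc. omitted ... -->
    <vcpu placement='static'>16</vcpu>
    <cputune>
      <!-- pin each vCPU 1:1 to a dedicated host CPU (host CPUs 0-15 assumed) -->
      <vcpupin vcpu='0' cpuset='0'/>
      <vcpupin vcpu='1' cpuset='1'/>
      <!-- ... one <vcpupin> per remaining vCPU ... -->
      <vcpupin vcpu='15' cpuset='15'/>
    </cputune>
    <cpu mode='host-passthrough'>
      <!-- describe the host's real layout instead of the 16-socket default,
           e.g. 1 socket x 8 cores x 2 threads on an SMT host (assumed) -->
      <topology sockets='1' cores='8' threads='2'/>
    </cpu>
  </domain>

With a configuration along these lines, the guest scheduler and the
workqueue affinity scopes see a CPU topology that actually corresponds to
the host, at the cost of dedicating host CPUs to the guest. The
density-optimized alternative is essentially virt-manager's default: no
<cputune> pinning and the auto-generated 16-socket topology described
above.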