On Thu, Sep 02, 2021 at 10:24:08AM +0200, Martin Kletzander wrote:
> On Thu, Sep 02, 2021 at 10:44:06AM +0800, Jiatong Shen wrote:
> > Hello,
> >
> > I am trying to understand why qemu VM CPU threads use isolated cpus.
> >
> > I have a host which isolates some cpus using isolcpus,
> > like isolcpus=1,2,3,4,5,7,8,9,10,11. Unfortunately, vcpupin does not mask
> > out these cpus (vcpupin is still something like ffffffff).
>
> That is because there are use cases which need this. They isolate cpus
> to be used by VMs only (sometimes even moving kernel workload off these
> cpus) and automatically removing isolcpus from the set would break this
> valid behaviour.
>
> > When I log in to the system, it seems the qemu cpu threads only run on
> > these isolcpus. I do not quite understand this behaviour, because I think
> > by using isolcpus the kernel scheduler will exclude these cpus, and thus
> > the vcpu threads shouldn't use these cores unless taskset is explicitly
> > called. So my question is: how do the cpu threads get scheduled on
> > isolated cpus?
>
> libvirt sets the affinity for VMs because libvirt itself might not be
> running on all cores and qemu, being a child process, would otherwise
> inherit that affinity. We even have this in the documentation, and if you
> want to limit the cpus it needs to be defined in the XML.
>
> It begs the question whether we should somehow coordinate with the
> kernel based on isolcpus, but I am not sure under what circumstances we
> should do that and what the proper way to do it is. I would suggest you
> file an issue to discuss this further unless someone comes up with a
> clear decision here.

Well, if someone is using isolcpus, it is because they want to have some
pretty strict control over what runs on what CPUs. The notion of what a
good default placement would be is then quite ill-defined. Do you want to
avoid the isolcpus CPU mask because it is being used for non-VM tasks, or
do you want to use the isolcpus CPUs because they are intended for VM
tasks?

Furthermore, if we paid any attention to the isolcpus mask, then the VM
XML configuration would no longer result in a reproducible deployment -
the semantics would vary based on the invisible isolcpus setting.

Given all that, if someone is using isolcpus, then I'd really expect them
to set explicit affinity for the VMs too. Note this does not have to be
done at the libvirt level at all. On systemd hosts all VMs will be placed
in /machine.slice, so even if the QEMU processes have an all-1s affinity
mask, the CPU affinity on /machine.slice will take priority. IOW, if
setting isolcpus, you should also always set the /machine.slice CPU
affinity.

Which leads into the final point - the need for isolcpus is widely
misunderstood. The only scenario where isolcpus is generally required is
hard real-time workloads, where you absolutely must stop all kernel
threads from running on those CPUs. In any non-real-time scenario it is
sufficient to "isolate" / "reserve" CPUs using CPU affinity in cgroups
alone. For systemd this can be done globally using CPUAffinity in
/etc/systemd/system.conf, to restrict most OS services to some
housekeeping CPUs, and then using /machine.slice to grant access to the
other CPUs for VMs (rough sketches of both approaches are below).

Regards,
Daniel

-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-   https://www.instagram.com/dberrange :|
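
For the libvirt-level option, explicit affinity is set in the domain XML.
The following is only an illustrative sketch, assuming the isolated CPUs
1-5,7-11 from the original question and a 4-vCPU guest; the exact pinning
depends on the topology you want:

  <vcpu placement='static' cpuset='1-5,7-11'>4</vcpu>
  <cputune>
    <!-- pin each vCPU to one isolated host CPU -->
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='2'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='4'/>
    <!-- keep the emulator threads off the vCPU CPUs -->
    <emulatorpin cpuset='5'/>
  </cputune>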
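
And a minimal sketch of the systemd-only approach, with hypothetical CPU
numbers (0 and 6 as housekeeping CPUs). Note that AllowedCPUs= on a slice
needs a reasonably recent systemd (v244 or newer) with the cpuset
controller available:

  # /etc/systemd/system.conf - confine PID 1 and most services to the
  # housekeeping CPUs
  [Manager]
  CPUAffinity=0 6

  # /etc/systemd/system/machine.slice.d/10-cpus.conf - give VMs the rest
  [Slice]
  AllowedCPUs=1-5,7-11

The slice setting can also be applied at runtime with something like
'systemctl set-property machine.slice AllowedCPUs=1-5,7-11'; a reboot is
the simplest way to make the system.conf change take full effect for
already-running services.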