On Wed, May 26, 2021, Peter Zijlstra wrote:
> On Wed, May 26, 2021 at 10:37:26PM +0900, Masanori Misono wrote:
> > Hi,
> >
> > I observed performance degradation when running some parallel programs
> > on a VM that has (1) KVM_FEATURE_PV_UNHALT, (2) KVM_FEATURE_STEAL_TIME,
> > and (3) a multi-core topology. The benchmark results are shown at the
> > bottom. An example libvirt XML for creating such a VM is
> >
> > ```
> > [...]
> >   <vcpu placement='static'>8</vcpu>
> >   <cpu mode='host-model'>
> >     <topology sockets='1' cores='8' threads='1'/>
> >   </cpu>
> >   <qemu:commandline>
> >     <qemu:arg value='-cpu'/>
> >     <qemu:arg value='host,l3-cache=on,+kvm-pv-unhalt,+kvm-steal-time'/>
> >   </qemu:commandline>
> > [...]
> > ```
> >
> > I investigated the cause and found that the problem occurs in the
> > following way:
> >
> > - vCPU1 schedules thread A, and vCPU2 schedules thread B. vCPU1 and
> >   vCPU2 share the LLC.
> > - Thread A tries to acquire a lock but fails, resulting in a sleep
> >   state (via futex).
> > - vCPU1 becomes idle because there are no runnable threads and does
> >   HLT, which leads to a HLT VMEXIT (if idle=halt, and KVM doesn't
> >   disable HLT VMEXIT using KVM_CAP_X86_DISABLE_EXITS).
> > - KVM sets vCPU1's st->preempted to 1 in kvm_steal_time_set_preempted().
> > - Thread C wakes up on vCPU2. vCPU2 tries to do load balancing in
> >   select_idle_core(). Although vCPU1 is idle, vCPU1 is not a candidate
> >   for load balancing because vcpu_is_preempted(vCPU1) is true, hence
> >   available_idle_cpu(vCPU1) is false.
> > - As a result, both thread B and thread C stay in vCPU2's runqueue,
> >   and vCPU1 is not utilized.

If a patch ever gets merged, please put this analysis (or at least a
summary of the problem) in the changelog.  From the patch itself, I
thought "and the vCPU becomes a candidate for CFS load balancing" was
referring to CFS in the host, which was obviously confusing.

> > The patch changes kvm_arch_vcpu_put() so that it does not set
> > st->preempted to 1 when a vCPU exits due to HLT. As a result,
> > vcpu_is_preempted(vCPU) becomes 0, and the vCPU becomes a candidate
> > for CFS load balancing.
>
> I'm conflicted on this; the vcpu stops running, the pcpu can go do
> anything, it might start the next task. There is no telling how quickly
> the vcpu task can return to running.

Ya, the vCPU _is_ preempted after all.

> I'm guessing your setup doesn't actually overload the system; and when
> it doesn't have the vcpu thread to run, the pcpu actually goes idle
> too. But for those 1:1 cases we already have knobs to disable much of
> this IIRC.
>
> So I'm tempted to say things are working as expected and you're just
> not configured right.

That does seem to be the case.

> > I created a VM with 48 vCPUs, and each vCPU is pinned to the
> > corresponding pCPU.

If vCPUs are pinned and you want to eke out performance, then I think
the correct answer is to ensure nothing else can run on those pCPUs,
and/or configure KVM to not intercept HLT.
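
To make that last suggestion concrete, here's a minimal sketch of how a
VMM turns off HLT interception via KVM_CAP_X86_DISABLE_EXITS. vm_fd is
a stand-in for the actual VM fd, and note that the capability has to be
enabled before any vCPUs are created:

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: stop KVM from intercepting guest HLT. With HLT exits
 * disabled, an idle vCPU halts inside guest mode instead of exiting,
 * so kvm_steal_time_set_preempted() never marks it preempted just for
 * idling. vm_fd is a hypothetical, already-created VM fd; must be
 * called before vCPU creation. */
static int disable_hlt_exits(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_X86_DISABLE_EXITS,
		.args[0] = KVM_X86_DISABLE_EXITS_HLT,
	};

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

With QEMU you shouldn't need to touch the ioctl yourself; IIRC
-overcommit cpu-pm=on flips the same capability.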
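
And for reference, the guest-side path in Misono's analysis is roughly
the following (paraphrased from kernel/sched/core.c and
arch/x86/kernel/kvm.c, so treat it as a sketch rather than the verbatim
code):

```c
/* Paraphrased from kernel/sched/core.c: the wakeup path only treats a
 * CPU as a load-balancing target if it is idle AND its vCPU is not
 * marked preempted by the hypervisor. */
int available_idle_cpu(int cpu)
{
	if (!idle_cpu(cpu))
		return 0;

	if (vcpu_is_preempted(cpu))
		return 0;

	return 1;
}

/* Paraphrased from arch/x86/kernel/kvm.c: with KVM_FEATURE_STEAL_TIME,
 * vcpu_is_preempted() reduces to reading the preempted byte that
 * kvm_steal_time_set_preempted() set on the host side at vcpu_put. */
static bool __kvm_vcpu_is_preempted(long cpu)
{
	struct kvm_steal_time *src = &per_cpu(steal_time, cpu);

	return !!(src->preempted & KVM_VCPU_PREEMPTED);
}
```

Which is exactly why a halted vCPU looks "busy" to select_idle_core():
from the guest's point of view, nothing distinguishes "scheduled out
because it halted" from "scheduled out because the pCPU was contended".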