Understanding the Linux KVM CPU Scheduler

Greetings!

I'm encountering difficulty understanding the Linux CPU Scheduler
within the context of KVM virtual machines.  Specifically, I'd like to
understand if/how groups of logical cores are allocated to virtual
machines in an oversubscribed environment.

At a high level, my question is: how does the scheduler handle
allocation of logical cores to a VM that is provisioned with more
cores than are currently available?  E.g., the host has 3 logical
cores available but a VM is provisioned with 8 vCPUs.  I'm
predominantly concerned with the guest operating system not observing
synchronous progress across all vCPUs, and with potential related
errors, e.g., a watchdog timer might expect a response from a sibling
vCPU (which was not allocated a logical core to run on) within a
specified time.  I expect KVM to use the completely fair scheduler
(CFS) together with some variation of co-scheduling/gang scheduling,
but I've been unable to confirm this (it was mentioned in a 2011
lwn.net article, https://lwn.net/Articles/472797/, but doesn't seem to
have been expanded upon since).
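As an aside, one way I've been probing the "vCPU not scheduled" symptom
from inside the guest is by sampling the "steal" counter in /proc/stat,
which accumulates time a vCPU was runnable but the hypervisor did not
run it.  A minimal sketch (the field layout is the standard one from
proc(5); steal is the 8th counter after the cpu label):

```python
# Sketch: sample guest-side "steal" time from /proc/stat to see how often
# vCPUs are runnable but not given a host logical core.
# A "cpuN" line reads: cpuN user nice system idle iowait irq softirq steal ...

def steal_jiffies(stat_line):
    """Return the steal-time counter (in jiffies) from one /proc/stat cpu line."""
    fields = stat_line.split()
    # fields[0] is the cpu label ("cpu0", "cpu1", ...); steal is fields[8].
    return int(fields[8])

if __name__ == "__main__":
    with open("/proc/stat") as f:
        for line in f:
            # Per-CPU lines only (skip the aggregate "cpu " line).
            if line.startswith("cpu") and not line.startswith("cpu "):
                print(line.split()[0], "steal:", steal_jiffies(line))
```

Sampling this periodically and diffing shows which vCPUs are starved,
though it doesn't reveal whether the host attempts any co-scheduling.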

I've found that ESXi approaches this with "relaxed co-scheduling"; see
p. 7 of
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-vsphere-cpu-sched-performance-white-paper.pdf

I've also found a similar discussion on a different mailing list
(which suggested mailing this one):
https://lists.centos.org/pipermail/centos-virt/2010-November/002214.html

For context on my end, I am operating two virtual machine 'stacks' in
a heavily oversubscribed OpenStack KVM cloud environment.  Each
'stack' consists of two virtual machines.  The first generates network
traffic (a 'traffic generator') and sends this traffic through two
separate interfaces to corresponding networks.  The second virtual
machine acts as a bridge for these networks.  A rudimentary diagram is
shown below.

.----[traffic generator]----.
|                           |
'--------[VM bridge]--------'

Interestingly:
*  When the VM bridge is provisioned with 2 vCPUs, the traffic
generator reports ~10% packet loss
*  When the VM bridge is provisioned with 4 vCPUs, the traffic
generator reports ~40% packet loss

I suspect the packet loss originates from virtual interface buffer
overflows.  To the best of my understanding, although the completely
fair scheduler would schedule both VMs for equivalent total durations,
the 2-vCPU VM will be scheduled more frequently (for shorter periods)
because it is easier for the scheduler to find 2 free logical cores
than 4.  More frequent scheduling would let the buffers be emptied
more regularly, resulting in less packet loss.  However, to
prove/disprove this theory, I'd need to know how the completely fair
scheduler handles co-scheduling in the context of KVM virtual
machines.
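In case it's useful to anyone reproducing this: on the host I've been
trying to quantify "scheduled less frequently / waits longer" by
reading the per-thread run-delay from /proc/<pid>/task/<tid>/schedstat
for the QEMU process's vCPU threads (the three schedstat values are
time-on-cpu ns, run-delay ns, and timeslice count).  A sketch, where
the QEMU PID is whatever `ps` reports for the VM in question:

```python
# Sketch: read scheduler run-delay (time spent runnable but waiting for
# a logical core) for every thread of a QEMU process on the host.
# /proc/<pid>/task/<tid>/schedstat contains: on_cpu_ns run_delay_ns timeslices

import os

def run_delay_ns(schedstat_text):
    """Return run-delay (ns spent runnable but not running) from a schedstat line."""
    on_cpu, delay, timeslices = schedstat_text.split()
    return int(delay)

def vcpu_run_delays(qemu_pid):
    """Map each thread id of the QEMU process to its accumulated run-delay."""
    delays = {}
    for tid in os.listdir(f"/proc/{qemu_pid}/task"):
        with open(f"/proc/{qemu_pid}/task/{tid}/schedstat") as f:
            delays[tid] = run_delay_ns(f.read())
    return delays
```

If my theory holds, the 4-vCPU bridge's vCPU threads should show
markedly higher run-delay growth than the 2-vCPU bridge's under the
same traffic load.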

Thank you kindly,
William


