Re: Understanding the Linux KVM CPU Scheduler

David Hildenbrand <david@xxxxxxxxxx> · Tue, 22 May 2018 15:38:14 +0200

On 18.05.2018 19:22, William Scott wrote:
> Greetings!
> 
> I'm encountering difficulty understanding the Linux CPU Scheduler
> within the context of KVM virtual machines.  Specifically, I'd like to
> understand if/how groups of logical cores are allocated to virtual
> machines in an oversubscribed environment.

Hi,

I am by no means an expert. vCPUs are "just" threads (e.g. of the QEMU
process) and are treated as such by the Linux scheduler. So whatever
scheduler policy you configure for these threads is what the kernel
will apply.
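To see that vCPUs really are ordinary host threads with ordinary
scheduling policies, a small sketch like the following lists every
thread of a process from /proc/<pid>/task together with its policy
(Linux-only). Using our own PID is just a stand-in; pointing it at a
QEMU process is the assumed use case, and the helper name is made up.

```python
import os

# Linux scheduling policies exposed by Python's os module.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",  # the default (fair) policy
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
    os.SCHED_FIFO: "SCHED_FIFO",    # real-time
    os.SCHED_RR: "SCHED_RR",        # real-time
}

def thread_policies(pid: int) -> dict:
    """Return {tid: (thread_name, policy_name)} for every thread of `pid`."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for entry in os.listdir(task_dir):
        tid = int(entry)
        try:
            with open(f"{task_dir}/{entry}/comm") as f:
                name = f.read().strip()
            policy = os.sched_getscheduler(tid)
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were looking
        result[tid] = (name, POLICY_NAMES.get(policy, str(policy)))
    return result

if __name__ == "__main__":
    # Hypothetical: substitute the PID of a running QEMU process here.
    target = os.getpid()
    for tid, (name, policy) in sorted(thread_policies(target).items()):
        print(f"{tid:>8}  {name:<16}  {policy}")
```

The same information is available per thread with `chrt -p <tid>`.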

> 
> At a high level, my question is "how does the scheduler handle
> allocation of logical cores to a VM that is provisioned more cores
> than is currently available? E.g., the host has 3 logical cores
> available but a VM is provisioned with 8 vCPUs."  I'm predominantly
> concerned with the guest operating system not observing synchronous
> progress across all vCPUs and potential related errors e.g. a watchdog
> timer might expect a response from a sibling vCPU (which was not

As you're pushing 8 virtual CPUs onto 3 logical CPUs, there will never
be such a thing as "synchronous progress". At least in default setups,
this is even true with 2 virtual CPUs on 3 logical CPUs, because there
will be other processes/threads to schedule on those logical CPUs.
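To put numbers on the 8-on-3 case: even in the best case, where nothing
else runs on the host, each vCPU can be running at most 3/8 of the
time, and the vCPUs can never all be on a logical CPU at once. This is
plain arithmetic under that even-sharing assumption, not something the
kernel computes; the helper names are illustrative only.

```python
from fractions import Fraction

def max_vcpu_share(vcpus: int, pcpus: int) -> Fraction:
    """Best-case fraction of wall-clock time each vCPU thread can run,
    assuming the VM is the host's only load and time is shared evenly."""
    if vcpus <= 0 or pcpus <= 0:
        raise ValueError("CPU counts must be positive")
    return min(Fraction(1), Fraction(pcpus, vcpus))

def all_vcpus_can_run_at_once(vcpus: int, pcpus: int, other_runnable: int = 0) -> bool:
    """True only if every vCPU thread, plus any other runnable host threads,
    could occupy a logical CPU at the same moment."""
    return vcpus + other_runnable <= pcpus

print(max_vcpu_share(8, 3))                                # 3/8
print(all_vcpus_can_run_at_once(8, 3))                     # False
print(all_vcpus_can_run_at_once(2, 3))                     # True
print(all_vcpus_can_run_at_once(2, 3, other_runnable=2))   # False
```

The last line is the 2-on-3 case from above: as soon as two other host
threads are runnable, the two vCPUs can no longer both be running.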

Now, let's assume watchdog timeouts are usually ~30 seconds. If your
hypervisor is heavily overloaded, it can of course happen that a
watchdog strikes, or rather that some RCU stall detection in your guest
operating system triggers before that.
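A deliberately simplified model shows why it takes heavy overload to
hit such a timeout purely through timeslicing: under plain round-robin
with a fixed timeslice (CFS does not work exactly like this), a thread
that was just preempted waits at most (n - 1) timeslices when n threads
are runnable on its CPU. The 3 ms timeslice and the 30 s threshold below
are assumptions for illustration, and the function names are made up.

```python
def worst_case_wait_ms(runnable_per_cpu: int, timeslice_ms: float) -> float:
    """Longest wait for a just-preempted thread under plain round-robin
    with a fixed timeslice (a simplification; not how CFS picks slices)."""
    if runnable_per_cpu < 1:
        raise ValueError("need at least one runnable thread")
    return (runnable_per_cpu - 1) * timeslice_ms

def watchdog_may_fire(runnable_per_cpu: int, timeslice_ms: float,
                      watchdog_s: float = 30.0) -> bool:
    """True if, in this model, a vCPU could be off-CPU for the whole timeout."""
    return worst_case_wait_ms(runnable_per_cpu, timeslice_ms) >= watchdog_s * 1000

for n in (3, 100, 10_001):
    print(n, worst_case_wait_ms(n, 3.0), watchdog_may_fire(n, 3.0))
```

The model ignores other host load, wakeup preemption and what happens
inside the guest, so treat it only as an order-of-magnitude argument.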

We apply some techniques to optimize certain scenarios, e.g. vCPU
yielding and paravirtualized spinlocks, to keep a guest vCPU from
wasting CPU cycles waiting for conditions that require other vCPUs to
run first.
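The idea behind yielding can be illustrated with a user-space analogy.
This is not KVM's implementation, which lives in the guest kernel and
in KVM itself; it only shows the principle: instead of spinning forever
on a lock whose holder may not even be running, spin a bounded number
of times and then give the CPU away with sched_yield(). The function
name and the spin_limit default are hypothetical.

```python
import os
import threading

def acquire_spin_then_yield(lock: threading.Lock, spin_limit: int = 100) -> int:
    """Acquire `lock`, polling it up to `spin_limit` times between
    sched_yield() calls. Returns how many times we yielded."""
    yields = 0
    while True:
        for _ in range(spin_limit):
            if lock.acquire(blocking=False):
                return yields
        # Stop burning cycles and let the scheduler run someone else,
        # ideally whoever holds the lock (analogous to a vCPU yielding).
        os.sched_yield()
        yields += 1
```

Usage: an uncontended lock is taken without yielding; a lock released
by another thread a little later is taken after some number of yields.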

> allocated a logical core to run on) within a specified time.  I expect
> KVM to use the completely fair scheduler (CFS) and a variation of
> co-scheduling/gang scheduling, but I've been unable to discern whether
> this is true (it was mentioned in a lwn.net article in 2011, but
> hasn't been expanded upon since https://lwn.net/Articles/472797/).
> 
> I've discovered ESXi approaches this with relaxed co-scheduling
> https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-vsphere-cpu-sched-performance-white-paper.pdf
> (pg. 7).
> 
> I've also discovered a similar email discussion directed at a
> different mailing list (which suggested to mail this one),
> https://lists.centos.org/pipermail/centos-virt/2010-November/002214.html
> 
> For context on my end, I am operating two virtual machine 'stacks' in
> a heavily oversubscribed OpenStack KVM cloud environment.  Each
> 'stack' consists of two virtual machines.  The first generates network
> traffic (a 'traffic generator') and sends this traffic through two
> separate interfaces to corresponding networks.  The second virtual
> machine acts as a bridge for these networks.  A rudimentary diagram is
> shown below.
> 
> .----[traffic generator]----.
> |                           |
> '--------[VM bridge]--------'
> 
> Interestingly:
> *  When the VM bridge is provisioned with 2 vCPUs, the traffic
> generator reports ~ 10% packet loss
> *  When the VM bridge is provisioned with 4 vCPUs, the traffic
> generator reports ~ 40% packet loss
> 
> I suspect the packet loss originates from the virtual interface buffer
> overflow.  To the best of my understanding, although the completely
> fair scheduler would schedule the VMs for equivalent durations, the
> 2vCPU VM will be scheduled more frequently (for shorter periods)
> because it is easier for the scheduler to find and allocate 2vCPUs
> than 4vCPUs.  This will allow the buffers to be emptied more regularly
> which results in less packet loss.  However, in order to
> prove/disprove this theory, I'd need to know how the completely fair
> scheduler handles co-scheduling in the context of KVM virtual
> machines.

I hope somebody else with more insight can answer that. But in general,
to avoid packet drops you might want to look into (v)CPU pinning / KVM
RT. However, this requires having *at least* as many logical cores as
you have virtual CPUs.
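As a rough sketch of what pinning means at the thread level, assuming
you already know the host thread IDs of the vCPU threads (obtaining them
is outside this sketch, and any IDs below are made up): each vCPU thread
gets its own logical CPU via sched_setaffinity(2), and a plan is refused
when there are fewer logical CPUs than vCPUs. The helper names are
hypothetical; with libvirt, `virsh vcpupin` does the pinning for you.

```python
import os

def one_to_one_plan(vcpu_tids: list, host_cpus: set) -> dict:
    """Map each vCPU thread ID to its own logical CPU.
    Refuses if there are fewer logical CPUs than vCPUs."""
    cpus = sorted(host_cpus)
    if len(vcpu_tids) > len(cpus):
        raise ValueError(f"{len(vcpu_tids)} vCPUs but only {len(cpus)} logical CPUs")
    return dict(zip(vcpu_tids, cpus))

def apply_plan(plan: dict) -> None:
    """Pin every thread to its assigned CPU (Linux-only; may need privileges
    for threads owned by another user)."""
    for tid, cpu in plan.items():
        os.sched_setaffinity(tid, {cpu})

if __name__ == "__main__":
    # Hypothetical stand-in for vCPU thread IDs: TID 0 means "the calling
    # thread", which gets pinned to the first CPU it is allowed to use.
    allowed = os.sched_getaffinity(0)
    apply_plan(one_to_one_plan([0], allowed))
    print(os.sched_getaffinity(0))
```

Note that this only keeps vCPUs from sharing a logical CPU with each
other; other host threads may still be scheduled there.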

> 
> Thank you kindly,
> William
> 

-- 

Thanks,

David / dhildenb