> I am by far no expert. vCPUs are "just" (e.g. QEMU) threads and treated
> that way by the Linux scheduler. So, whatever scheduler policy you
> configure for these threads will be used in the kernel.

Thank you for answering, David; this confirms my current understanding.
KVM/QEMU relies on the host's CPU scheduler rather than re-inventing the
wheel: whatever the base OS uses, KVM/QEMU uses as well.

I'm interested in whether the underlying CPU scheduler (in this case
CFS, but it could be any scheduler, as you've stated) attempts
co-scheduling or a similar mechanism for virtual machines. It would make
sense for such a mechanism to exist, as guest operating systems tend to
assume simultaneous access to their cores, but so far I haven't found
anything to support this.

> As you're pushing 8 virtual CPUs onto 3 logical CPUs, there will never
> be such a thing as "synchronous progress". This is even true when
> having 2 virtual CPUs on 3 logical CPUs, at least not in default
> scenarios. There will be other processes/threads to schedule.

I should have phrased this better. In this example, the host has access
to at least 24 cores but only has 3 cores available to the guest VM at
this instant in time (the others have been allocated to other VMs). My
apologies for not adequately conveying my intent.

> I hope somebody else with more insight can answer that. But in general,
> to avoid packet drops you might want to look into (v)CPU pinning / KVM
> RT. But this will require having *at least* as many logical cores as
> you have virtual CPUs.

You're exactly right, and we do this in our latency-sensitive
environments for a performance boost (in addition to CPU isolation,
PCI/SR-IOV passthrough, disabling C/P states, and a few other tweaks).
However, this approach isn't possible with oversubscription, which is
the primary draw of virtualizing the environment. (For reference, a
rough sketch of inspecting and pinning the vCPU threads is appended
below the quoted message.)

On Tue, May 22, 2018 at 9:38 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:
> On 18.05.2018 19:22, William Scott wrote:
>> Greetings!
>>
>> I'm encountering difficulty understanding the Linux CPU scheduler
>> within the context of KVM virtual machines. Specifically, I'd like to
>> understand if/how groups of logical cores are allocated to virtual
>> machines in an oversubscribed environment.
>
> Hi,
>
> I am by far no expert. vCPUs are "just" (e.g. QEMU) threads and treated
> that way by the Linux scheduler. So, whatever scheduler policy you
> configure for these threads will be used in the kernel.
>
>>
>> At a high level, my question is "how does the scheduler handle
>> allocation of logical cores to a VM that is provisioned more cores
>> than are currently available? E.g., the host has 3 logical cores
>> available but a VM is provisioned with 8 vCPUs." I'm predominantly
>> concerned with the guest operating system not observing synchronous
>> progress across all vCPUs and potential related errors, e.g. a watchdog
>> timer might expect a response from a sibling vCPU (which was not
>
> As you're pushing 8 virtual CPUs onto 3 logical CPUs, there will never
> be such a thing as "synchronous progress". This is even true when
> having 2 virtual CPUs on 3 logical CPUs, at least not in default
> scenarios. There will be other processes/threads to schedule.
>
> Now, let's assume watchdogs are usually ~30 seconds. So if your
> hypervisor is heavily overloaded, it can of course happen that a
> watchdog strikes, or rather some RCU deadlock prevention in your guest
> operating system will trigger before that.
>
> We apply some techniques to optimize some scenarios, e.g.
> VCPU yielding, paravirtualized spinlocks etc., to avoid a guest VCPU
> wasting CPU cycles waiting for conditions that require other VCPUs to
> run first.
>
>> allocated a logical core to run on) within a specified time. I expect
>> KVM to use the completely fair scheduler (CFS) and a variation of
>> co-scheduling/gang scheduling, but I've been unable to discern whether
>> this is true (it was mentioned in a lwn.net article in 2011, but
>> hasn't been expanded upon since: https://lwn.net/Articles/472797/).
>>
>> I've discovered ESXi approaches this with relaxed co-scheduling:
>> https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-vsphere-cpu-sched-performance-white-paper.pdf
>> (pg. 7).
>>
>> I've also discovered a similar email discussion directed at a
>> different mailing list (which suggested mailing this one):
>> https://lists.centos.org/pipermail/centos-virt/2010-November/002214.html
>>
>> For context on my end, I am operating two virtual machine 'stacks' in
>> a heavily oversubscribed OpenStack KVM cloud environment. Each
>> 'stack' consists of two virtual machines. The first generates network
>> traffic (a 'traffic generator') and sends this traffic through two
>> separate interfaces to corresponding networks. The second virtual
>> machine acts as a bridge for these networks. A rudimentary diagram is
>> shown below.
>>
>> .----[traffic generator]----.
>> |                           |
>> '--------[VM bridge]--------'
>>
>> Interestingly:
>> * When the VM bridge is provisioned with 2 vCPUs, the traffic
>>   generator reports ~10% packet loss
>> * When the VM bridge is provisioned with 4 vCPUs, the traffic
>>   generator reports ~40% packet loss
>>
>> I suspect the packet loss originates from virtual interface buffer
>> overflows. To the best of my understanding, although the completely
>> fair scheduler would schedule the VMs for equivalent durations, the
>> 2-vCPU VM will be scheduled more frequently (for shorter periods)
>> because it is easier for the scheduler to find and allocate 2 vCPUs
>> than 4 vCPUs. This allows the buffers to be emptied more regularly,
>> which results in less packet loss. However, in order to
>> prove/disprove this theory, I'd need to know how the completely fair
>> scheduler handles co-scheduling in the context of KVM virtual
>> machines.
>
> I hope somebody else with more insight can answer that. But in general,
> to avoid packet drops you might want to look into (v)CPU pinning / KVM
> RT. But this will require having *at least* as many logical cores as
> you have virtual CPUs.
>
>>
>> Thank you kindly,
>> William
>>
>
>
> --
>
> Thanks,
>
> David / dhildenb
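
P.S. In case it is useful to anyone following the thread, below is a
minimal, untested sketch (Python) of how one might inspect how the host
kernel schedules a guest's QEMU vCPU threads, and pin them 1:1 when
enough logical cores are available. It assumes a Linux /proc layout and
that QEMU names its vCPU threads "CPU <n>/KVM"; that naming can differ
between QEMU versions, so treat the matching as illustrative rather
than definitive. Changing another process's affinity generally requires
root (or CAP_SYS_NICE).

#!/usr/bin/env python3
# Sketch: inspect the scheduler policy/affinity of a QEMU guest's vCPU
# threads, and optionally pin them 1:1 to host logical CPUs.
# Assumption: vCPU threads are named like "CPU 0/KVM" (QEMU-version
# dependent); adjust the match below if yours differ.
import os
import sys

POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER (CFS)",
    os.SCHED_FIFO:  "SCHED_FIFO",
    os.SCHED_RR:    "SCHED_RR",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE:  "SCHED_IDLE",
}

def vcpu_threads(qemu_pid):
    """Yield (tid, thread_name) for threads that look like vCPU threads."""
    task_dir = "/proc/%d/task" % qemu_pid
    for tid in sorted(int(t) for t in os.listdir(task_dir)):
        with open("%s/%d/comm" % (task_dir, tid)) as f:
            name = f.read().strip()
        if "KVM" in name:  # e.g. "CPU 0/KVM" -- see assumption above
            yield tid, name

def show(qemu_pid):
    """Print the policy and host-CPU affinity applied to each vCPU thread."""
    for tid, name in vcpu_threads(qemu_pid):
        policy = os.sched_getscheduler(tid)
        affinity = sorted(os.sched_getaffinity(tid))
        print("%-12s tid=%-7d policy=%-18s affinity=%s"
              % (name, tid, POLICY_NAMES.get(policy, str(policy)), affinity))

def pin(qemu_pid, host_cpus):
    """Pin vCPU thread N to host_cpus[N]. Only sensible when the host has
    at least as many free logical CPUs as the guest has vCPUs -- exactly
    what oversubscription rules out."""
    for (tid, name), cpu in zip(vcpu_threads(qemu_pid), host_cpus):
        os.sched_setaffinity(tid, {cpu})
        print("pinned %s (tid %d) -> host CPU %d" % (name, tid, cpu))

if __name__ == "__main__":
    qemu_pid = int(sys.argv[1])   # PID of the qemu-system-* process
    show(qemu_pid)
    # Example (needs privileges): pin(qemu_pid, [2, 3, 4, 5])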