I understand this isn't a simple question and will (likely) require
specialized knowledge. Is this the correct mailing group?
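Also, in case it helps whoever picks this up: the host-side pinning
David suggests below can be done directly against the QEMU vCPU
threads (similar in effect to what libvirt's <vcpupin> does). A rough
sketch in Python; the PID, the thread-name match, and the CPU numbers
are hypothetical placeholders that would need adapting to a real host:

    import os

    QEMU_PID = 12345          # hypothetical: PID of the VM's qemu process
    DEDICATED_CPUS = [2, 3]   # hypothetical: host logical CPUs reserved for it

    def vcpu_threads(pid):
        """Yield TIDs of the process's vCPU threads. Recent QEMU names
        them like 'CPU 0/KVM'; adjust the match if yours differ."""
        task_dir = "/proc/%d/task" % pid
        for tid in sorted(os.listdir(task_dir), key=int):
            with open("%s/%s/comm" % (task_dir, tid)) as f:
                if "KVM" in f.read():
                    yield int(tid)

    # 1:1 pinning: each vCPU thread gets its own logical CPU (needs root).
    for tid, cpu in zip(vcpu_threads(QEMU_PID), DEDICATED_CPUS):
        os.sched_setaffinity(tid, {cpu})
        print("pinned tid %d -> cpu %d" % (tid, cpu))

As David notes, this only helps when there are at least as many
logical cores as vCPUs, so it doesn't apply to the oversubscribed
case discussed below.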
On Tue, May 22, 2018 at 2:25 PM, William Scott <wscottcanada@xxxxxxxxx> wrote:
>> I am by far no expert. vCPUs are "just" (e.g. QEMU) threads and treated
>> that way by the Linux scheduler. So, whatever scheduler policy you
>> configure for these threads will be used in the kernel.
>
> Thank you for answering, David; this confirms my current understanding.
> KVM/QEMU relies on the underlying CPU scheduler to avoid reinventing
> the wheel: whatever the base OS uses, KVM/QEMU will use as well. I'm
> interested in whether the underlying CPU scheduler (in this case CFS,
> but it could be any scheduler, as you've stated) attempts co-scheduling
> or a similar mechanism for virtual machines. It would make sense for
> one to exist, as operating systems tend to assume simultaneous access
> to cores, but so far I haven't found anything to support this.
>
>> As you're pushing 8 virtual CPUs onto 3 logical CPUs, there will never
>> be such a thing as "synchronous progress". This is even true when
>> having 2 virtual CPUs on 3 logical CPUs, at least in default
>> scenarios. There will be other processes/threads to schedule.
>
> I should have phrased this better. In this example, the host has
> access to at least 24 cores but only has 3 cores available to the
> guest VM at this 'instant' in time (as the others have been allocated
> to other VMs). My apologies for not adequately conveying my intent.
>
>> I hope somebody else with more insight can answer that. But in
>> general, to avoid packet drops you might want to look into (v)CPU
>> pinning / KVM RT. But this will require having *at least* as many
>> logical cores as you have virtual CPUs.
>
> You're exactly right, and we do this in our latency-sensitive
> environments for a performance boost (in addition to CPU isolation,
> PCI/SR-IOV passthrough, disabling C-states/P-states, and a few other
> tweaks). However, this approach isn't possible with oversubscription,
> the primary draw of virtualizing the environment.
>
> On Tue, May 22, 2018 at 9:38 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:
>> On 18.05.2018 19:22, William Scott wrote:
>>> Greetings!
>>>
>>> I'm encountering difficulty understanding the Linux CPU scheduler
>>> within the context of KVM virtual machines. Specifically, I'd like to
>>> understand if/how groups of logical cores are allocated to virtual
>>> machines in an oversubscribed environment.
>>
>> Hi,
>>
>> I am by far no expert. vCPUs are "just" (e.g. QEMU) threads and treated
>> that way by the Linux scheduler. So, whatever scheduler policy you
>> configure for these threads will be used in the kernel.
>>
>>>
>>> At a high level, my question is "how does the scheduler handle
>>> allocation of logical cores to a VM that is provisioned with more
>>> cores than are currently available?" E.g., the host has 3 logical
>>> cores available but a VM is provisioned with 8 vCPUs. I'm
>>> predominantly concerned with the guest operating system not observing
>>> synchronous progress across all vCPUs and potential related errors;
>>> e.g., a watchdog timer might expect a response from a sibling vCPU
>>> (which was not
>>
>> As you're pushing 8 virtual CPUs onto 3 logical CPUs, there will never
>> be such a thing as "synchronous progress". This is even true when
>> having 2 virtual CPUs on 3 logical CPUs, at least in default
>> scenarios. There will be other processes/threads to schedule.
>>
>> Now, let's assume watchdogs are usually ~30 seconds. So if your
>> hypervisor is heavily overloaded, it can of course happen that a
>> watchdog strikes, or rather some RCU deadlock prevention in your guest
>> operating system will trigger before that.
>>
>> We apply some techniques to optimize some scenarios, e.g. VCPU
>> yielding, paravirtualized spinlocks, etc., to keep a guest VCPU from
>> wasting CPU cycles waiting for conditions that require other VCPUs to
>> run first.
>>
>>> allocated a logical core to run on) within a specified time. I expect
>>> KVM to use the Completely Fair Scheduler (CFS) and a variation of
>>> co-scheduling/gang scheduling, but I've been unable to discern whether
>>> this is true (it was mentioned in an lwn.net article in 2011 but
>>> hasn't been expanded upon since: https://lwn.net/Articles/472797/).
>>>
>>> I've discovered that ESXi approaches this with relaxed co-scheduling:
>>> https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-vsphere-cpu-sched-performance-white-paper.pdf
>>> (pg. 7).
>>>
>>> I've also discovered a similar email discussion directed at a
>>> different mailing list (which suggested mailing this one):
>>> https://lists.centos.org/pipermail/centos-virt/2010-November/002214.html
>>>
>>> For context on my end, I am operating two virtual machine 'stacks' in
>>> a heavily oversubscribed OpenStack KVM cloud environment. Each
>>> 'stack' consists of two virtual machines. The first generates network
>>> traffic (a 'traffic generator') and sends this traffic through two
>>> separate interfaces to corresponding networks. The second virtual
>>> machine acts as a bridge for these networks. A rudimentary diagram is
>>> shown below.
>>>
>>> .----[traffic generator]----.
>>> |                           |
>>> '--------[VM bridge]--------'
>>>
>>> Interestingly:
>>> * When the VM bridge is provisioned with 2 vCPUs, the traffic
>>>   generator reports ~10% packet loss
>>> * When the VM bridge is provisioned with 4 vCPUs, the traffic
>>>   generator reports ~40% packet loss
>>>
>>> I suspect the packet loss originates from virtual interface buffer
>>> overflow. To the best of my understanding, although the Completely
>>> Fair Scheduler would schedule the VMs for equivalent durations, the
>>> 2-vCPU VM will be scheduled more frequently (for shorter periods)
>>> because it is easier for the scheduler to find and allocate 2 vCPUs
>>> than 4. This would allow the buffers to be emptied more regularly,
>>> resulting in less packet loss. However, in order to prove/disprove
>>> this theory, I'd need to know how the Completely Fair Scheduler
>>> handles co-scheduling in the context of KVM virtual machines.
>>
>> I hope somebody else with more insight can answer that. But in
>> general, to avoid packet drops you might want to look into (v)CPU
>> pinning / KVM RT. But this will require having *at least* as many
>> logical cores as you have virtual CPUs.
>>
>>>
>>> Thank you kindly,
>>> William
>>>
>>
>>
>> --
>>
>> Thanks,
>>
>> David / dhildenb
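P.S. To poke at the "scheduled more frequently" half of the theory
above without needing CFS internals: the third field of
/proc/<pid>/task/<tid>/schedstat is the number of timeslices that
thread has run (available when scheduler stats are compiled into the
kernel). Sampling it over an interval for each bridge VM's QEMU
process gives a rough scheduling-frequency comparison. A sketch with
hypothetical PIDs for the two bridge VMs:

    import os
    import time

    # Hypothetical QEMU PIDs of the two bridge VMs; substitute real ones.
    VMS = {"bridge-2vcpu": 11111, "bridge-4vcpu": 22222}

    def timeslices(pid):
        """Sum the timeslice counters across all threads of a process
        (third field of /proc/<pid>/task/<tid>/schedstat)."""
        total = 0
        for tid in os.listdir("/proc/%d/task" % pid):
            with open("/proc/%d/task/%s/schedstat" % (pid, tid)) as f:
                total += int(f.read().split()[2])
        return total

    before = {name: timeslices(pid) for name, pid in VMS.items()}
    time.sleep(10)  # sample over a 10 second window
    for name, pid in VMS.items():
        rate = (timeslices(pid) - before[name]) / 10.0
        print("%s: %.0f timeslices/sec" % (name, rate))

If the 2-vCPU bridge shows a clearly higher timeslice rate under the
same offered load, that would support the "runs more often, drains its
buffers more often" explanation.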