I understand this isn't a simple question and will (likely) require
specialized knowledge. Is this the correct mailing group?
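Also, in case it helps whoever picks this up: the host-side pinning
David suggests below can be done directly against the QEMU vCPU
threads (similar in effect to what libvirt's <vcpupin> does). A rough
sketch in Python; the PID, the thread-name match, and the CPU numbers
are hypothetical placeholders that would need adapting to a real host:

    import os

    QEMU_PID = 12345          # hypothetical: PID of the VM's qemu process
    DEDICATED_CPUS = [2, 3]   # hypothetical: host logical CPUs reserved for it

    def vcpu_threads(pid):
        """Yield TIDs of the process's vCPU threads. Recent QEMU names
        them like 'CPU 0/KVM'; adjust the match if yours differ."""
        task_dir = "/proc/%d/task" % pid
        for tid in sorted(os.listdir(task_dir), key=int):
            with open("%s/%s/comm" % (task_dir, tid)) as f:
                if "KVM" in f.read():
                    yield int(tid)

    # 1:1 pinning: each vCPU thread gets its own logical CPU (needs root).
    for tid, cpu in zip(vcpu_threads(QEMU_PID), DEDICATED_CPUS):
        os.sched_setaffinity(tid, {cpu})
        print("pinned tid %d -> cpu %d" % (tid, cpu))

As David notes, this only helps when there are at least as many
logical cores as vCPUs, so it doesn't apply to the oversubscribed
case discussed below.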
On Tue, May 22, 2018 at 2:25 PM, William Scott <wscottcanada@xxxxxxxxx> wrote:
>> I am by far no expert. vCPUs are "just" (e.g. QEMU) threads and treated
>> that way by the Linux scheduler. So, whatever scheduler policy you
>> configure for these threads will be used in the kernel.
>
> Thank you for answering, David; this confirms my current understanding.
> KVM/QEMU relies on the underlying CPU scheduler to avoid reinventing
> the wheel: whatever the base OS uses, KVM/QEMU will use as well. I'm
> interested in whether the underlying CPU scheduler (in this case CFS,
> but it could be any scheduler, as you've stated) attempts co-scheduling
> or a similar mechanism for virtual machines. It would make sense for
> one to exist, as operating systems tend to assume simultaneous access
> to cores, but so far I haven't found anything to support this.
>
>> As you're pushing 8 virtual CPUs onto 3 logical CPUs, there will never
>> be such a thing as "synchronous progress". This is even true when
>> having 2 virtual CPUs on 3 logical CPUs, at least in default
>> scenarios. There will be other processes/threads to schedule.
>
> I should have phrased this better. In this example, the host has
> access to at least 24 cores but only has 3 cores available to the
> guest VM at this 'instant' in time (as the others have been allocated
> to other VMs). My apologies for not adequately conveying my intent.
>
>> I hope somebody else with more insight can answer that. But in
>> general, to avoid packet drops you might want to look into (v)CPU
>> pinning / KVM RT. But this will require having *at least* as many
>> logical cores as you have virtual CPUs.
>
> You're exactly right, and we do this in our latency-sensitive
> environments for a performance boost (in addition to CPU isolation,
> PCI/SR-IOV passthrough, disabling C-states/P-states, and a few other
> tweaks). However, this approach isn't possible with oversubscription,
> the primary draw of virtualizing the environment.
>
> On Tue, May 22, 2018 at 9:38 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:
>> On 18.05.2018 19:22, William Scott wrote:
>>> Greetings!
>>>
>>> I'm encountering difficulty understanding the Linux CPU scheduler
>>> within the context of KVM virtual machines. Specifically, I'd like to
>>> understand if/how groups of logical cores are allocated to virtual
>>> machines in an oversubscribed environment.
>>
>> Hi,
>>
>> I am by far no expert. vCPUs are "just" (e.g. QEMU) threads and treated
>> that way by the Linux scheduler. So, whatever scheduler policy you
>> configure for these threads will be used in the kernel.
>>
>>>
>>> At a high level, my question is "how does the scheduler handle
>>> allocation of logical cores to a VM that is provisioned with more
>>> cores than are currently available?" E.g., the host has 3 logical
>>> cores available but a VM is provisioned with 8 vCPUs. I'm
>>> predominantly concerned with the guest operating system not observing
>>> synchronous progress across all vCPUs and potential related errors;
>>> e.g., a watchdog timer might expect a response from a sibling vCPU
>>> (which was not
>>
>> As you're pushing 8 virtual CPUs onto 3 logical CPUs, there will never
>> be such a thing as "synchronous progress". This is even true when
>> having 2 virtual CPUs on 3 logical CPUs, at least in default
>> scenarios. There will be other processes/threads to schedule.
>>
>> Now, let's assume watchdogs are usually ~30 seconds. So if your
>> hypervisor is heavily overloaded, it can of course happen that a
>> watchdog strikes, or rather some RCU deadlock prevention in your guest
>> operating system will trigger before that.
>>
>> We apply some techniques to optimize some scenarios, e.g. VCPU
>> yielding, paravirtualized spinlocks, etc., to keep a guest VCPU from
>> wasting CPU cycles waiting for conditions that require other VCPUs to
>> run first.
>>
>>> allocated a logical core to run on) within a specified time. I expect
>>> KVM to use the Completely Fair Scheduler (CFS) and a variation of
>>> co-scheduling/gang scheduling, but I've been unable to discern whether
>>> this is true (it was mentioned in an lwn.net article in 2011 but
>>> hasn't been expanded upon since: https://lwn.net/Articles/472797/).
>>>
>>> I've discovered that ESXi approaches this with relaxed co-scheduling:
>>> https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-vsphere-cpu-sched-performance-white-paper.pdf
>>> (pg. 7).
>>>
>>> I've also discovered a similar email discussion directed at a
>>> different mailing list (which suggested mailing this one):
>>> https://lists.centos.org/pipermail/centos-virt/2010-November/002214.html
>>>
>>> For context on my end, I am operating two virtual machine 'stacks' in
>>> a heavily oversubscribed OpenStack KVM cloud environment. Each
>>> 'stack' consists of two virtual machines. The first generates network
>>> traffic (a 'traffic generator') and sends this traffic through two
>>> separate interfaces to corresponding networks. The second virtual
>>> machine acts as a bridge for these networks. A rudimentary diagram is
>>> shown below.
>>>
>>> .----[traffic generator]----.
>>> |                           |
>>> '--------[VM bridge]--------'
>>>
>>> Interestingly:
>>> * When the VM bridge is provisioned with 2 vCPUs, the traffic
>>>   generator reports ~10% packet loss
>>> * When the VM bridge is provisioned with 4 vCPUs, the traffic
>>>   generator reports ~40% packet loss
>>>
>>> I suspect the packet loss originates from virtual interface buffer
>>> overflow. To the best of my understanding, although the Completely
>>> Fair Scheduler would schedule the VMs for equivalent durations, the
>>> 2-vCPU VM will be scheduled more frequently (for shorter periods)
>>> because it is easier for the scheduler to find and allocate 2 vCPUs
>>> than 4. This would allow the buffers to be emptied more regularly,
>>> resulting in less packet loss. However, in order to prove/disprove
>>> this theory, I'd need to know how the Completely Fair Scheduler
>>> handles co-scheduling in the context of KVM virtual machines.
>>
>> I hope somebody else with more insight can answer that. But in
>> general, to avoid packet drops you might want to look into (v)CPU
>> pinning / KVM RT. But this will require having *at least* as many
>> logical cores as you have virtual CPUs.
>>
>>>
>>> Thank you kindly,
>>> William
>>>
>>
>>
>> --
>>
>> Thanks,
>>
>> David / dhildenb
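P.S. To poke at the "scheduled more frequently" half of the theory
above without needing CFS internals: the third field of
/proc/<pid>/task/<tid>/schedstat is the number of timeslices that
thread has run (available when scheduler stats are compiled into the
kernel). Sampling it over an interval for each bridge VM's QEMU
process gives a rough scheduling-frequency comparison. A sketch with
hypothetical PIDs for the two bridge VMs:

    import os
    import time

    # Hypothetical QEMU PIDs of the two bridge VMs; substitute real ones.
    VMS = {"bridge-2vcpu": 11111, "bridge-4vcpu": 22222}

    def timeslices(pid):
        """Sum the timeslice counters across all threads of a process
        (third field of /proc/<pid>/task/<tid>/schedstat)."""
        total = 0
        for tid in os.listdir("/proc/%d/task" % pid):
            with open("/proc/%d/task/%s/schedstat" % (pid, tid)) as f:
                total += int(f.read().split()[2])
        return total

    before = {name: timeslices(pid) for name, pid in VMS.items()}
    time.sleep(10)  # sample over a 10 second window
    for name, pid in VMS.items():
        rate = (timeslices(pid) - before[name]) / 10.0
        print("%s: %.0f timeslices/sec" % (name, rate))

If the 2-vCPU bridge shows a clearly higher timeslice rate under the
same offered load, that would support the "runs more often, drains its
buffers more often" explanation.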