On Thu, Aug 14, 2014 at 04:25:05PM +0200, Radim Krčmář wrote:
> Hello,
>
> by default, libvirt with KVM creates a cgroup hierarchy in 'cpu,cpuacct' [1],
> with 'shares' set to 1024 on every level.  This raises two points:
>
> 1) Every VM is given an equal amount of CPU time. [2]
>    ($CG/machine.slice/*/shares = 1024)
>
>    This means that smaller / less loaded guests are given an advantage.
This is a default we do nothing with unless the user (or a mgmt app) wants to change it.  What you say is true only when there is no spare time (the machines need more CPU time than is available).  Such overcommit is the user's problem, I'd say.
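For reference, this is roughly how the defaults look on a running host; a minimal sketch, assuming cgroup v1 with the controller mounted at /sys/fs/cgroup/cpu,cpuacct (the mount point and the glob over the scope names are my assumptions):

    # where the 'cpu,cpuacct' controller usually lives
    CG=/sys/fs/cgroup/cpu,cpuacct

    # every level libvirt creates carries the 1024 default
    cat "$CG/machine.slice/cpu.shares"
    cat "$CG"/machine.slice/machine-qemu*.scope/cpu.shares
    cat "$CG"/machine.slice/machine-qemu*.scope/{emulator,vcpu*}/cpu.shares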
> 2) All VMs combined are given 1024 shares. [3]
>    ($CG/machine.slice/shares)
This is a problem even on systems without slices (i.e. without systemd), because there is /machine/cpu.shares == 1024 anyway.  Is there a way to disable the hierarchy in this case (say, cpu.shares=-1)?  If not, then it is of only limited use (we cannot prepare the hierarchy and just write a number into some file once we want to start using it).  That's a pity, but the use cases probably aren't worth the hundreds of lines of kernel code that would have to change to support this.
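Until something like that exists, the only workaround I can see is to raise the parent group's weight by hand; a rough sketch, where the value 4096 is arbitrary and the systemctl variant assumes a systemd host whose version supports the CPUShares= property:

    # one-off: give all VMs combined a bigger weight relative to the rest
    echo 4096 > /sys/fs/cgroup/cpu,cpuacct/machine.slice/cpu.shares

    # roughly equivalent, and persistent, on a slice-based (systemd) host
    systemctl set-property machine.slice CPUShares=4096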
> This is made even worse on RHEL7 by sched_autogroup_enabled = 0: every other
> process in the system is given the same amount of CPU as all VMs combined.
But sched_autogroup_enabled = 1 wouldn't make it much better, since it would group the machines together anyway, right?
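For completeness, the knob in question is an ordinary sysctl, so checking and flipping it at runtime is trivial (a sketch; add kernel.sched_autogroup_enabled=1 to sysctl.conf to make it persistent):

    # 0 on RHEL7 by default, 1 in upstream kernels
    cat /proc/sys/kernel/sched_autogroup_enabled

    # enable autogrouping (needs root)
    sysctl -w kernel.sched_autogroup_enabled=1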
> It does not seem to be possible to tune shares and get good general
> behavior, so the best solution I can see is to disable the cpu cgroup and
> let users set it up when needed.  (Keeping all tasks in $CG/tasks.)
I agree with you that it's not the best default we could have, and not using cgroups until they are needed might well be a benefit.  That would be for cgroups like cpu and blkio only, I think.
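There is already a per-host opt-out of sorts: qemu.conf lets the admin restrict which controllers libvirt uses at all.  A sketch of the idea; I am assuming that leaving "cpu" (and "cpuacct") out of the list is enough to keep guests out of that hierarchy:

    # /etc/libvirt/qemu.conf -- only let libvirt touch these controllers
    cgroup_controllers = [ "devices", "memory", "blkio", "cpuset" ]

    # restart the daemon so newly started guests pick it up
    systemctl restart libvirtd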
> Do we want cgroups in the default at all?
> (Is OpenStack dealing with these quirks?)
>
> Thanks.
>
> ---
> 1: machine.slice/
>      machine-qemu\\x2d${name}.scope/
>        {emulator,vcpu*}/
>
> 2: To reproduce, run two guests with more than one VCPU and execute two
>    spinners on the first and one on the second.  The result will be a
>    50%/50% CPU split between the guests; 66%/33% seems more natural, but it
>    could still be considered a feature.
>
> 3: Run a guest with $n VCPUs and $n spinners in it, and $n spinners in the
>    host.
>    - RHEL7: the guest gets 1/($n + 1) of the CPU -- I'd expect 50%/50%.
>    - Upstream: 50%/50% between guest and host because of autogrouping; if you
>      run $n more spinners in the host, it will still be 50%/50% instead of
>      the seemingly fairer 33%/66%.  (And you can run the spinners from
>      different groups, so it would then be the same as on RHEL7.)
>    It also works the other way: if the host has $n CPUs, then $n/2 tasks in
>    the host suffice to minimize the VMs' performance, regardless of the
>    number of running VCPUs.
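If anyone wants to replay footnote 2 quickly, the "spinners" are nothing more than busy loops; a sketch, with access to the guest consoles and the exact observation tool left as assumptions:

    # inside the first guest: two spinners
    for i in 1 2; do ( while :; do :; done ) & done

    # inside the second guest: one spinner
    ( while :; do :; done ) &

    # on the host: watch the CPU split between the two qemu processes
    top -d 1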