2014-08-15 10:50+0200, Martin Kletzander:
> On Thu, Aug 14, 2014 at 04:25:05PM +0200, Radim Krčmář wrote:
> >Hello,
> >
> >by default, libvirt with KVM creates a Cgroup hierarchy in 'cpu,cpuacct'
> >[1], with 'shares' set to 1024 on every level.  This raises two points:
> >
> >1) Every VM is given an equal amount of CPU time. [2]
> >  ($CG/machine.slice/*/shares = 1024)
> >
> >  Which means that smaller / less loaded guests are given an advantage.
> >
>
> This is a default with which we do nothing unless the user (or mgmt
> app) wants to.  (I'd argue that the default is to do nothing at all ;)
> What you say is true only when there is no spare time
> (the machines need more time than available).  Such overcommit is the
> problem of the user, I'd say.

I don't like that it breaks the assumption that a VCPU behaves as a task.
(Complicated systems are hard to operate without consistency, and our
behavior is really punishing for users who don't read everything.)

> >2) All VMs combined are given 1024 shares. [3]
> >  ($CG/machine.slice/shares)
> >
>
> This is a problem even on system without slices (systemd), because
> there is /machine/cpu.shares == 1024 anyway.

(Thanks, I hadn't noticed this with my professionally deformed userspace
choices.)

> Is there a way to
> disable hierarchy in this case (to say cpu.shares=-1 for example)?

Apart from the obvious "don't create what you don't want", probably not;
cpu.shares is clamped between 2 and 2^18.

> Because if not, then it has only limited use (we cannot prepare the
> hierarchy and just write a number in some file when we want to start
> using it).  That's a pity, but there are probably less use cases then
> hundreds of lines of code that would need to be changed in order to
> support this in kernel.

And the hierarchy imposes performance degradation as well, so developers
probably never expected we'd create useless cgroups.
(The cost should be proportional to their depth => having
{emulator,vcpu*} by default is counterproductive as well.)

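
To put numbers on points 1) and 2), here is a rough sketch in Python.  It
assumes a deliberately simplified model: a single, fully contended CPU,
siblings splitting time strictly in proportion to their weight, and
root-cgroup tasks at nice 0 competing with machine.slice at weight 1024.
The function and VM names are hypothetical, made up for illustration; this
is not how libvirt or the scheduler actually computes anything.

```python
from fractions import Fraction

# Weight of a nice-0 task; also the cpu.shares value discussed above.
NICE0_WEIGHT = 1024


def split(weights):
    """Divide one unit of CPU time among siblings in proportion to weight."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


def per_vcpu_share(vm_vcpus, host_tasks):
    """Return the CPU fraction per vCPU of each VM and per host task.

    vm_vcpus:   dict of VM name -> number of busy vCPUs (hypothetical input)
    host_tasks: number of busy nice-0 tasks left in the root cgroup
    """
    # Top level: machine.slice (shares = 1024) competes with each root task.
    top = {"machine.slice": NICE0_WEIGHT}
    for i in range(host_tasks):
        top[f"task{i}"] = NICE0_WEIGHT
    top_split = split(top)

    # Inside machine.slice: every VM has shares = 1024, whatever its size.
    vm_split = split({vm: NICE0_WEIGHT for vm in vm_vcpus})

    result = {}
    for vm, vcpus in vm_vcpus.items():
        # A VM's slice of the CPU is shared evenly by its busy vCPUs.
        result[vm] = top_split["machine.slice"] * vm_split[vm] / vcpus
    result["host task"] = top_split["task0"] if host_tasks else Fraction(0)
    return result


if __name__ == "__main__":
    for name, share in per_vcpu_share({"small": 1, "big": 4}, 1).items():
        print(f"{name}: {share}")
```

With one busy host task, a 1-vCPU guest and a 4-vCPU guest, this model
gives the host task 1/2 of the CPU, the small guest's vCPU 1/4, and each
vCPU of the big guest 1/16.
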
Creating the hierarchy on demand is not much harder than writing a
value, especially if we do it through libvirt anyway.

A version of your proposal would extend cgroups with something like
categorization: we could add an "effective control group" variable that
allows scheduler code to start at a point higher in the hierarchy.
Libvirt could continue doing what it does now and performance would
improve without creating too many special cases.
I can see the flame on LKML.

> > This is made even worse on RHEL7, by sched_autogroup_enabled = 0, so
> > every other process in the system is given the same amount of CPU as
> > all VMs combined.
> >
>
> But sched_autogroup_enabled = 1 wouldn't make it much better, since it
> would group the machines together anyway, right?

Yes, it would be just a bit better for VMs, because other processes
would be grouped as well.

> >It does not seem to be possible to tune shares and get a good general
> >behavior, so the best solution I can see is to disable the cpu cgroup
> >and let users do it when needed. (Keeping all tasks in $CG/tasks.)
> >
>
> I agree with you that it's not the best default scenario we can do,
> and maybe not using cgroups until needed would bring us a good
> benefit.  That is for cgroups like cpu and blkio only, I think.

I haven't delved into other cgroups much, but there is a good question
whether we want them :)
Does $feature do something useful on top of complicating things?