On Fri, 26 Feb 2016 11:13:13 +0000 "Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote: > On Tue, Feb 23, 2016 at 04:58:39PM +0100, Henning Schild wrote: > > virCgroupNewMachine used to add the pidleader to the newly created > > machine cgroup. Do not do this implicit anymore. > > > > Signed-off-by: Henning Schild <henning.schild@xxxxxxxxxxx> > > --- > > src/lxc/lxc_cgroup.c | 11 +++++++++++ > > src/qemu/qemu_cgroup.c | 11 +++++++++++ > > src/util/vircgroup.c | 22 ---------------------- > > 3 files changed, 22 insertions(+), 22 deletions(-) > > NACK to this patch once again. > > This does not actually work as you think it does. > > > diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c > > index 11f33ab..aef8e8c 100644 > > --- a/src/util/vircgroup.c > > +++ b/src/util/vircgroup.c > > @@ -1682,16 +1682,6 @@ virCgroupNewMachineSystemd(const char *name, > > } > > } > > > > - if (virCgroupAddTask(*group, pidleader) < 0) { > > - virErrorPtr saved = virSaveLastError(); > > - virCgroupRemove(*group); > > - virCgroupFree(group); > > - if (saved) { > > - virSetError(saved); > > - virFreeError(saved); > > - } > > - } > > Just above this we called virSystemdCreateMachine. Systemd will > create the cgroup and add the pidleader to those cgroups. Systemd > may add the pidleader to just the 'systemd' controller, or it may > add the pidleader to *ALL* controllers. We have no way of knowing. > > This virCgroupAddTask call deals with whatever systemd chose not > todo, so we can guarantee consistent behaviour with the pidleader > in all cgroups. > > By removing this you make this method non-deterministic - the pid > may or may not be in the cpu controller now. THis is bad because > it can lead to QEMU/LXC driver code working in some cases but > failing in other cases. > > Furthermore, this existing does not cause any problems for the > scenario you care about. THis cgroup placement is being set > in between the time libvirtd calls fork() and exec(). With your > later patch 5, we ensure that the PID is moved across into the > emulator cgroup, before we call exec(). When we call exec all > memory mappings will be replaced, so QEMU will stil start with > the correct vCPU placement and memory allocation placement. I agree having the task in the wrong cgroup before the exec() seems harmless. But i am not sure all the fiddling with cgroups is indeed harmless and does not cause i.e. kernel work on cores that should be left alone. I have the feeling allowing the task in the parent cgroup is a bad idea, no matter how short the window seems to be. Right now the parent cgroup contains all cpus found in machine.slice, which for pinned VMs is too much. How about we calculate the size of the child cgroups before and make the parent the union of them. Or give the parent the emulator pinning and extend it for the vcpus later. But that might turn out pretty complicated as well, getting the order right with the mix of cpusets and sched_setaffinity(). > Just just drop this patch please. > > Regards, > Daniel -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list