On Fri, Feb 26, 2016 at 01:16:15PM +0100, Henning Schild wrote: > On Fri, 26 Feb 2016 11:13:13 +0000 > "Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote: > > > On Tue, Feb 23, 2016 at 04:58:39PM +0100, Henning Schild wrote: > > > virCgroupNewMachine used to add the pidleader to the newly created > > > machine cgroup. Do not do this implicit anymore. > > > > > > Signed-off-by: Henning Schild <henning.schild@xxxxxxxxxxx> > > > --- > > > src/lxc/lxc_cgroup.c | 11 +++++++++++ > > > src/qemu/qemu_cgroup.c | 11 +++++++++++ > > > src/util/vircgroup.c | 22 ---------------------- > > > 3 files changed, 22 insertions(+), 22 deletions(-) > > > > NACK to this patch once again. > > > > This does not actually work as you think it does. > > > > > diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c > > > index 11f33ab..aef8e8c 100644 > > > --- a/src/util/vircgroup.c > > > +++ b/src/util/vircgroup.c > > > @@ -1682,16 +1682,6 @@ virCgroupNewMachineSystemd(const char *name, > > > } > > > } > > > > > > - if (virCgroupAddTask(*group, pidleader) < 0) { > > > - virErrorPtr saved = virSaveLastError(); > > > - virCgroupRemove(*group); > > > - virCgroupFree(group); > > > - if (saved) { > > > - virSetError(saved); > > > - virFreeError(saved); > > > - } > > > - } > > > > Just above this we called virSystemdCreateMachine. Systemd will > > create the cgroup and add the pidleader to those cgroups. Systemd > > may add the pidleader to just the 'systemd' controller, or it may > > add the pidleader to *ALL* controllers. We have no way of knowing. > > > > This virCgroupAddTask call deals with whatever systemd chose not > > todo, so we can guarantee consistent behaviour with the pidleader > > in all cgroups. > > > > By removing this you make this method non-deterministic - the pid > > may or may not be in the cpu controller now. THis is bad because > > it can lead to QEMU/LXC driver code working in some cases but > > failing in other cases. > > > > Furthermore, this existing does not cause any problems for the > > scenario you care about. THis cgroup placement is being set > > in between the time libvirtd calls fork() and exec(). With your > > later patch 5, we ensure that the PID is moved across into the > > emulator cgroup, before we call exec(). When we call exec all > > memory mappings will be replaced, so QEMU will stil start with > > the correct vCPU placement and memory allocation placement. > > I agree having the task in the wrong cgroup before the exec() seems > harmless. But i am not sure all the fiddling with cgroups is indeed > harmless and does not cause i.e. kernel work on cores that should be > left alone. I have the feeling allowing the task in the parent cgroup > is a bad idea, no matter how short the window seems to be. > > Right now the parent cgroup contains all cpus found in machine.slice, > which for pinned VMs is too much. How about we calculate the size of the > child cgroups before and make the parent the union of them. Or give the > parent the emulator pinning and extend it for the vcpus later. > But that might turn out pretty complicated as well, getting the order > right with the mix of cpusets and sched_setaffinity(). > > > Just just drop this patch please. The point is though that we have *no* choice. Systemd can put the task in the cpu controller and we've no way to prevent that. So the code *has* to be able to cope with that happening. Therefore this patch is wrong it just makes behaviour non-deterministic increasing the chances that we don't correctly handle the case where systemd adds the task to the cpu controllers Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list