On Sun, Dec 25, 2016 at 12:21:18AM +0100, Guido Günther wrote:
> On Sat, Dec 24, 2016 at 05:14:44PM +0100, Guido Günther wrote:
> > Hi Cedric,
> > On Wed, Dec 21, 2016 at 02:36:39PM +0100, Cedric Bosdonnat wrote:
> > > Hey Christian,
> > >
> > > On Tue, 2016-12-20 at 12:29 +0100, Christian Ehrhardt wrote:
> > > > Hi,
> > > > I found an issue in libvirt related to libvirt-lxc, but fail to find the root cause.
> > > >
> > > > The TL;DR is: libvirt-lxc guests get killed on libvirt restart due to "internal error: No valid cgroup for machine"
> > > >
> > > > I was able to reproduce it with libvirt 1.3.1, 2.4 and 2.5 as packages in Ubuntu and Debian.
> > > > I wanted to ask for two things:
> > > > - wider coverage where this does reproduce
> > >
> > > I couldn't reproduce here with openSUSE Tumbleweed and libvirt 2.5 packages.
> >
> > I had a short look and it seems like this sequence is killing all running
> > libvirt-lxc guests reliably:
> >
> >   # no lxc guest running yet
> >   export LIBVIRT_DEFAULT_URI=lxc:///
> >   DOMAIN=sl
> >   systemctl daemon-reload
> >
> >   # start lxc guest
> >   virsh start ${DOMAIN}
> >   sleep 1 # give vm some time to start
> >   systemctl restart libvirtd
>
> Using ftrace I can see that systemd moves the process into the wrong
> cgroup on start:
>
> systemd-1 [000] .... 652.333068: cgroup_attach_task: dst_root=3 dst_id=80 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
> systemd-1 [000] .... 652.333117: cgroup_attach_task: dst_root=3 dst_id=80 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
> systemd-1 [000] .... 652.333160: cgroup_attach_task: dst_root=6 dst_id=80 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
> systemd-1 [000] .... 652.333203: cgroup_attach_task: dst_root=4 dst_id=107 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
> systemd-1 [000] .... 652.333245: cgroup_attach_task: dst_root=8 dst_id=80 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
> systemd-1 [000] .... 652.333286: cgroup_attach_task: dst_root=7 dst_id=84 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
>
> I've attached the script to reproduce this and would be happy about
> ideas of the root cause.

Ok, so when libvirt starts an LXC guest, it creates a machine slice with
systemd to hold the container processes. The machine slice has the
container PID 1 as its leader, but libvirt also adds the libvirt_lxc
controller and any qemu-nbd processes to the cgroups associated with
this machine slice... except it only does this for the resource cgroups
it's using and does *not* do this for the systemd cgroup.

So if you query libvirtd.service status, it'll show libvirt_lxc being
associated with that, instead of the machine slice:

# systemctl status libvirtd.service
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2017-01-05 10:38:02 GMT; 10s ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 6723 (libvirtd)
    Tasks: 20 (limit: 4915)
   CGroup: /system.slice/libvirtd.service
           ├─1547 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           ├─1548 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           ├─6723 /usr/sbin/libvirtd --listen
           └─6888 /usr/libexec/libvirt_lxc --name sl --console 25 --security=selinux --handshake 28

# systemctl status machine-lxc\\x2d6888\\x2dsl.scope
● machine-lxc\x2d6888\x2dsl.scope - Container lxc-6888-sl
   Loaded: loaded (/run/systemd/transient/machine-lxc\x2d6888\x2dsl.scope; transient; vendor preset: disabled)
Transient: yes
   Active: active (running) since Thu 2017-01-05 10:38:04 GMT; 13s ago
    Tasks: 1 (limit: 16384)
   Memory: 812.0K
      CPU: 25ms
   CGroup: /machine.slice/machine-lxc\x2d6888\x2dsl.scope
           └─6889 /bin/bash

Now, when you do a restart of libvirtd.service, systemd will ensure that
all the processes associated with that service are in the right cgroups,
moving them if needed. systemd only refreshes its view of cgroup
placement when you do a daemon-reload. Hence it only notices that
libvirt moved libvirt_lxc after doing a daemon-reload. Anyway, systemd
moves libvirt_lxc back into the cgroups associated with
libvirtd.service.

I think to fix this, we will need to ensure that we move libvirt_lxc
into the machine slice for the systemd cgroup controller too.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|
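
The split placement described above (resource controllers pointing at the
machine scope, the systemd hierarchy still pointing at libvirtd.service)
can also be checked directly from /proc/<pid>/cgroup. The following is
only a minimal sketch, assuming a cgroup v1 host where each line has the
form "hierarchy-ID:controller-list:cgroup-path" and the systemd hierarchy
appears as "name=systemd". The function names and the SAMPLE text are
hypothetical illustrations, not captured output and not part of libvirt.

```python
from pathlib import Path


def parse_proc_cgroup(text):
    """Map each controller list in /proc/<pid>/cgroup text to its cgroup path."""
    placement = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "hierarchy-ID:controller-list:cgroup-path".
        _, controllers, path = line.split(":", 2)
        placement[controllers] = path
    return placement


def systemd_mismatches(placement):
    """Return the controllers whose path differs from the name=systemd path."""
    systemd_path = placement.get("name=systemd")
    if systemd_path is None:
        return {}
    return {
        controllers: path
        for controllers, path in placement.items()
        # Skip the systemd hierarchy itself and the cgroup v2 entry ("0::/...").
        if controllers not in ("name=systemd", "") and path != systemd_path
    }


def check_pid(pid):
    """Read /proc/<pid>/cgroup for a live process and report mismatches."""
    text = Path(f"/proc/{pid}/cgroup").read_text()
    return systemd_mismatches(parse_proc_cgroup(text))


# Made-up sample for illustration: hierarchy IDs and the controller set are
# assumptions; only the two cgroup paths are taken from the status output.
SAMPLE = (
    "3:cpu,cpuacct:/machine.slice/machine-lxc\\x2d6888\\x2dsl.scope\n"
    "4:memory:/machine.slice/machine-lxc\\x2d6888\\x2dsl.scope\n"
    "1:name=systemd:/system.slice/libvirtd.service\n"
)

if __name__ == "__main__":
    for controllers, path in sorted(systemd_mismatches(parse_proc_cgroup(SAMPLE)).items()):
        print(f"{controllers}: {path}")
```

On an affected host, something like check_pid(6888) for the libvirt_lxc
PID would be expected to return a non-empty mapping before the restart,
and an empty one once every hierarchy agrees on the same path.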