Hi Cedric,

On Wed, Dec 21, 2016 at 02:36:39PM +0100, Cedric Bosdonnat wrote:
> Hey Christian,
>
> On Tue, 2016-12-20 at 12:29 +0100, Christian Ehrhardt wrote:
> > Hi,
> > I found an issue in libvirt related to libvirt-lxc, but fail to find the root cause.
> >
> > The TL;DR is: libvirt-lxc guests get killed on libvirt restart due to "internal error: No valid cgroup for machine"
> >
> > I was able to reproduce it with libvirt 1.3.1, 2.4 and 2.5 as packages in Ubuntu and Debian.
> > I wanted to ask for two things:
> > - wider coverage where this does reproduce
>
> I couldn't reproduce here with openSUSE Tumbleweed and libvirt 2.5 packages.

I had a short look and it seems like this sequence is killing all running
libvirt-lxc guests reliably:

    # no lxc guest running yet
    export LIBVIRT_DEFAULT_URI=lxc:///
    DOMAIN=sl
    systemctl daemon-reload
    # start lxc guest
    virsh start ${DOMAIN}
    sleep 1 # give vm some time to start
    systemctl restart libvirtd
    virsh list | grep -qs "${DOMAIN}[[:space:]]\+running"
    # lxc guest gone

The important part is the "systemctl daemon-reload". If one leaves that out,
libvirtd restarts don't kill off any lxc-domains anymore.

The issue is that libvirt on reattach fails virCgroupNewDetectMachine due to
/proc/<pid-of-lxc-container>/cgroup having changed after libvirtd's restart:

Before systemctl restarts libvirtd:

    10:perf_event:/machine/lxc-21383-sl.libvirt-lxc
    9:cpuset:/machine/lxc-21383-sl.libvirt-lxc
    8:net_cls,net_prio:/machine/lxc-21383-sl.libvirt-lxc
    7:pids:/system.slice/libvirtd.service
    6:memory:/machine/lxc-21383-sl.libvirt-lxc
    5:cpu,cpuacct:/machine/lxc-21383-sl.libvirt-lxc
    4:devices:/machine/lxc-21383-sl.libvirt-lxc
    3:freezer:/machine/lxc-21383-sl.libvirt-lxc
    2:blkio:/machine/lxc-21383-sl.libvirt-lxc
    1:name=systemd:/system.slice/libvirtd.service

After systemctl restart libvirtd:

    10:perf_event:/machine/lxc-21383-sl.libvirt-lxc
    9:cpuset:/machine/lxc-21383-sl.libvirt-lxc
    8:net_cls,net_prio:/machine/lxc-21383-sl.libvirt-lxc
    7:pids:/system.slice/libvirtd.service
    6:memory:/system.slice/libvirtd.service
    5:cpu,cpuacct:/system.slice/libvirtd.service
    4:devices:/system.slice/libvirtd.service
    3:freezer:/machine/lxc-21383-sl.libvirt-lxc
    2:blkio:/system.slice/libvirtd.service
    1:name=systemd:/system.slice/libvirtd.service

so the process is moved to other memory, cpu, devices and blkio cgroups and
therefore libvirtd can't find it anymore. The error in the log looks like:

    debug : virCgroupValidateMachineGroup:333 : Name 'libvirtd.service' for controller 'cpu' does not match 'sl', 'lxc-21383-sl', 'sl.libvirt-lxc', 'machine-lxc\x2dsl.scope' or 'machine-lxc\x2d21383\x2dsl.scope'

This does _not_ happen if one restarts libvirtd right after the "systemctl
daemon-reload" or if one drops the "systemctl daemon-reload" from the above
example. This also does not happen if one stops libvirtd via systemd but
starts it as /usr/sbin/libvirtd directly.

So the problem needs both of these:

 * systemctl daemon-reload
 * libvirtd is restarted via systemctl

I've looked at audit logs and straced pid 1 without spotting anything. Any
ideas where to go looking now? This is systemd 232.

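In case it helps others compare the two /proc/<pid>/cgroup listings without
eyeballing them, here is a minimal sketch. It assumes both listings were
saved to files first; the function name cgroup_moved and the file names
before.txt and after.txt are made up for this example and are not part of
libvirt or systemd.

```shell
#!/bin/sh
# cgroup_moved BEFORE AFTER
# Print every controller whose cgroup path differs between two saved
# copies of /proc/<pid>/cgroup (lines of the form "id:controllers:path").
cgroup_moved() {
    awk '
        {
            # Split the line by hand so that a ":" inside the path
            # does not truncate it.
            rest = $0
            sub(/^[^:]*:/, "", rest)   # drop the hierarchy id
            ctrl = rest
            sub(/:.*$/, "", ctrl)      # controller list, e.g. "cpu,cpuacct"
            path = rest
            sub(/^[^:]*:/, "", path)   # cgroup path
        }
        # First file: remember the path of each controller list.
        FNR == NR { before[ctrl] = path; next }
        # Second file: report the entries whose path changed.
        (ctrl in before) && before[ctrl] != path {
            printf "%s: %s -> %s\n", ctrl, before[ctrl], path
        }
    ' "$1" "$2"
}

# Usage sketch (PID is the container pid, looked up by hand):
#   cat /proc/$PID/cgroup > before.txt
#   systemctl restart libvirtd
#   cat /proc/$PID/cgroup > after.txt
#   cgroup_moved before.txt after.txt
```

Fed with the two listings above, this should report the memory, cpu,cpuacct,
devices and blkio entries as having moved from /machine/lxc-21383-sl.libvirt-lxc
to /system.slice/libvirtd.service, and nothing else.
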
Cheers,
 -- Guido

> > - your expertise on the case itself.
>
> It seems that you'll need to check what's going on in virCgroupDetect().
>
> > Steps to reproduce:
> > 1. Spawn new KVM Guest of your choice
> > 2. install test dependencies
> >    $ apt-get install libvirt-daemon-system libvirt-clients libxml2-utils
> >    # or package managers / package names of your chosen os
> > 3. run the following sequence as root
> >    export LIBVIRT_DEFAULT_URI=lxc:///
> >    cat << EOF > /tmp/smoke-lxc.xml
> >    <domain type='lxc'>
> >      <name>sl</name>
> >      <memory unit='KiB'>256000</memory>
> >      <currentMemory unit='KiB'>256000</currentMemory>
> >      <vcpu placement='static'>1</vcpu>
> >      <os>
> >        <type>exe</type>
> >        <init>/bin/bash</init>
> >      </os>
> >      <features>
> >        <privnet/>
> >      </features>
> >      <clock offset='utc'/>
> >      <devices>
> >        <emulator>/usr/lib/libvirt/libvirt_lxc</emulator>
>
> The emulator should be removed from the config for portability
> purpose: the libvirt_lxc path may vary from a distro / arch to another
> and libvirt's lxc driver is able to auto-add it.
>
> --
> Cedric
>
> --
> libvir-list mailing list
> libvir-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/libvir-list