On Sun, Jun 10, 2018 at 12:14:22PM +0100, Radostin Stoyanov wrote:
> Hi all,
>
> This patch series aims to resolve
> https://bugzilla.redhat.com/show_bug.cgi?id=1328946
>
> For background information about the issue see v1 of this RFC.
> https://www.redhat.com/archives/libvir-list/2018-April/msg01270.html
>
> The current state of this series enables the start of LXC container with NBD
> file system and enabled user namespace.
>
> However, container shutdown causes "kernel BUG at fs/buffer.c:3058!"
> https://pastebin.com/raw/y0ycSM0H
>
> The reason for this is because qemu-nbd process is terminated/killed without
> unmounting the container root file system.
>
> This issue has been reported in [1] and [2].
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1356110
> [2] http://lkml.iu.edu/hypermail/linux/kernel/1509.3/00027.html

This is not really a kernel bug at the end of the day. We have a
filesystem backed by an NBD block device, and we're killing the NBD
block device. So there's nothing the kernel can really do here if
there's outstanding I/O pending at this time.

There is also this BZ reported against libvirt that has more info:

  https://bugzilla.redhat.com/show_bug.cgi?id=1570902

> As a workaround we could unmount the root file system of container before shutdown.
>
> For example with:
> $ CT_PID=$(pidof libvirt_lxc)
> $ sudo nsenter \
>     --mount=/proc/$CT_PID/task/$CT_PID/ns/mnt \
>     /bin/bash -c "umount /var/run/libvirt/lxc/guest.root/"
>
> I noticed that we already have the functions lxcContainerUnmountSubtree
> and virProcessRunInMountNamespace.
>
> Any suggestions on how to properly implement this?

We can't unmount the filesystem directly because we don't have any
process running inside the container's mount namespace at this time.
The libvirt_lxc controller is running in a custom mount namespace
that is different from the one the container has.

The first thing we need to do is take qemu-nbd out of the cgroups.
This will ensure that it doesn't get killed at the same time as we're
killing off all the container PIDs. It will also fix the OOM deadlocks
we see when the memory controller prevents qemu-nbd from allocating
the RAM it needs to process I/O.

Then, we can kill all processes in the container as normal. Once they
are all gone, we know the kernel will have cleaned up the mount
namespace. We can thus safely kill qemu-nbd at this point.

Ideally qemu-nbd would automatically exit when the last use of
/dev/nbdNNN was released (i.e. when the filesystem was unmounted).
This is something you can enable for loopback devices, but I'm not
sure it works for NBD. This would be a useful kernel enhancement if
someone feels adventurous.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
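
For illustration only, the shutdown ordering described in this mail
(detach qemu-nbd from the cgroups, kill the container processes, then
stop qemu-nbd) could be sketched roughly as below. This is not libvirt
code: the function name shutdown_container and the detach_from_cgroup,
kill and wait_gone callbacks are all hypothetical, and how a PID is
actually moved out of a cgroup is left to the caller.

```python
import os
import signal
from typing import Callable, Iterable, List


def shutdown_container(
    container_pids: Iterable[int],
    nbd_pid: int,
    detach_from_cgroup: Callable[[int], None],
    kill: Callable[[int, int], None] = os.kill,
    wait_gone: Callable[[int], None] = lambda pid: None,
) -> List[str]:
    """Tear down a container in the order described in the mail.

    Every name here is hypothetical. Returns a log of the steps taken.
    """
    log: List[str] = []

    # Step 1: take qemu-nbd out of the container cgroups so it is not
    # killed together with the container processes.
    detach_from_cgroup(nbd_pid)
    log.append(f"detach {nbd_pid}")

    # Step 2: kill every container process except qemu-nbd, then wait
    # for each of them to exit so the mount namespace can be cleaned up.
    victims = [pid for pid in container_pids if pid != nbd_pid]
    for pid in victims:
        kill(pid, signal.SIGKILL)
        log.append(f"kill {pid}")
    for pid in victims:
        wait_gone(pid)

    # Step 3: only once the container processes are gone, stop qemu-nbd.
    kill(nbd_pid, signal.SIGTERM)
    log.append(f"kill {nbd_pid}")
    return log
```

The callbacks are injected so that the ordering can be exercised
without root privileges or a real container; a real caller would pass
functions that touch the cgroup filesystem and reap the processes.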