On Tue, Jan 19, 2016 at 12:31:48PM +0100, Kashyap Chamarthy wrote:
> On Mon, Jan 18, 2016 at 04:19:58PM +0000, Richard W.M. Jones wrote:
> > On Mon, Jan 18, 2016 at 03:33:25PM +0000, Richard W.M. Jones wrote:
> > > I tried another workaround which was to get virt-resize to fsync the
> > > output file before closing the libvirt connection, but that doesn't
> > > work for reasons I don't understand so far - still studying this.
> >
> > I worked out what was happening here -- I'd inserted the fsync at the
> > wrong place in virt-resize. So I have now successfully worked around
> > this for the virt-resize case, however it's still a problem that could
> > manifest itself in other uses of libvirt + qemu + slow devices.
>
> We've seen the "Failed to terminate process 1275 with SIGTERM: Device or
> resource busy" error occur in context of OpenStack as well[1][2].
>
> The behavior is from virDomainDestroy() API (src/libvirt-domain.c):
>
>     [...]
>     * virDomainDestroy first requests that a guest terminate (e.g.
>     * SIGTERM), then waits for it to comply. After a reasonable timeout,
>     * if the guest still exists, virDomainDestroy will forcefully
>     * terminate the guest (e.g. SIGKILL) if necessary (which may produce
>     * undesirable results, for example unflushed disk cache in the
>     * guest). To avoid this possibility, it's recommended to instead
>     * call virDomainDestroyFlags, sending the
>     * VIR_DOMAIN_DESTROY_GRACEFUL flag.
>     [...]
>
> Dan Berrange explains[1]:
>
>     There are two reasons why you'd get this failure ("Failed to terminate
>     process: Device or resource busy") from libvirt.
>
>      - The host is so overloaded that the kernel was not able to clean up
>        the process in the time that libvirt was prepared to wait. If this
>        is the case, the process should eventually go away on its own
>        after a short while longer and everything should return to normal
>
>      - There is some problem, causing the process to get stuck in an
>        uninterruptable wait state. This is usually due to something going
>        wrong in the storage stack, causing some I/O read/write operation
>        to hang in kernel space. In this case the process will stay around
>        in the zombie state forever, or until the storage problem is
>        resolved.

Thanks for finding this documentation. The problem with this theory
is we are passing the VIR_DOMAIN_DESTROY_GRACEFUL flag, so that would
indicate that this flag is buggy.

I think what we need is a test case, so here goes. Note you must run
these steps as *non-root*.

(1) Download the attachment to /var/tmp

(2) chmod +x /var/tmp/qemu.sh

(3) killall libvirtd    ;# kills the session libvirtd

(4) LIBGUESTFS_HV=/var/tmp/qemu.sh guestfish -N fs exit -vx

You should see at the end of the output:

  libguestfs: calling virDomainDestroy "guestfs-q94hsiz89t8jp418" flags=VIR_DOMAIN_DESTROY_GRACEFUL
  [pause of a few seconds]
  libguestfs: error: could not destroy libvirt domain: Failed to terminate process 11412 with SIGTERM: Device or resource busy [code=38 domain=0]

If someone else can reproduce this, then I will file a bug.

Rich.

> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1205647 --
>     nova.virt.libvirt.driver fails to shutdown reboot instance with
>     error 'Code=38 Error=Failed to terminate process 4260 with SIGKILL:
>     Device or resource busy'
> [2] https://bugs.launchpad.net/nova/+bug/1353939 -- Rescue fails with
>     'Failed to terminate process: Device or resource busy' in the n-cpu
>     log
>
> --
> /kashyap

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
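
For anyone following along without a libvirt setup, the two destroy modes described in the quoted comment can be imitated with an ordinary child process. This is only a rough sketch in Python, not libvirt's code: destroy(), DestroyTimeout and spawn_stubborn_child() are made-up names, the timeout is arbitrary, and it assumes that "graceful" simply means SIGKILL is never sent, which is a reading of the comment rather than something confirmed in this thread. It needs a Unix-like host.

```python
import signal
import subprocess
import sys


class DestroyTimeout(Exception):
    """Hypothetical error: the process outlived SIGTERM and SIGKILL was not allowed."""


def destroy(proc, timeout=2.0, graceful=False):
    """Stop `proc` (a subprocess.Popen) the way the quoted comment describes.

    Sends SIGTERM and waits up to `timeout` seconds. If the process is still
    there, either give up (graceful=True) or escalate to SIGKILL.
    Returns the name of the signal that ended the process.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        if graceful:
            # Assumption: graceful mode never falls back to SIGKILL.
            raise DestroyTimeout(
                f"Failed to terminate process {proc.pid} with SIGTERM"
            )
    proc.kill()   # SIGKILL on POSIX
    proc.wait()   # reap the child so it does not linger as a zombie
    return "SIGKILL"


def spawn_stubborn_child():
    """Start a child that ignores SIGTERM, standing in for a guest that is slow to exit."""
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    proc.stdout.readline()  # block until the child has installed its handler
    return proc
```

Calling destroy(spawn_stubborn_child(), graceful=True) raises DestroyTimeout after the wait and leaves the child running, while graceful=False ends it with SIGKILL. If the real flag behaves like this sketch, an error after "a few seconds" may be the timeout expiring rather than the flag misbehaving, but that is speculation until someone checks the libvirt source.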
Attachment: qemu.sh
Description: Bourne shell script
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list