libvirtd not responding to virsh, causing virsh to hang


Hi,

We've recently run into an issue with libvirt 1.2.17 in the context of an OpenStack deployment.

Occasionally, after live migrations from a compute node running libvirt 1.2.17 to a compute node running libvirt 2.0.0, we see libvirtd on the 1.2.17 side stop responding. When this happens, a command like "sudo virsh list" just hangs waiting for a response from libvirtd.

Running "ps -elfT|grep libvirtd" shows many threads waiting on a futex, plus two threads sleeping in poll_schedule_timeout() as part of the poll() syscall. On a libvirtd that is not hung I see only one thread in poll_schedule_timeout().
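The same per-thread check can be scripted without relying on the (often truncated) WCHAN column of ps. The sketch below reads /proc/<pid>/task/<tid>/wchan, which holds the kernel function each thread is sleeping in. It is a minimal illustration, assuming a Linux kernel built with kallsyms; the kernel writes "0" for threads that are running or when the caller lacks permission, so inspecting a root-owned libvirtd needs sudo. The helper names are hypothetical.

```python
import os
from collections import Counter


def thread_wchans(pid):
    """Map each thread id of `pid` to the kernel symbol it is sleeping in."""
    task_dir = f"/proc/{pid}/task"
    result = {}
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/wchan") as f:
                # "0" means running, unknown, or not permitted to read.
                result[int(tid)] = f.read().strip() or "0"
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were scanning
    return result


def summarize(pid):
    """Count threads per wait channel, e.g. Counter({'futex_wait_queue_me': 14, ...})."""
    return Counter(thread_wchans(pid).values())


def threads_in(pid, symbol="poll_schedule_timeout"):
    """Sorted thread ids of `pid` currently sleeping in `symbol`."""
    return sorted(tid for tid, w in thread_wchans(pid).items() if w == symbol)
```

For example, `threads_in(<libvirtd pid>)` returning two thread ids instead of one would match the symptom described above. It only detects the pattern and says nothing about the cause.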

If I kill and restart libvirtd (this took two tries; it didn't actually die the first time), the problem seems to go away.

I just tried attaching gdb to the "hung" libvirtd process and running "thread apply all backtrace". This printed backtraces for all threads, including the one that was apparently stuck in poll():

Thread 17 (Thread 0x7f0573fff700 (LWP 186865)):
#0  0x00007f05b59d769d in poll () from /lib64/libc.so.6
#1  0x00007f05b7f01b9a in virNetClientIOEventLoop () from /lib64/libvirt.so.0
#2  0x00007f05b7f0234b in virNetClientSendInternal () from /lib64/libvirt.so.0
#3  0x00007f05b7f036f3 in virNetClientSendWithReply () from /lib64/libvirt.so.0
#4  0x00007f05b7f04eb3 in virNetClientStreamSendPacket () from /lib64/libvirt.so.0
#5  0x00007f05b7ed8db5 in remoteStreamFinish () from /lib64/libvirt.so.0
#6  0x00007f05b7ec7eaa in virStreamFinish () from /lib64/libvirt.so.0
#7  0x00007f059bd9323d in qemuMigrationIOFunc () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#8  0x00007f05b7e09aa2 in virThreadHelper () from /lib64/libvirt.so.0
#9  0x00007f05b5cb4dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f05b59e1ced in clone () from /lib64/libc.so.6
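In a process with many threads it helps to filter the "thread apply all backtrace" dump down to the threads whose innermost frame is poll(). The dump can also be captured non-interactively with `gdb -p <pid> -batch -ex "thread apply all bt"`. The following is a minimal Python sketch. parse_backtraces and threads_blocked_in are hypothetical helpers, and the regular expressions assume the frame format shown above. Newer gdb versions add a thread name to the "Thread N (...)" header, which the header pattern tolerates.

```python
import re
from dataclasses import dataclass, field

# "Thread 17 (Thread 0x7f0573fff700 (LWP 186865)):" (a thread name may follow the LWP)
THREAD_RE = re.compile(r"^Thread (\d+) \(.*\(LWP (\d+)\)")
# "#0  0x00007f05b59d769d in poll () from ..." or "#0  poll (fds=...) at ..."
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")


@dataclass
class ThreadTrace:
    gdb_id: int
    lwp: int
    frames: list = field(default_factory=list)  # function names, innermost first


def parse_backtraces(text):
    """Split gdb 'thread apply all bt' output into per-thread frame lists."""
    threads, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        m = THREAD_RE.match(line)
        if m:
            current = ThreadTrace(int(m.group(1)), int(m.group(2)))
            threads.append(current)
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            current.frames.append(m.group(2))
    return threads


def threads_blocked_in(threads, func="poll"):
    """Threads whose innermost frame is `func`."""
    return [t for t in threads if t.frames and t.frames[0] == func]
```

Applied to the trace above, this would report LWP 186865 with the chain poll, virNetClientIOEventLoop, virNetClientSendInternal, and so on, down to qemuMigrationIOFunc.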


Interestingly, when I hit "c" to continue in the debugger, I got this:

(gdb) c
Continuing.

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7f0573fff700 (LWP 186865)]
0x00007f05b5cbb1cd in write () from /lib64/libpthread.so.0
(gdb) c
Continuing.
[Thread 0x7f0573fff700 (LWP 186865) exited]
(gdb) quit
A debugging session is active.

        Inferior 1 [process 37471] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/sbin/libvirtd, process 37471


Now thread 186865 seems to be gone, and libvirtd is no longer hung.
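The SIGPIPE on write() suggests that the connection the stuck thread was using had already been closed by the other side. The sketch below reproduces the generic Linux behaviour with a local socket pair standing in for that connection. It is an illustration under that assumption, not libvirt's code. Once the peer closes, poll() reports the socket as readable/hung up, and a write fails with EPIPE (BrokenPipeError in Python, which ignores SIGPIPE at startup). A process that does not ignore SIGPIPE receives the signal instead. The post does not establish whether the destination really closed the stream here. Also note that gdb stops on SIGPIPE by default, so it reports the signal even in a process that ignores it.

```python
import select
import socket


def write_after_peer_closed():
    """Return (poll revents, exception name or None) for a write to a closed peer."""
    ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    theirs.close()  # the remote end goes away without sending a reply

    poller = select.poll()
    poller.register(ours, select.POLLIN)
    # Bounded timeout in milliseconds; an empty result would mean "still waiting".
    events = poller.poll(1000)
    revents = events[0][1] if events else 0

    try:
        ours.send(b"finish")  # e.g. a final packet on the stream
        err = None
    except BrokenPipeError as exc:  # errno EPIPE, because SIGPIPE is ignored
        err = type(exc).__name__
    finally:
        ours.close()
    return revents, err
```

Calling `write_after_peer_closed()` on Linux should return a nonzero revents mask (POLLIN and/or POLLHUP) together with "BrokenPipeError".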

Has anyone seen anything like this before? Anyone have an idea where to start looking?

Thanks,
Chris


--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
