Re: virsh reconnect vs. keepalive issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 20, 2014 at 05:49:54PM -0600, Eric Blake wrote:
I'm trying to debug a new virsh command, and had two terminals up, one
running './run gdb tools/virsh' (I originally ran virsh directly, but
re-ran under gdb to get a trace) and the other running './run gdb
daemon/libvirtd'.  Because I forgot to use the -k0 argument to disable
keepalives, and took too long while at a breakpoint on the libvirtd
terminal, my virsh session tried to report a keepalive failure:

[New Thread 0x7fffe7b8c700 (LWP 7143)]
2014-05-20 23:40:22.337+0000: 7138: info : libvirt version: 1.2.5
2014-05-20 23:40:22.337+0000: 7138: warning :
virKeepAliveTimerInternal:143 : No response from client 0x55555586a520
after 6 keepalive messages in 35 seconds
2014-05-20 23:40:22.337+0000: 7143: warning :
virKeepAliveTimerInternal:143 : No response from client 0x55555586a520
after 6 keepalive messages in 35 seconds
error: internal error: received hangup / error event on socket


I tried reproducing that and I get no hang, plus the only difference
is that without -k0 I *sometimes* get:

warning : virKeepAliveTimer:177 : Failed to send keepalive request to
client 0x55555583d630

Other than that, everythign works fine and I only see the reconnection
happen after another command is called.  Are you using some kind of
special authentication?

but then it just hangs there.  A backtrace shows that it is hanging
trying to reconnect:

(gdb) bt
#0  0x00007ffff39129dd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff798e3c1 in virNetClientIOEventLoop (client=0x55555586ae10,
   thiscall=0x55555586b090) at rpc/virnetclient.c:1514
#2  0x00007ffff798ecbf in virNetClientIO (client=0x55555586ae10,
   thiscall=0x55555586b090) at rpc/virnetclient.c:1786
#3  0x00007ffff798f5ab in virNetClientSendInternal (client=0x55555586ae10,
   msg=0x55555586ada0, expectReply=true, nonBlock=false)
   at rpc/virnetclient.c:1958
#4  0x00007ffff798f648 in virNetClientSendWithReply (client=0x55555586ae10,
   msg=0x55555586ada0) at rpc/virnetclient.c:1986
#5  0x00007ffff7990580 in virNetClientProgramCall (prog=0x55555586afd0,
   client=0x55555586ae10, serial=0, proc=66, noutfds=0, outfds=0x0,
   ninfds=0x0, infds=0x0, args_filter=0x7ffff394f270 <__GI_xdr_void>,
   args=0x0, ret_filter=0x7ffff798152c <xdr_remote_auth_list_ret>,
   ret=0x7fffffffd6f0) at rpc/virnetclientprogram.c:329
#6  0x00007ffff7975d4c in callFull (conn=0x555555869fb0,
priv=0x55555586ab30,
   flags=0, fdin=0x0, fdinlen=0, fdout=0x0, fdoutlen=0x0, proc_nr=66,
   args_filter=0x7ffff394f270 <__GI_xdr_void>, args=0x0,
   ret_filter=0x7ffff798152c <xdr_remote_auth_list_ret>,
   ret=0x7fffffffd6f0 "") at remote/remote_driver.c:6544
#7  0x00007ffff7975e25 in call (conn=0x555555869fb0, priv=0x55555586ab30,
   flags=0, proc_nr=66, args_filter=0x7ffff394f270 <__GI_xdr_void>,
args=0x0,
   ret_filter=0x7ffff798152c <xdr_remote_auth_list_ret>,
---Type <return> to continue, or q <return> to quit---
   ret=0x7fffffffd6f0 "") at remote/remote_driver.c:6566
#8  0x00007ffff795f7f1 in remoteAuthenticate (conn=0x555555869fb0,
   priv=0x55555586ab30, auth=0x7ffff7d71520 <virConnectAuthDefault>,
   authtype=0x0) at remote/remote_driver.c:3939
#9  0x00007ffff7957f0a in doRemoteOpen (conn=0x555555869fb0,
   priv=0x55555586ab30, auth=0x7ffff7d71520 <virConnectAuthDefault>,
flags=0)
   at remote/remote_driver.c:973
#10 0x00007ffff795890b in remoteConnectOpen (conn=0x555555869fb0,
   auth=0x7ffff7d71520 <virConnectAuthDefault>, flags=0)
   at remote/remote_driver.c:1190
#11 0x00007ffff78fc5e5 in do_open (name=0x0,
   auth=0x7ffff7d71520 <virConnectAuthDefault>, flags=0) at libvirt.c:1147
#12 0x00007ffff78fd093 in virConnectOpenAuth (name=0x0,
   auth=0x7ffff7d71520 <virConnectAuthDefault>, flags=0) at libvirt.c:1390
#13 0x0000555555575a67 in vshConnect (ctl=0x7fffffffdbc0, uri=0x0,
   readonly=false) at virsh.c:337
#14 0x0000555555575c23 in vshReconnect (ctl=0x7fffffffdbc0) at virsh.c:383
#15 0x000055555557943a in vshCommandRun (ctl=0x7fffffffdbc0,
   cmd=0x5555557f4900) at virsh.c:1853
#16 0x000055555557d8de in main (argc=5, argv=0x7fffffffdda8) at virsh.c:3617

It seems odd that virsh is hanging rather than gracefully exit, because
it is trying too hard to reconnect in spite of being forcefully
disconnected due to a keepalive timeout.  But I'm not sure what, if
anything, we could tweak to make this failure mode not require a Ctrl-C
on my end to get rid of the failed virsh.  It also reiterates the idea
that we really ought to allow for a user-specified timeout to
virConnect, rather than blocking forever.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list

Attachment: signature.asc
Description: Digital signature

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list

[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]