On Tue, Aug 17, 2010 at 05:00:22PM +0100, Daniel P. Berrange wrote:
> For
>
>   https://bugzilla.redhat.com/show_bug.cgi?id=620847
>
> We have had sporadic reports of
>
>   # virsh capabilities
>   error: failed to get capabilities
>   error: server closed connection:
>
> This normally means that libvirtd has crashed, closing the connection,
> but in this case libvirtd has always remained running. It turns out
> that the capabilities XML was too large for the remote RPC message
> size. This caused XDR serialization to fail, which in turn caused
> libvirtd to close the client connection immediately. The cause of the
> large XML was not handling an edge case in libnuma where it returns a
> CPU mask of all-1s to indicate a non-existent node.
>
> Machines that exhibit this problem will show this as a symptom in
> the logs
>
>   # grep NUMA /var/log/messages
>   Aug 16 10:30:34 sgi-xe270-01 libvirtd: 10:30:34.933: warning :
>   nodeCapsInitNUMA:388 : NUMA topology for cell 1 of 2 not available, ignoring
>
> And have sparse NUMA topology (i.e. empty nodes)
>
> This series does many things:
>
>  - Adds explicit warnings in places where XDR serialization fails,
>    so we see an indication of the problem in /var/log/messages
>  - Tries to send a real remote_error back to the client, instead of
>    closing its connection
>  - Adds logging of the capabilities XML in libvirt.c so we can identify
>    the too-large doc in libvirtd
>  - Adds a fix to cope with the all-1s node mask
>
> This may also fix some other unexplained bug reports we've had with
> 'server closed connection' messages, or at least make it possible
> to diagnose them

  ACK for the 4 patches, as for upstream,

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@xxxxxxxxxxxx  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
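
For readers of the archive, the all-1s mask case described in the cover letter can be sketched roughly as below. This is an illustration only and is not the code from the patches: the function names, the 64-bit word size, and the representation of a CPU mask as a list of integers are all assumptions made for the example. The idea is simply that a mask with every bit set is treated as "node does not exist" rather than expanded into a huge CPU list, which is what would inflate the capabilities XML.

```python
# Illustrative sketch only. WORD_BITS, mask_is_all_ones and cpus_in_node
# are hypothetical names; they do not come from libvirt or libnuma.
WORD_BITS = 64                      # assumed width of one mask word
ALL_ONES = (1 << WORD_BITS) - 1     # a word with every bit set


def mask_is_all_ones(mask_words):
    """Return True if the mask is non-empty and every word has all bits set."""
    return len(mask_words) > 0 and all(w == ALL_ONES for w in mask_words)


def cpus_in_node(mask_words):
    """Return the CPU ids set in the mask, or None for a non-existent node."""
    if mask_is_all_ones(mask_words):
        # All-1s is taken as the "no such node" sentinel, so the node is
        # skipped instead of being reported with every possible CPU id.
        return None
    cpus = []
    for word_index, word in enumerate(mask_words):
        for bit in range(WORD_BITS):
            if word & (1 << bit):
                cpus.append(word_index * WORD_BITS + bit)
    return cpus
```

With this sketch, a caller building the topology would drop any node for which `cpus_in_node` returns `None`, while a node whose mask is all zeros would still be reported with an empty CPU list.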