On 05/01/2012 01:10 PM, Laine Stump wrote:
> This patch is one alternative to solve the problem detailed in:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=816465
>
> Some other unidentified library in use by libvirtd (in another thread)
> is apparently temporarily binding to a NETLINK_ROUTE raw socket with
> an address of "pid of libvirtd" during startup. This is the same
> address used by libnl for the first netlink socket it binds, and the
> netlink socket allocated for virNetlinkEventServiceStart() happens to
> be that first socket; the result is that nl_connect() fails about
> 15-20% of the time (but apparently only if there is a guest running at
> the time libvirtd starts).
>
> Testing has shown that in the case that nl_connect fails the first
> time, retrying it after a 500msec sleep leads to success 100% of the
> time, so this patch doubles that delay (which also has 100% success
> rate.
>
> +++ b/src/util/virnetlink.c
> @@ -355,9 +355,18 @@ virNetlinkEventServiceStart(void)
>      }
>
>      if (nl_connect(srv->netlinknh, NETLINK_ROUTE) < 0) {
> -        virReportSystemError(errno,
> -                             "%s", _("cannot connect to netlink socket"));
> -        goto error_server;
> +        /* the address that libnl wants to use for this connect ("pid
> +         * of libvirtd") is sometimes temporarily in use by some other
> +         * unidentified code. Retrying after a 500msec sleep has
> +         * achieved 100% success rates, so we sleep for 1000msec and
> +         * retry.
> +         */
> +        usleep(1000000);

Sleeping for 1 entire second is user-visible; if we go with this
approach, I'd rather see it be as a retry loop that probes something
like once every 200ms for 5 tries (or something similar), for better
response time.

-- 
Eric Blake   eblake@xxxxxxxxxx    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
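
For illustration only, here is a rough, untested-against-libvirt sketch of the kind of retry loop suggested above (probe every 200ms, up to 5 tries). It is not the actual patch: connect_with_retry(), sleep_ms(), the connect_fn callback type, and the flaky_connect() stand-in are hypothetical names invented for this example. The only thing taken from the quoted code is the "negative return means failure" convention of the nl_connect() check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns 0 on success and a negative value
 * on failure, mirroring the "nl_connect(...) < 0" check in the patch. */
typedef int (*connect_fn)(void *opaque);

/* Sleep for the given number of milliseconds using POSIX nanosleep(). */
void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Call try_once up to max_tries times, sleeping delay_ms after each
 * failed attempt except the last one. Returns the result of the final
 * attempt made (>= 0 on success, negative if every attempt failed). */
int connect_with_retry(connect_fn try_once, void *opaque,
                       int max_tries, long delay_ms)
{
    int rc = -1;
    int i;

    assert(try_once != NULL);
    for (i = 0; i < max_tries; i++) {
        rc = try_once(opaque);
        if (rc >= 0)
            break;              /* connected, stop probing */
        if (i + 1 < max_tries)
            sleep_ms(delay_ms); /* back off before the next probe */
    }
    return rc;
}

/* Hypothetical stand-in for the real connect call: fails for the first
 * fail_count calls, then succeeds. Only here to exercise the loop. */
struct flaky {
    int fail_count;
    int calls;
};

int flaky_connect(void *opaque)
{
    struct flaky *f = opaque;

    f->calls++;
    return (f->calls > f->fail_count) ? 0 : -1;
}
```

In virNetlinkEventServiceStart() the callback would presumably wrap nl_connect(srv->netlinknh, NETLINK_ROUTE), with the existing virReportSystemError() and "goto error_server" path kept for the case where all tries fail. With max_tries = 5 and delay_ms = 200 the worst-case added delay is 800ms (there is no sleep after the last attempt), while the common case of success on the second try costs only 200ms.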
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list