On Fri, Mar 01, 2013 at 12:31:34AM +0800, Daniel Veillard wrote:
> On Thu, Feb 28, 2013 at 04:24:17PM +0000, Daniel P. Berrange wrote:
> > On Thu, Feb 28, 2013 at 04:16:37PM +0000, Daniel P. Berrange wrote:
> [...]
> > Oh joy, it is worse than you could possibly imagine.
> >
> > On libnl1 the return value is a valid -errno, while in libnl3
> > the return value is an error code of its own invention.
> >
> > Further in libnl1 we can't rely on the global errno, because
> > other calls libnl does may have overwritten it with garbage.
> > We must use the return value from the function.
> >
> > For yet more fun, libnl1's error handling is not threadsafe.
> > Whenever it hits an error with a syscall, it internally
> > runs __nl_error() which mallocs/frees a static global
> > variable containing the contents of strerror() which is
> > itself also not threadsafe :-(
> >
> > Did I mention we should just throw out all versions of libnl
> > entirely and talk to the kernel ourselves..... It has caused
> > us no end of pain in all its versions.
>
>   No chance of educating them instead ? We can't rewrite everything :)

Sure, it has been getting better over time, but that doesn't help
us for all existing distros, particularly rhel-5 and rhel-6, on which
libvirt is going to be crash-prone due to unsolvable libnl design
flaws in those versions.

Looking at the code there are two basic sets of APIs we rely on

  nl_XXXX
  nla_XXXX

The nl_XXX APIs are basically just wrappers around the normal
socket() based APIs, hiding a few bits about the AF_NETLINK
socket type. It would be trivial to do all that work ourselves,
since socket() handling is nothing special. These are the APIs
which have caused us multiple thread safety crash problems over
the years.

The nla_XXX APIs are all about complex data formatting, and
we wouldn't want to try to do that ourselves.
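
To give a rough idea of how little the nl_XXX socket wrappers hide, here
is a minimal sketch of opening and binding an AF_NETLINK socket directly.
The function name netlink_open() is hypothetical (it is not a libvirt or
libnl function), and the choice to return -errno is just an assumption
that mirrors the libnl1 convention described above. The point is that the
error is captured right after the failing syscall and handed back through
the return value, so nothing depends on the global errno surviving later
calls and no static global state is involved.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper, for illustration only.
 * Opens a netlink socket for the given protocol (e.g. NETLINK_ROUTE)
 * and binds it. Returns the file descriptor on success, or a negative
 * errno value on failure.
 */
int netlink_open(int protocol)
{
    struct sockaddr_nl addr;
    int fd;

    fd = socket(AF_NETLINK, SOCK_RAW, protocol);
    if (fd < 0)
        return -errno;          /* capture errno immediately */

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = 0;            /* let the kernel pick a unique port id */
    addr.nl_groups = 0;         /* no multicast groups */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        return -saved;
    }

    return fd;
}
```

A caller would do something like fd = netlink_open(NETLINK_ROUTE) and
treat any negative result as -errno. Building and parsing the actual
messages is the part that would still be left to the nla_XXX helpers.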
Fortunately the nla_XXX APIs are not the ones that are causing us
trouble - AFAICT those look pretty safe in what they do from a thread
POV, since they're all just working on the object instances you pass
in, with no global state.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|