Jim Fehlig writes ("[PATCH RFC] libxl: use libxl_event_wait to process libxl events"): > Prior to this patch, libxl events were delivered to libvirt via > the libxlDomainEventHandler callback registered with libxl. > Documenation in $xensrc/tools/libxl/libxl_event.h states that the > callback "may occur on any thread in which the application calls > libxl". This can result in deadlock since many of the libvirt > callees of libxl hold a lock on the virDomainObj they are working > on. When the callback is invoked, it attempts to find a virDomainObj > corresponding to the domain ID provided by libxl. Searching the > domain obj list results in locking each obj before checking if it is > active, and its ID equals the requested ID. Deadlock is possible > when attempting to lock an obj that is already locked further up > the call stack. Indeed, Max Ustermann recently reported an instance > of this deadlock This sounds like a very plausible failure mode. > https://www.redhat.com/archives/libvir-list/2015-November/msg00130.html > > This patch moves processing of libxl events to a thread, where > libxl_event_wait() is used to collect events. This allows processing > libxl events asynchronously in libvirt, avoiding the deadlock. The reasoning is sound and the remedy is IMO correct. (However, you mean "this allows processing libxl events _synchronously_", since what you are doing is serialising them all into your main loop.) So: Acked-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx> NB that I have not reviewed the patch in detail. I can do that if you like, although of course my knowledge of the innards of libvirt is not wonderful. > The only reservations I have about this patch come from comments > about libxl_event_wait in libxl_event.h > > Like libxl_event_check but blocks if no suitable events are > available, until some are. Uses libxl_osevent_beforepoll/ > _afterpoll so may be inefficient if very many domains are being > handled by a single program. If this turns out to be a problem in practice, we will improve libxl's data structures to not involve so many linear searches. (In fact I think you are probably already calling synchronous libxl ao functions, which have the same performance properties, although this is not documented.) Given what you say above I don't think there is a reasonable alternative remedy. So you should go ahead with this patch. Regards, Ian. -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list