On 11/23/2015 04:24 AM, Ian Jackson wrote: > Jim Fehlig writes ("[PATCH RFC] libxl: use libxl_event_wait to process libxl events"): >> Prior to this patch, libxl events were delivered to libvirt via >> the libxlDomainEventHandler callback registered with libxl. >> Documenation in $xensrc/tools/libxl/libxl_event.h states that the >> callback "may occur on any thread in which the application calls >> libxl". This can result in deadlock since many of the libvirt >> callees of libxl hold a lock on the virDomainObj they are working >> on. When the callback is invoked, it attempts to find a virDomainObj >> corresponding to the domain ID provided by libxl. Searching the >> domain obj list results in locking each obj before checking if it is >> active, and its ID equals the requested ID. Deadlock is possible >> when attempting to lock an obj that is already locked further up >> the call stack. Indeed, Max Ustermann recently reported an instance >> of this deadlock > This sounds like a very plausible failure mode. > >> https://www.redhat.com/archives/libvir-list/2015-November/msg00130.html >> >> This patch moves processing of libxl events to a thread, where >> libxl_event_wait() is used to collect events. This allows processing >> libxl events asynchronously in libvirt, avoiding the deadlock. > The reasoning is sound and the remedy is IMO correct. (However, you > mean "this allows processing libxl events _synchronously_", since what > you are doing is serialising them all into your main loop.) Ah yes, poor choice of words. The last sentence should read: This approach gives libvirt more control over event processing, ensuring object locking constraints can be met. But your comment exposes a flaw in the patch, one that had already been fixed in the event_occurs approach. Shutting down a large memory domain can take considerable time due to memory scrubbing, meanwhile stalling reception of other events. The patch was a bit aggressive in removing libxlDomainShutdownThread(). > > So: > > Acked-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx> > > NB that I have not reviewed the patch in detail. I can do that if you > like, although of course my knowledge of the innards of libvirt is not > wonderful. > >> The only reservations I have about this patch come from comments >> about libxl_event_wait in libxl_event.h >> >> Like libxl_event_check but blocks if no suitable events are >> available, until some are. Uses libxl_osevent_beforepoll/ >> _afterpoll so may be inefficient if very many domains are being >> handled by a single program. > If this turns out to be a problem in practice, we will improve libxl's > data structures to not involve so many linear searches. (In fact I > think you are probably already calling synchronous libxl ao functions, > which have the same performance properties, although this is not > documented.) > > Given what you say above I don't think there is a reasonable > alternative remedy. So you should go ahead with this patch. Thanks for your comments and ACK'ing the change . I'll submit a V2 that retains handling of shutdown event in a thread. Regards, Jim -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list