On Fri, Dec 27, 2019 at 01:59:51PM +0800, wang.yi59@xxxxxxxxxx wrote: > Hi Daniel, > > Thanks a lot for your review and reply! > > > On Mon, Dec 23, 2019 at 04:50:00PM +0100, Michal Prívozník wrote: > > > On 12/23/19 11:12 AM, Daniel P. Berrangé wrote: > > > > On Mon, Dec 23, 2019 at 03:13:10PM +0800, Yi Wang wrote: > > > >> From: Li XueLei <li.xuelei1@xxxxxxxxxx> > > > >> > > > >> Libvirtd no longer receives external requests, after I do the following. > > > >> > > ..... > > > > > > > > > Libvirt allows multiple connections concurrently and changes made by one > > > > connection are supposed to apply globally. No per-VM state should be tied > > > > to a particular connection. > > > > > > This is generally very true, except for migration. > > > > Hmm, so in that case we do need to make sure this runs from a non-main > > event thread as this patch does. I'm surprised we've not seen this > > before though - i'd think it should be a guaranteed deadlock anytime > > someone tests this scenario. > > > > > > > > > > > > > IOW, I don't think we need a thread. We should just stop calling > > > > qemuMigrationSrcCleanup from the connection close callback entirely > > > > > > But I agree that something fishy is going on and this doesn't look like > > > proper solution. Yi, can you please share the backtrace of other threads > > > too? Why aren't we getting any reply on the monitor? > > > > qemuMonitorSend() just puts date onto the TX queue & waits for the > > main event loop to send it. We're invoking qemuMonitorSend from > > the main event loop in this backtrace, hence it'll wait forever > > for the thread to send it. > > > > qemuMonitorSend must never be invoked from the main event thread. > > So do you think how to imporve this patch? Any sugesstion is appreciated :-) I'm facing a related problem in my patches for providing an "embedded" QEMU driver - any libvirt API calls that the application makes from its main event loop will deadlock if they result in a QEMU monitor command. We've got another long standing scalability bug too when dealing with shutdown takes too long, the main event loop is blocked for an unreasonably long time. I think the solution to all of these problems is to stop using the main event loop for the QEMU monitor and agent. Instead, we should create a new thread for each QEMU VM, and that thread should run an event loop, handling just the monitor and agent work for that one VM. We can do this quite easily now if we use GLib's GMainContext as the event loop. We'll create a GMainContext and store it in qemuDomainObjPrivatePtr. In qemuProcessLaunch and qemuProcessReconnect we'll need to spawn a thread running this GMainContext as an event loop. At the end of qemuProcess we'll need to tell this event loop to quit and stop the thread. Then the qemu_agent.c and qemu_monitor.c code need changing to use the g_source_add_unix_fd () / g_source_remove_unix_fd () APIs instead of virEventAddHandle/virEventRemoveHandle. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list