On Fri, Jan 03, 2020 at 10:11:22AM +0000, Daniel P. Berrangé wrote: > On Fri, Dec 27, 2019 at 01:59:51PM +0800, wang.yi59@xxxxxxxxxx wrote: > > Hi Daniel, > > > > Thanks a lot for your review and reply! > > > > > On Mon, Dec 23, 2019 at 04:50:00PM +0100, Michal Prívozník wrote: > > > > On 12/23/19 11:12 AM, Daniel P. Berrangé wrote: > > > > > On Mon, Dec 23, 2019 at 03:13:10PM +0800, Yi Wang wrote: > > > > >> From: Li XueLei <li.xuelei1@xxxxxxxxxx> > > > > >> > > > > >> Libvirtd no longer receives external requests, after I do the following. > > > > >> > > > > ..... > > > > > > > > > > > > Libvirt allows multiple connections concurrently and changes made by one > > > > > connection are supposed to apply globally. No per-VM state should be tied > > > > > to a particular connection. > > > > > > > > This is generally very true, except for migration. > > > > > > Hmm, so in that case we do need to make sure this runs from a non-main > > > event thread as this patch does. I'm surprised we've not seen this > > > before though - i'd think it should be a guaranteed deadlock anytime > > > someone tests this scenario. > > > > > > > > > > > > > > > > > IOW, I don't think we need a thread. We should just stop calling > > > > > qemuMigrationSrcCleanup from the connection close callback entirely > > > > > > > > But I agree that something fishy is going on and this doesn't look like > > > > proper solution. Yi, can you please share the backtrace of other threads > > > > too? Why aren't we getting any reply on the monitor? > > > > > > qemuMonitorSend() just puts date onto the TX queue & waits for the > > > main event loop to send it. We're invoking qemuMonitorSend from > > > the main event loop in this backtrace, hence it'll wait forever > > > for the thread to send it. > > > > > > qemuMonitorSend must never be invoked from the main event thread. > > > > So do you think how to imporve this patch? Any sugesstion is appreciated :-) > > I'm facing a related problem in my patches for providing an "embedded" > QEMU driver - any libvirt API calls that the application makes from > its main event loop will deadlock if they result in a QEMU monitor > command. > > We've got another long standing scalability bug too when dealing > with shutdown takes too long, the main event loop is blocked for an > unreasonably long time. > > I think the solution to all of these problems is to stop using the main > event loop for the QEMU monitor and agent. > > Instead, we should create a new thread for each QEMU VM, and that thread > should run an event loop, handling just the monitor and agent work for > that one VM. > > We can do this quite easily now if we use GLib's GMainContext as the > event loop. > > We'll create a GMainContext and store it in qemuDomainObjPrivatePtr. > In qemuProcessLaunch and qemuProcessReconnect we'll need to spawn a > thread running this GMainContext as an event loop. > > At the end of qemuProcess we'll need to tell this event loop to quit > and stop the thread. > > Then the qemu_agent.c and qemu_monitor.c code need changing to use the > g_source_add_unix_fd () / g_source_remove_unix_fd () APIs instead of > virEventAddHandle/virEventRemoveHandle. Is this approach something you would like to work on implementing, or have already started ? If not, I'll attempt to implement it myself in the next week or so. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|