On Fri, Dec 15, 2023 at 02:32:19AM +0800, Fima Shevrin wrote:
> RPC client implementation uses the following paradigm. The critical
> section is organized via virObjectLock(client)/virObjectUnlock(client)
> braces. Though this is potentially problematic as
>     main thread:                         side thread:
>     virObjectUnlock(client);
>                                          virObjectLock(client);
>                                          g_main_loop_quit(client->eventLoop);
>                                          virObjectUnlock(client);
>     g_main_loop_run(client->eventLoop);
>
> This means in particular that is the main thread is executing very long
> request like VM migration, the wakeup from the side thread could be
> stuck until the main request will be fully completed.

Can you explain this in more detail, with call traces illustrating
what the two threads are doing? You're not saying where the main
thread is doing work with the 'client' lock held for a long time.
Generally the goal should be for the main thread to hold the lock
only for a short time.

Also, if the side thread is already holding a reference on 'client',
then we should consider whether it is possible to terminate the
event loop without acquiring the mutex, as GMainLoop already
protects itself wrt concurrent usage, provided all threads hold a
reference directly or indirectly.

>
> Discrubed case is easily reproducible with the simple python scripts doing slow
> and fast requests in parallel from two different threads.
>
> Our idea is to release the lock at the prepare stage and avoid libvirt stuck
> during the interaction between main and side threads.
>
> Co-authored-by: Denis V. Lunev <den@xxxxxxxxxx>
> Co-authored-by: Nikolai Barybin <nikolai.barybin@xxxxxxxxxxxxx>
>
> Signed-off-by: Fima Shevrin <efim.shevrin@xxxxxxxxxxxxx>
> ---
>  src/rpc/virnetclient.c       | 17 ++++++++++++-----
>  src/util/vireventglibwatch.c | 28 ++++++++++++++++++++++++++--
>  2 files changed, 38 insertions(+), 7 deletions(-)
>
> diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c
> index de8ebc2da9..63bd42ed3a 100644
> --- a/src/rpc/virnetclient.c
> +++ b/src/rpc/virnetclient.c
> @@ -987,6 +987,9 @@ int virNetClientSetTLSSession(virNetClient *client,
>       * etc. If we make the grade, it will send us a '\1' byte.
>       */
>
> +    /* Here we are not passing the client to virEventGLibAddSocketWatch,
> +     * since the entire virNetClientSetTLSSession function requires a lock.
> +     */
>      source = virEventGLibAddSocketWatch(virNetSocketGetFD(client->sock),
>                                          G_IO_IN,
>                                          client->eventCtx,
> @@ -1692,14 +1695,18 @@ static int virNetClientIOEventLoop(virNetClient *client,
>          if (client->nstreams)
>              ev |= G_IO_IN;
>
> +        /*
> +         * We don't need to call virObjectLock(client) here,
> +         * since the .prepare function inside glib Main Loop
> +         * will do this. virEventGLibAddSocketWatch is responsible
> +         * for passing client var in glib .prepare
> +         */
>          source = virEventGLibAddSocketWatch(virNetSocketGetFD(client->sock),
>                                              ev,
>                                              client->eventCtx,
> -                                            virNetClientIOEventFD, &data, NULL, NULL);
> -
> -        /* Release lock while poll'ing so other threads
> -         * can stuff themselves on the queue */
> -        virObjectUnlock(client);
> +                                            virNetClientIOEventFD, &data,
> +                                            (virObjectLockable *)client,
> +                                            NULL);
>
>  #ifndef WIN32
>          /* Block SIGWINCH from interrupting poll in curses programs,
> diff --git a/src/util/vireventglibwatch.c b/src/util/vireventglibwatch.c
> index 7680656ba2..641b772995 100644
> --- a/src/util/vireventglibwatch.c
> +++ b/src/util/vireventglibwatch.c
> @@ -34,11 +34,23 @@ struct virEventGLibFDSource {
>
>
>  static gboolean
> -virEventGLibFDSourcePrepare(GSource *source G_GNUC_UNUSED,
> +virEventGLibFDSourcePrepare(GSource *source,
>                              gint *timeout)
>  {
> +    virEventGLibFDSource *ssource = (virEventGLibFDSource *)source;
>      *timeout = -1;
>
> +    if (ssource->client != NULL)
> +        virObjectUnlock(ssource->client);
> +
> +    /*
> +     * Prepare function may be called multiple times
> +     * in glib Main Loop, thus we assign source->client
> +     * a null pointer to avoid calling pthread_mutex_unlock
> +     * on an already unlocked mutex.
> +     * */
> +    ssource->client = NULL;
> +
>      return FALSE;
>  }
>
> @@ -123,11 +135,23 @@ struct virEventGLibSocketSource {
>
>
>  static gboolean
> -virEventGLibSocketSourcePrepare(GSource *source G_GNUC_UNUSED,
> +virEventGLibSocketSourcePrepare(GSource *source,
>                                  gint *timeout)
>  {
> +    virEventGLibSocketSource *ssource = (virEventGLibSocketSource *)source;
>      *timeout = -1;
>
> +    if (ssource->client != NULL)
> +        virObjectUnlock(ssource->client);
> +
> +    /*
> +     * Prepare function may be called multiple times
> +     * in glib Main Loop, thus we assign source->client
> +     * a null pointer to avoid calling pthread_mutex_unlock
> +     * on an already unlocked mutex.
> +     * */
> +    ssource->client = NULL;
> +
>      return FALSE;
>  }
>
> --
> 2.34.1
> _______________________________________________
> Devel mailing list -- devel@xxxxxxxxxxxxxxxxx
> To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxx

With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|