On Thu, Jul 18, 2024 at 03:50:27PM +0200, Tim Wiederhake wrote:
> On Thu, 2024-07-04 at 08:10 +0100, Daniel P. Berrangé wrote:
> > On Wed, Jul 03, 2024 at 02:44:37PM +0200, Tim Wiederhake wrote:
> > > `pthread_mutex_destroy`, `pthread_mutex_lock` and
> > > `pthread_mutex_unlock` return an error code that is currently
> > > ignored.
> > >
> > > Add debug information if one of these operations failed, e.g. when
> > > there is an attempt to destroy a still locked mutex or unlock an
> > > already unlocked mutex. Both scenarios are considered undefined
> > > behavior.
> > >
> > > Signed-off-by: Tim Wiederhake <twiederh@xxxxxxxxxx>
> > > ---
> > >  src/util/virthread.c | 15 ++++++++++++---
> > >  1 file changed, 12 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/src/util/virthread.c b/src/util/virthread.c
> > > index 5422bb74fd..14116a2221 100644
> > > --- a/src/util/virthread.c
> > > +++ b/src/util/virthread.c
> > > @@ -35,7 +35,10 @@
> > >
> > >  #include "viralloc.h"
> > >  #include "virthreadjob.h"
> > > +#include "virlog.h"
> > >
> > > +#define VIR_FROM_THIS VIR_FROM_THREAD
> > > +VIR_LOG_INIT("util.thread");
> > >
> > >  int virOnce(virOnceControl *once, virOnceFunc init)
> > >  {
> > > @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m)
> > >
> > >  void virMutexDestroy(virMutex *m)
> > >  {
> > > -    pthread_mutex_destroy(&m->lock);
> > > +    if (pthread_mutex_destroy(&m->lock)) {
> > > +        VIR_WARN("Failed to destroy mutex=%p", m);
> > > +    }
> > >  }
> > >
> > >  void virMutexLock(virMutex *m)
> > >  {
> > > -    pthread_mutex_lock(&m->lock);
> > > +    if (pthread_mutex_lock(&m->lock)) {
> > > +        VIR_WARN("Failed to lock mutex=%p", m);
> > > +    }
> > >  }
> > >
> > >  void virMutexUnlock(virMutex *m)
> > >  {
> > > -    pthread_mutex_unlock(&m->lock);
> > > +    if (pthread_mutex_unlock(&m->lock)) {
> > > +        VIR_WARN("Failed to unlock mutex=%p", m);
> > > +    }
> > >  }
> >
> > I'd be surprised if these lock/unlock warnings ever trigger, since
> > IIUC they would need us to be using an error checking mutex, not a
> > regular mutex. IOW, aren't these just adding condition test overhead
> > + unreachable code to the lock calls ?
> >
> > The 2nd patch shows failures in the destroy calls IIUC.
>
> I have looked more closely into the issue now. pthread_mutex_lock and
> pthread_mutex_unlock do indeed not return a non-zero value over us not
> using error checking mutexes.
>
> During my last attempt at fixing the issues I had a patch that would
> count lockings and unlockings of mutexes explicitly, and I believe I
> recall seeing problems in that area as well. Sadly, I cannot reproduce
> that now, at least not reliably: Ignoring the warnings for
> pthread_mutex_destroy, virnetdaemontest does seem to trigger my "number
> of locks == number of unlocks" check in about 3 out of 10.000 runs. And
> sometimes with a frequency of 1 in 10. Sometimes not at all. In any
> case: I do not consider the checks for locking / unlocking dead code.
>
> So far I have been using the test suite to check for obvious issues,
> but I cannot rule out that libvirt itself has race conditions too.

But if the checks never fire, due to the mutex type, this isn't
helping us diagnose anything surely ?

> I would advocate for merging this patch as is, and add a patch to
> enable error checking for the mutexes.

Error checking mutexes aren't something we can easily enable, because
they break any code which needs to hold a mutex across fork() and
unlock in the child. IOW, we would need two different types of mutex
and pick which to use, where.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
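
To illustrate the point about mutex types, here is a minimal standalone sketch, not libvirt code. It assumes a POSIX system and shows the kind of mutex for which the lock/unlock return values become meaningful. The helper names (errorcheck_mutex_init, unlock_without_lock, lock_twice) are hypothetical and exist only for this example. With PTHREAD_MUTEX_ERRORCHECK, POSIX requires EPERM when unlocking a mutex the caller does not hold and EDEADLK when relocking a mutex the caller already holds. With a default mutex the same misuse is undefined behavior and is not reliably reported.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialise 'm' as an error-checking mutex.
 * Returns 0 on success or an errno-style code on failure. */
static int errorcheck_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Unlock a mutex that was never locked and return what
 * pthread_mutex_unlock() reported (-1 if setup failed). */
static int unlock_without_lock(void)
{
    pthread_mutex_t m;
    int rc;

    if (errorcheck_mutex_init(&m) != 0)
        return -1;

    rc = pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}

/* Lock the same mutex twice from one thread and return what the
 * second pthread_mutex_lock() reported (-1 if setup failed). */
static int lock_twice(void)
{
    pthread_mutex_t m;
    int rc;

    if (errorcheck_mutex_init(&m) != 0)
        return -1;

    if (pthread_mutex_lock(&m) != 0) {
        pthread_mutex_destroy(&m);
        return -1;
    }

    rc = pthread_mutex_lock(&m);   /* reported as an error, no deadlock */
    pthread_mutex_unlock(&m);      /* release the first, successful lock */
    pthread_mutex_destroy(&m);
    return rc;
}
```

Called from a small main() and built with -pthread, unlock_without_lock() should return EPERM and lock_twice() should return EDEADLK. This sketch does not cover the fork() concern raised above: an error-checking mutex records its owner, so unlocking in a child a mutex that was locked before fork() is the case that would need a separate, non-checking mutex type.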