The instability occurs in std::mutex::unlock() during static initialization, before any execution threads start. I debugged the instability down to whether, at compile time, a lambda inside main() captures a particular object or not. Even before main() runs, the inlined std::mutex::unlock() inside the static initialization code of one of the shared libraries seems to believe that pthreads are not in use, so it doesn't bother to unlock the mutex, even though just prior, in lock(), it did lock it. And without changing the shared library code at all, merely relinking the code that loads that shared library with the -pthread flag makes everything work. I'd like to understand the mechanics here:
libcxx.so gets built by gcc 7.3.1 with -std=c++17 -O2 -fvisibility-inlines-hidden -fno-omit-frame-pointer and -pthread. It contains some static variables that are instances of templates, and other static variables whose constructors invoke methods of those template instances; those methods construct and destruct a std::unique_lock<std::mutex> on std::mutexes that are themselves static members of the template instances. Static initialization of libcxx.so does not create any execution threads.
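To give a concrete picture, the pattern is roughly the following. This is a simplified sketch with made-up names (instance_t, registrar); it is not the actual library source, which lives in x::singleton and friends:

#include <mutex>

// Sketch only: instance_t stands in for the instantiated templates,
// registrar for the other static objects whose constructors call them.
template<typename T>
struct instance_t {
    static std::mutex m;              // static std::mutex in the template instance

    static void register_me()
    {
        // constructs and destructs a std::unique_lock during static init
        std::unique_lock<std::mutex> lock{m};
        // ... bookkeeping done while holding the lock ...
    }
};

template<typename T> std::mutex instance_t<T>::m;

struct registrar {
    registrar() { instance_t<int>::register_me(); }   // runs before main()
};

static registrar init_me;             // a static object inside libcxx.so

The std::unique_lock constructor and destructor of that sketch are what I end up stepping through below.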
libcxxw.so gets built with the same flags and linked with libcxx.so. Nothing of interest happens in its static initialization, but its code does create execution threads.
The application code gets compiled and linked with libcxxw.so and libcxx.so. The application code does not use threads itself, so I build it with just -std=c++17 and -O2, and nothing else.
At some point while working with the application code, everything comes to a screeching halt before main() gets invoked. I trace this down to something that occurs inside __static_initialization_and_destruction_0() of libcxx.so, very early in the game, triggered by a combination of two seemingly unrelated things: whether the application code that loads this shared library was linked with -pthread, and whether some boring chunk of the application code captured some other boring object in some boring closure.
I'm several layers deep inside static initialization of libcxx.so, stepping through a completely inlined std::unique_lock<std::mutex>::unlock():
(gdb) s
std::unique_lock<std::mutex>::unlock (this=0x7fffffffe7f0)
    at /usr/include/c++/7/bits/std_mutex.h:319
319           if (!_M_owns)
(gdb) s
321           else if (_M_device)
(gdb) s
323             _M_device->unlock();
(gdb) s
std::mutex::unlock (this=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at /usr/include/c++/7/bits/std_mutex.h:323
323             _M_device->unlock();
(gdb) s
__gthread_mutex_unlock (__mutex=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at /usr/include/c++/7/x86_64-redhat-linux/bits/gthr-default.h:777
777       if (__gthread_active_p ())
(gdb) s
std::unique_lock<std::mutex>::unlock (this=0x7fffffffe7f0)
    at /usr/include/c++/7/bits/std_mutex.h:324
324           _M_owns = false;

It appears that this inlined __gthread_active_p() returned false. At this point:
(gdb) p *_M_device
$3 = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 1, __count = 0,
        __owner = 27113, __nusers = 1, __kind = 0, __spins = 0, __elision = 0,
        __list = {__prev = 0x0, __next = 0x0}},
      __size = "\001\000\000\000\000\000\000\000\351i\000\000\001", '\000' <repeats 26 times>,
      __align = 1}}, <No data fields>}
This is a locked mutex that simply did not get unlocked. I got kicked completely out of its destructor, which should've cleared the lock, the owner, and everything. But when it was initially locked, from all appearances __gthread_active_p() was true. Let me go back and show what happened earlier, in the constructor:
(gdb) s
std::mutex::lock (this=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at ./../includes/x/singleton.H:114
114             std::unique_lock<std::mutex> lock{
(gdb) s
__gthread_mutex_lock (__mutex=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at /usr/include/c++/7/x86_64-redhat-linux/bits/gthr-default.h:747
747       if (__gthread_active_p ())
(gdb) s
std::unique_lock<std::mutex>::unique_lock (__m=…, this=0x7fffffffe7f0)
    at /usr/include/c++/7/bits/std_mutex.h:195
195           : _M_device(std::__addressof(__m)), _M_owns(false)
(gdb) s
197             lock();

[ several "s"teps inside the inlined call later ]

(gdb) s
__gthread_mutex_lock (__mutex=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at /usr/include/c++/7/x86_64-redhat-linux/bits/gthr-default.h:747
747       if (__gthread_active_p ())
(gdb) s
748         return __gthrw_(pthread_mutex_lock) (__mutex);

__gthread_active_p() appears to have returned true here, and the code proceeds to set the mutex to the locked state. And then, just a little bit later, in the destructor, a different (from the looks of it, differently inlined) __gthread_active_p() returns false, and I couldn't even step into it.
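For reference, the wrappers being stepped through look roughly like this. This is paraphrased from memory of gcc 7's gthr-posix.h (the real header routes the calls through the __gthrw_() weak-alias macro; I've written direct pthread calls here for readability), so take it as an approximation:

#include <pthread.h>

extern int __gthread_active_p (void);

static inline int
__gthread_mutex_lock (pthread_mutex_t *__mutex)
{
  if (__gthread_active_p ())            /* line 747 in gthr-default.h above */
    return pthread_mutex_lock (__mutex);
  else
    return 0;                           /* no pthreads: locking is a no-op */
}

static inline int
__gthread_mutex_unlock (pthread_mutex_t *__mutex)
{
  if (__gthread_active_p ())            /* line 777 in gthr-default.h above */
    return pthread_mutex_unlock (__mutex);
  else
    return 0;                           /* no pthreads: unlocking is a no-op */
}

So when __gthread_active_p() returns false, unlock() silently becomes a no-op, which matches what I'm seeing: the lock took the pthread_mutex_lock branch, and the matching unlock took the no-op branch.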
Now I stop everything and relink the application with -pthread. I don't even recompile it, just relink it:
g++ -g -O2 -std=c++17 -fno-omit-frame-pointer -pthread -I/usr/include/p11-kit-1 -o validatedinput validatedinput.o -lcxxw -lcxx
That's it; I'm not even touching libcxx.so itself, the library I was debugging. And when I do that, the problem goes away. Note that the code in question lives in -lcxx, which remains unchanged. When I debug the same inlined destructor, compare this with the first debugging session above:
(gdb) s
std::unique_lock<std::mutex>::unlock (this=0x7fffffffe7f0)
    at /usr/include/c++/7/bits/std_mutex.h:319
319           if (!_M_owns)
(gdb) s
321           else if (_M_device)
(gdb) s
323             _M_device->unlock();
(gdb) s
std::mutex::unlock (this=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at /usr/include/c++/7/bits/std_mutex.h:323
323             _M_device->unlock();
(gdb) s
__gthread_mutex_unlock (__mutex=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at /usr/include/c++/7/x86_64-redhat-linux/bits/gthr-default.h:777
777       if (__gthread_active_p ())
(gdb) s
778         return __gthrw_(pthread_mutex_unlock) (__mutex);

The mutex gets unlocked. Note that this inlined destructor call is inside libcxx.so, which is unchanged. The difference is whether the executable gets linked with -pthread. Also, bizarrely, it depends on whether the executable captured some particular closure inside its main(), and that part hasn't even run yet. I'd really like to understand why I'm seeing what I'm seeing here: one inlined instance of __gthread_active_p() returning true, followed by a different inlined instance returning false.
All of this happens in shared code that does not itself change, and it only happens when the application, which itself doesn't use threads, links with the shared code that does.
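In case it's relevant, my understanding of __gthread_active_p() on glibc targets is roughly the following, again paraphrased from memory of gcc 7's gthr-posix.h, so treat the details as approximate: it tests whether a weak reference to an internal libpthread symbol resolved to something non-null.

#include <pthread.h>

/* Weak reference to glibc's internal __pthread_key_create symbol; it
   resolves to non-null only when libpthread is actually in the process.  */
static __typeof (pthread_key_create) __gthrw___pthread_key_create
  __attribute__ ((__weakref__ ("__pthread_key_create")));

static inline int
__gthread_active_p (void)
{
  static void *const __gthread_active_ptr
    = __extension__ (void *) &__gthrw___pthread_key_create;

  return __gthread_active_ptr != 0;
}

I would have expected that to give the same answer everywhere in one process, since it only seems to depend on whether libpthread is present at all, which makes the true-then-false behaviour above all the more confusing.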
Puzzled.