The instability occurs in std::mutex::unlock() during static initialization, before any execution threads start. I debugged the instability down to whether, at compile time, a lambda inside main() captures a particular object or not. Even before main() runs, the inlined std::mutex::unlock() inside the static initialization code of one of the shared libraries seems to believe that pthreads are not in use, so it doesn't bother to unlock the mutex, even though just prior, in lock(), it did lock it. And without changing the shared library code at all, merely relinking the code that loads that shared library with the -pthread flag makes everything work. I'd like to understand the mechanics here:
libcxx.so gets built by gcc 7.3.1 with -std=c++17 -O2 -fvisibility-inlines-hidden -fno-omit-frame-pointer and -pthread. It contains some static variables that are instances of templates, and other static variables whose constructors invoke methods of those template instances; those methods construct and destruct a std::unique_lock<std::mutex> on std::mutexes that are themselves static members of the template instances. Static initialization of libcxx.so does not create any execution threads.
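To give a concrete picture, the pattern is roughly the following. This is a simplified sketch with made-up names (instance_t, registrar); it is not the actual library source, which lives in x::singleton and friends:

#include <mutex>

// Sketch only: instance_t stands in for the instantiated templates,
// registrar for the other static objects whose constructors call them.
template<typename T>
struct instance_t {
    static std::mutex m;              // static std::mutex in the template instance

    static void register_me()
    {
        // constructs and destructs a std::unique_lock during static init
        std::unique_lock<std::mutex> lock{m};
        // ... bookkeeping done while holding the lock ...
    }
};

template<typename T> std::mutex instance_t<T>::m;

struct registrar {
    registrar() { instance_t<int>::register_me(); }   // runs before main()
};

static registrar init_me;             // a static object inside libcxx.so

The std::unique_lock constructor and destructor of that sketch are what I end up stepping through below.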
libcxxw.so gets built with the same flags and linked with libcxx.so. Nothing of interest happens in its static initialization, but its code does create execution threads.
The application code gets compiled and linked with libcxxw.so and libcxx.so. The application code does not use threads itself, so I build it with just -std=c++17 and -O2, and nothing else.
At some point while working with the application code, everything comes to a screeching halt before main() gets invoked. I trace this down to something that occurs inside __static_initialization_and_destruction_0() of libcxx.so, very early in the game, triggered by a combination of two seemingly unrelated things: whether the application code that loads this shared library was linked with -pthread, and whether some boring chunk of the application code captured some other boring object in some boring closure.
I'm several layers deep inside static initialization of libcxx.so, stepping through a completely inlined std::unique_lock<std::mutex>::unlock():
(gdb) s
std::unique_lock<std::mutex>::unlock (this=0x7fffffffe7f0)
    at /usr/include/c++/7/bits/std_mutex.h:319
319           if (!_M_owns)
(gdb) s
321           else if (_M_device)
(gdb) s
323             _M_device->unlock();
(gdb) s
std::mutex::unlock (this=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at /usr/include/c++/7/bits/std_mutex.h:323
323             _M_device->unlock();
(gdb) s
__gthread_mutex_unlock (__mutex=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at /usr/include/c++/7/x86_64-redhat-linux/bits/gthr-default.h:777
777       if (__gthread_active_p ())
(gdb) s
std::unique_lock<std::mutex>::unlock (this=0x7fffffffe7f0)
    at /usr/include/c++/7/bits/std_mutex.h:324
324           _M_owns = false;

It appears that this inlined __gthread_active_p() returned false. At this point:
(gdb) p *_M_device
$3 = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 1, __count = 0,
        __owner = 27113, __nusers = 1, __kind = 0, __spins = 0, __elision = 0,
        __list = {__prev = 0x0, __next = 0x0}},
      __size = "\001\000\000\000\000\000\000\000\351i\000\000\001", '\000' <repeats 26 times>,
      __align = 1}}, <No data fields>}
This is a locked mutex that simply did not get unlocked. I got kicked completely out of its destructor, which should've cleared the lock, the owner, and everything. But when it was initially locked, from all appearances __gthread_active_p() was true. Let me go back and show what happened earlier, in the constructor:
(gdb) s
std::mutex::lock (this=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at ./../includes/x/singleton.H:114
114             std::unique_lock<std::mutex> lock{
(gdb) s
__gthread_mutex_lock (__mutex=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at /usr/include/c++/7/x86_64-redhat-linux/bits/gthr-default.h:747
747       if (__gthread_active_p ())
(gdb) s
std::unique_lock<std::mutex>::unique_lock (__m=…, this=0x7fffffffe7f0)
    at /usr/include/c++/7/bits/std_mutex.h:195
195           : _M_device(std::__addressof(__m)), _M_owns(false)
(gdb) s
197             lock();

[ several "s"teps inside the inlined call later ]

(gdb) s
__gthread_mutex_lock (__mutex=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at /usr/include/c++/7/x86_64-redhat-linux/bits/gthr-default.h:747
747       if (__gthread_active_p ())
(gdb) s
748         return __gthrw_(pthread_mutex_lock) (__mutex);

__gthread_active_p() appears to have returned true here, and the code proceeds to set the mutex to the locked state. And then, just a little bit later, in the destructor, a different (from the looks of it, differently inlined) __gthread_active_p() returns false, and I couldn't even step into it.
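For reference, the wrappers being stepped through look roughly like this. This is paraphrased from memory of gcc 7's gthr-posix.h (the real header routes the calls through the __gthrw_() weak-alias macro; I've written direct pthread calls here for readability), so take it as an approximation:

#include <pthread.h>

extern int __gthread_active_p (void);

static inline int
__gthread_mutex_lock (pthread_mutex_t *__mutex)
{
  if (__gthread_active_p ())            /* line 747 in gthr-default.h above */
    return pthread_mutex_lock (__mutex);
  else
    return 0;                           /* no pthreads: locking is a no-op */
}

static inline int
__gthread_mutex_unlock (pthread_mutex_t *__mutex)
{
  if (__gthread_active_p ())            /* line 777 in gthr-default.h above */
    return pthread_mutex_unlock (__mutex);
  else
    return 0;                           /* no pthreads: unlocking is a no-op */
}

So when __gthread_active_p() returns false, unlock() silently becomes a no-op, which matches what I'm seeing: the lock took the pthread_mutex_lock branch, and the matching unlock took the no-op branch.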
Now I stop everything and relink the application with -pthread. I don't even recompile it, just relink it:
g++ -g -O2 -std=c++17 -fno-omit-frame-pointer -pthread -I/usr/include/p11-kit-1 -o validatedinput validatedinput.o -lcxxw -lcxx
That's it; I'm not even touching libcxx.so itself, the library I was debugging. And when I do that, the problem goes away. Note that the code in question lives in -lcxx, which remains unchanged. When I debug the same inlined destructor, compare this with the first debugging session above:
(gdb) s
std::unique_lock<std::mutex>::unlock (this=0x7fffffffe7f0)
    at /usr/include/c++/7/bits/std_mutex.h:319
319           if (!_M_owns)
(gdb) s
321           else if (_M_device)
(gdb) s
323             _M_device->unlock();
(gdb) s
std::mutex::unlock (this=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at /usr/include/c++/7/bits/std_mutex.h:323
323             _M_device->unlock();
(gdb) s
__gthread_mutex_unlock (__mutex=0x7ffff766e2c8 <x::singleton<x::property::globalListObj, x::ptrref_base>::static_instance+8>)
    at /usr/include/c++/7/x86_64-redhat-linux/bits/gthr-default.h:777
777       if (__gthread_active_p ())
(gdb) s
778         return __gthrw_(pthread_mutex_unlock) (__mutex);

The mutex gets unlocked. Note that this inlined destructor call is inside libcxx.so, which is unchanged. The difference is whether the executable gets linked with -pthread. Also, bizarrely, it depends on whether the executable captured some particular closure inside its main(), and that part hasn't even run yet. I'd really like to understand why I'm seeing what I'm seeing here: one inlined instance of __gthread_active_p() returning true, followed by a different inlined instance returning false.
All of this happens in shared code that does not itself change, and it only happens when the application, which itself doesn't use threads, links with the shared code that does.
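In case it's relevant, my understanding of __gthread_active_p() on glibc targets is roughly the following, again paraphrased from memory of gcc 7's gthr-posix.h, so treat the details as approximate: it tests whether a weak reference to an internal libpthread symbol resolved to something non-null.

#include <pthread.h>

/* Weak reference to glibc's internal __pthread_key_create symbol; it
   resolves to non-null only when libpthread is actually in the process.  */
static __typeof (pthread_key_create) __gthrw___pthread_key_create
  __attribute__ ((__weakref__ ("__pthread_key_create")));

static inline int
__gthread_active_p (void)
{
  static void *const __gthread_active_ptr
    = __extension__ (void *) &__gthrw___pthread_key_create;

  return __gthread_active_ptr != 0;
}

I would have expected that to give the same answer everywhere in one process, since it only seems to depend on whether libpthread is present at all, which makes the true-then-false behaviour above all the more confusing.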
Puzzled.