Re: Warnings about dlclose before thread exit

Willem Jan Withagen <wjw@xxxxxxxxxxx> · Mon, 3 Sep 2018 10:27:50 +0200

On 18/08/2018 16:20, Willem Jan Withagen wrote:
On 18/08/2018 14:46, Willem Jan Withagen wrote:
Hi,

I have upgraded to FreeBSD ALPHA 12.0, but I don't think the errors
stem from there, although they could be in one of the libs that came
along with the upgrade.

I'm getting these warnings during rbd and ceph (maybe even more)
invocations that indicate a possible problem because:
===
    It could be possible that a dynamically loaded library, use
    thread_local variable but is dlclose()'d before thread exit.  The
    destructor of this variable will then try to access the address,
    for calling it but it's unloaded, so it'll crash.  We're using
    __elf_phdr_match_addr() to detect and prevent such cases and so
    prevent the crash.
===
this is from : 
https://github.com/freebsd/freebsd/blob/master/lib/libc/stdlib/cxa_thread_atexit_impl.c 
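
To make the mechanism in that comment concrete, here is a minimal sketch of
a thread_local object with a non-trivial destructor. The names (Tracker,
t_tracker, run_worker_and_count_dtors) are made up for illustration and are
not taken from Ceph or libc. It only shows that the destructor runs when the
thread exits; if the code holding ~Tracker lived in a library that was
dlclose()'d before that point, this is the call that would land on unmapped
memory.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
static std::atomic<int> g_dtor_runs{0};

struct Tracker {
    int value = 0;
    // Non-trivial destructor: the runtime has to remember to call it
    // for every thread that created its own Tracker instance.
    ~Tracker() { g_dtor_runs.fetch_add(1); }
};

// One instance per thread, set up on first use in that thread.
static thread_local Tracker t_tracker;

// Start a worker that touches its thread_local, wait for it to exit,
// and return how many destructor runs have been recorded so far.
int run_worker_and_count_dtors() {
    std::thread worker([] { t_tracker.value = 42; });
    worker.join();               // returns after the worker has exited
    assert(!worker.joinable());  // sanity check: the worker is gone
    return g_dtor_runs.load();
}
```

The destructor is not called at the point where the lambda returns but as
part of thread exit, which is why the unloaded-DSO case only shows up that
late.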

Now it could be that dlclose() and thread exit are just close to one
another. But still, this check has been embedded in libc since
2017, so I'm sort of expecting that a recent change has caused this.

And as indicated, it is a possible cause for crashes, because
thread exit is going to clean up things that are no longer there.

Now the 20 dollar question is:
     Where was this introduced??

Otherwise I'll have to try and throw my best gdb capabilities at it, 
and try to invoke an rbd call and see where it activates this warning.

With some debugging foo it was rather simple to find the dtor with a problem:

__cxa_thread_call_dtors: dtr 0x80c9e1bc0 from unloaded dso, skipping
cxa_thread_walk (cb=<optimized out>) at 
/usr/srcs/head/src/lib/libc/stdlib/cxa_thread_atexit_impl.c:129
129                     free(dtor);
(gdb) info symbol 0x80c9e1bc0
std::__1::random_device::~random_device() in section .text of 
/usr/lib/libc++.so.1
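
For doing a similar lookup from inside a program rather than from gdb,
dladdr() can map an address to the loaded object that contains it. A small
sketch, assuming a platform where dladdr() is available (FreeBSD and glibc
both have it; older glibc needs -ldl at link time). The helper names are
hypothetical, and this is not how libc's __elf_phdr_match_addr() is
implemented, only the same kind of question asked through a public API.

```cpp
#include <dlfcn.h>   // dladdr(), Dl_info
#include <string>

// Hypothetical helper: path of the loaded object containing addr,
// or an empty string if no loaded object covers that address.
std::string object_containing(const void* addr) {
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr)
        return "";
    return info.dli_fname;
}

// Example probe: code of this program belongs to some loaded object.
bool own_code_is_owned() {
    const void* addr = reinterpret_cast<const void*>(&object_containing);
    return !object_containing(addr).empty();
}

// Example probe: a stack address belongs to no loaded object.
bool stack_address_is_unowned() {
    int local = 0;
    return object_containing(&local).empty();
}
```

A destructor pointer that used to live in a dlclose()'d library would fall
into the second category, unless something else got mapped at that address
in the meantime.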

And this is during process exit:
#0  cxa_thread_walk (cb=<optimized out>) at 
/usr/srcs/head/src/lib/libc/stdlib/cxa_thread_atexit_impl.c:129
#1  __cxa_thread_call_dtors () at 
/usr/srcs/head/src/lib/libc/stdlib/cxa_thread_atexit_impl.c:144
#2  0x000000080cbdfb9a in exit (status=45) at 
/usr/srcs/head/src/lib/libc/stdlib/exit.c:73
#3  0x000000000060a09c in _start (ap=<optimized out>, cleanup=<optimized 
out>) at /usr/srcs/head/src/lib/csu/amd64/crt1.c:74

So I guess it could be just about anywhere a std::random_device is used?

BTW: I have the same issue on the Jenkins build for mimic.

More on this issue: it seems there is a substantial
difference between Linux and FreeBSD in managing opened dynamic libraries:

On 26/08/2018 12:19, David Chisnall wrote:
The FreeBSD implementation here looks racy.  If one thread dlcloses an 
object while another thread is exiting, we can end up calling a function 
at an invalid memory address.  It also looks as if it may be possible to 
unload one library, load another at the same address, and end up 
executing entirely the wrong code, which would have some serious 
security implications.

The GNU/Linux equivalent of this function locks the DSO in memory until 
all references to it have gone away.  A call to dlclose() on GNU/Linux 
will not actually unload the library until all threads with destructors 
in that library have been unloaded.  I believe that this reuses the same 
reference counting mechanism that allows the same library to be dlopened 
and dlclosed multiple times.

It would be nice if the FreeBSD version had the same behaviour, because 
this is almost certainly expected in code written on other platforms.
===========
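
To check my own understanding of the behaviour David describes, here is a
toy model of it. ToyDso and ToyThread are hypothetical classes, not glibc or
FreeBSD APIs, and the real implementations surely differ in detail. The
assumption modelled is only this: registering a thread destructor takes an
extra reference on the library, so dlclose() cannot unload it until that
destructor has run.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy model only; every name here is hypothetical.
class ToyDso {
public:
    void open()  { ++refs_; loaded_ = true; }  // like dlopen(): take a reference
    void close() { release(); }                // like dlclose(): drop it
    void pin()   { ++refs_; }                  // a thread dtor lives in this DSO
    void unpin() { release(); }                // that dtor has been run
    bool loaded() const { return loaded_; }
private:
    void release() {
        assert(refs_ > 0);
        if (--refs_ == 0) loaded_ = false;     // unload on the last reference
    }
    int refs_ = 0;
    bool loaded_ = false;
};

class ToyThread {
public:
    void register_dtor(ToyDso& dso, std::function<void()> dtor) {
        dso.pin();                             // keep the code mapped
        dtors_.push_back({&dso, std::move(dtor)});
    }
    void exit() {
        while (!dtors_.empty()) {              // run in reverse order
            Entry e = std::move(dtors_.back());
            dtors_.pop_back();
            e.dtor();                          // DSO is still pinned here
            e.dso->unpin();
        }
    }
private:
    struct Entry { ToyDso* dso; std::function<void()> dtor; };
    std::vector<Entry> dtors_;
};

// True if the DSO stays loaded between close() and thread exit,
// the dtor sees it loaded, and it is unloaded afterwards.
bool unload_is_deferred() {
    ToyDso lib;
    ToyThread thread;
    bool dtor_saw_loaded = false;
    lib.open();
    thread.register_dtor(lib, [&] { dtor_saw_loaded = lib.loaded(); });
    lib.close();
    bool loaded_after_close = lib.loaded();
    thread.exit();
    return loaded_after_close && dtor_saw_loaded && !lib.loaded();
}
```

Without the pin() in register_dtor(), close() would drop the last reference
and the destructor would run against an unloaded library, which is the case
the FreeBSD warning is reporting.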

So would this be a correct assumption, and is what I see happening because
the ceph project actually relies on this behaviour of the Linux DL implementation?

--WjW