On Mon, Sep 3, 2018 at 4:28 PM Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:
>
> On 18/08/2018 16:20, Willem Jan Withagen wrote:
> > On 18/08/2018 14:46, Willem Jan Withagen wrote:
> >> Hi,
> >>
> >> I have upgraded to FreeBSD ALPHA 12.0, but I don't think the errors
> >> stem from there. Although they could be in one of the libs that came
> >> along with the upgrade.
> >>
> >> I'm getting these warnings during rbd and ceph (maybe even more)
> >> invocations that indicate a possible problem because:
> >> ===
> >> It could be possible that a dynamically loaded library, use
> >> thread_local variable but is dlclose()'d before thread exit. The
> >> destructor of this variable will then try to access the address,
> >> for calling it but it's unloaded, so it'll crash. We're using
> >> __elf_phdr_match_addr() to detect and prevent such cases and so
> >> prevent the crash.
> >> ===
> >> this is from:
> >> https://github.com/freebsd/freebsd/blob/master/lib/libc/stdlib/cxa_thread_atexit_impl.c
> >>
> >>
> >> Now it could be that dlclose() and thread exit are just close to one
> >> another. But still, this has been hard-core embedded in libc since
> >> 2017, so I'm sort of expecting that a recent change has caused this.
> >>
> >> And as indicated, it is a possible cause for crashes, because
> >> thread exit is going to clean up things that are no longer there.
> >>
> >> Now the 20 dollar question is:
> >> Where was this introduced??
> >>
> >> Otherwise I'll have to try and throw my best gdb capabilities at it,
> >> and try to invoke an rbd call and see where it activates this warning.
> >
> > With some debugging foo it was rather simple to find the dtor with a
> > problem:
> >
> > __cxa_thread_call_dtors: dtr 0x80c9e1bc0 from unloaded dso, skipping
> > cxa_thread_walk (cb=<optimized out>) at
> > /usr/srcs/head/src/lib/libc/stdlib/cxa_thread_atexit_impl.c:129
> > 129 free(dtor);
> > (gdb) info symbol 0x80c9e1bc0
> > std::__1::random_device::~random_device() in section .text of
> > /usr/lib/libc++.so.1
> >
> > And this is during process exit:
> > #0 cxa_thread_walk (cb=<optimized out>) at
> > /usr/srcs/head/src/lib/libc/stdlib/cxa_thread_atexit_impl.c:129
> > #1 __cxa_thread_call_dtors () at
> > /usr/srcs/head/src/lib/libc/stdlib/cxa_thread_atexit_impl.c:144
> > #2 0x000000080cbdfb9a in exit (status=45) at
> > /usr/srcs/head/src/lib/libc/stdlib/exit.c:73
> > #3 0x000000000060a09c in _start (ap=<optimized out>, cleanup=<optimized
> > out>) at /usr/srcs/head/src/lib/csu/amd64/crt1.c:74
> >
> > So I guess that it could be just about anywhere that random() is used?
> >
> > BTW: I have the same issue on the jenkins build for mimic
>
> Again more about this issue, and it seems there is a substantial
> difference between Linux and FreeBSD in managing opened dynamic libraries:
>
> On 26/08/2018 12:19, David Chisnall wrote:
> The FreeBSD implementation here looks racy. If one thread dlcloses an
> object while another thread is exiting, we can end up calling a function
> at an invalid memory address. It also looks as if it may be possible to
> unload one library, load another at the same address, and end up
> executing entirely the wrong code, which would have some serious
> security implications.
>
> The GNU/Linux equivalent of this function locks the DSO in memory until
> all references to it have gone away. A call to dlclose() on GNU/Linux
> will not actually unload the library until all threads with destructors
> in that library have been unloaded.
> I believe that this reuses the same
> reference counting mechanism that allows the same library to be dlopened
> and dlclosed multiple times.
>
> It would be nice if the FreeBSD version had the same behaviour, because
> this is almost certainly expected in code written on other platforms.

agreed.

> ===========
>
> So would this be a correct assumption, and is what I see because
> the ceph project actually uses this feature of the Linux DL-implementation?

please refer to
http://sources.freebsd.org/HEAD/src/lib/libc/stdlib/cxa_thread_atexit_impl.c .
i think FreeBSD's libc is trying to avoid a crash when calling the dtor of
a "thread_local random_device" when a thread is exiting, but the instance
of "random_device" is living in a shared library (libceph-common.so i
guess), so when libceph-common is dlclose()'ed, the dtor of the
random_device instance is called. afterwards, the thread exits and libc
tries to call the dtors registered with __cxa_thread_atexit(), and finds
that some of the dtor(s) have already been called. hence it complains.

i don't think it should print this error message at all, as this use case
is expected. so i don't think we are relying on a platform-dependent
"feature". what we rely on is a behavior that makes sense.

>
> --WjW
>

-- 
Regards
Kefu Chai
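
For anyone following the thread, the pattern under discussion can be
sketched in a few lines of C++. This is only an illustration under stated
assumptions: Resource is a hypothetical stand-in for std::random_device,
and g_dtor_calls, use_resource and run_threads are made-up names that do
not come from Ceph or libc. It shows that a thread_local object with a
non-trivial destructor is destroyed when its thread exits. With the usual
toolchains that destructor is typically the one registered through
__cxa_thread_atexit(), so if its code lived in a shared library that had
already been dlclose()'ed, this is the call libc would have to skip.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Counts how many thread_local destructors have run (illustrative only).
static std::atomic<int> g_dtor_calls{0};

// Hypothetical stand-in for std::random_device: any type with a
// non-trivial destructor behaves the same way for this purpose.
struct Resource {
    int value = 0;
    ~Resource() { g_dtor_calls.fetch_add(1); }
};

// First use in a thread constructs that thread's own instance and
// schedules its destructor to run when the thread exits.
inline int use_resource() {
    thread_local Resource r;
    return ++r.value;
}

// Starts n threads one after another, each touching the thread_local,
// and returns how many destructors ran by the time all were joined.
inline int run_threads(int n) {
    assert(n >= 0);
    const int before = g_dtor_calls.load();
    for (int i = 0; i < n; ++i) {
        std::thread t([] { use_resource(); });
        t.join();  // the thread's thread_local dtors complete before this returns
    }
    return g_dtor_calls.load() - before;
}
```

Calling run_threads(3) runs one destructor per exited thread. The sketch
keeps everything in one binary, so it does not reproduce the dlclose()
ordering itself; it only shows where the destructor call happens.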