Re: Warnings about dlclose before thread exit

On 4-9-2018 18:44, kefu chai wrote:
On Mon, Sep 3, 2018 at 4:28 PM Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:

On 18/08/2018 16:20, Willem Jan Withagen wrote:
On 18/08/2018 14:46, Willem Jan Withagen wrote:
Hi,

I have upgraded to FreeBSD ALPHA 12.0, but I don't think the errors
stem from there. Although they could be in one of the libs that came
along with the upgrade.

I'm getting these warnings during rbd and ceph (maybe even more)
invocations that indicate a possible problem because:
===
     It could be possible that a dynamically loaded library, use
     thread_local variable but is dlclose()'d before thread exit.  The
     destructor of this variable will then try to access the address,
     for calling it but it's unloaded, so it'll crash.  We're using
     __elf_phdr_match_addr() to detect and prevent such cases and so
     prevent the crash.
===
this is from :
https://github.com/freebsd/freebsd/blob/master/lib/libc/stdlib/cxa_thread_atexit_impl.c
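The mechanism that comment relies on can be seen without any shared library: a thread_local object with a non-trivial destructor has that destructor run at thread exit. The following is only a minimal sketch of that ordering, assuming a C++11 compiler with thread support; `Tracked`, `g_events` and `run_worker` are names made up for illustration and do not come from Ceph or libc. If `Tracked` lived in a dlopen()'ed library that was dlclose()'d before the thread exited, the destructor call would point into unmapped code, which is the case the libc check is meant to catch.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical event log, shared between the worker and the caller.
// Access is ordered by thread start and by join(), so no lock is needed.
static std::vector<std::string> g_events;

// Stand-in for something like std::random_device: a type whose destructor
// must run when the owning thread exits.
struct Tracked {
    void touch() { g_events.push_back("first use"); }
    ~Tracked() { g_events.push_back("thread_local dtor"); }
};

// Runs one worker thread and returns the order in which things happened.
std::vector<std::string> run_worker() {
    g_events.clear();
    std::thread t([] {
        // The destructor is registered for thread exit when control first
        // passes this declaration.
        thread_local Tracked obj;
        obj.touch();
        g_events.push_back("thread body done");
    });
    t.join();  // returns only after the thread, including its dtors, is done
    assert(!t.joinable());
    g_events.push_back("after join");
    return g_events;
}
```

Calling `run_worker()` records the destructor after the thread body has finished and before `join()` returns, so anything that unloads code must happen after that point.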


Now it could be that dlclose() and thread exit are just close to one
another. But still, this has been hard-coded into libc since
2017, so I'm sort of expecting that a recent change has caused this.

And as indicated it is a possible cause for crashes, because
thread_exit is going to clean up things that are no longer there.

Now the 20 dollar question is:
      Where was this introduced??

Otherwise I'll have to try and throw my best gdb capabilities at it,
and try to invoke an rbd call and see where it activates this warning.

With some debugging foo it was rather simple to find the dtor with a problem:

__cxa_thread_call_dtors: dtr 0x80c9e1bc0 from unloaded dso, skipping
cxa_thread_walk (cb=<optimized out>) at
/usr/srcs/head/src/lib/libc/stdlib/cxa_thread_atexit_impl.c:129
129                     free(dtor);
(gdb) info symbol 0x80c9e1bc0
std::__1::random_device::~random_device() in section .text of
/usr/lib/libc++.so.1

And this is during process exit:
#0  cxa_thread_walk (cb=<optimized out>) at
/usr/srcs/head/src/lib/libc/stdlib/cxa_thread_atexit_impl.c:129
#1  __cxa_thread_call_dtors () at
/usr/srcs/head/src/lib/libc/stdlib/cxa_thread_atexit_impl.c:144
#2  0x000000080cbdfb9a in exit (status=45) at
/usr/srcs/head/src/lib/libc/stdlib/exit.c:73
#3  0x000000000060a09c in _start (ap=<optimized out>, cleanup=<optimized
out>) at /usr/srcs/head/src/lib/csu/amd64/crt1.c:74

So I guess that it could be just about anywhere that random() is used?

BTW: I have the same issue on jenkins build for mimic

Again more about this issue, and it seems there is a substantial
difference between Linux and FreeBSD in managing opened dynamic libraries:

On 26/08/2018 12:19, David Chisnall wrote:
The FreeBSD implementation here looks racy.  If one thread dlcloses an
object while another thread is exiting, we can end up calling a function
at an invalid memory address.  It also looks as if it may be possible to
unload one library, load another at the same address, and end up
executing entirely the wrong code, which would have some serious
security implications.

The GNU/Linux equivalent of this function locks the DSO in memory until
all references to it have gone away.  A call to dlclose() on GNU/Linux
will not actually unload the library until all threads with destructors
in that library have been unloaded.  I believe that this reuses the same
reference counting mechanism that allows the same library to be dlopened
and dlclosed multiple times.
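To make the described behaviour concrete, here is a toy model of it. This is a sketch under stated assumptions, not glibc or FreeBSD code: `ToyDso` and all of its members are hypothetical, and the real implementations keep this state inside the dynamic linker. The model only captures the rule from the paragraph above: the object stays loaded while either a dlopen() reference or a pending thread destructor still points at it.

```cpp
#include <cassert>

// Toy model of a shared object that is pinned by pending thread dtors.
class ToyDso {
public:
    // Models dlopen(): each call adds one reference.
    void open() {
        ++open_count_;
        loaded_ = true;
    }

    // Models registering a thread_local destructor that lives in this DSO.
    void register_thread_dtor() {
        assert(loaded_);
        ++pending_dtors_;
    }

    // Models a thread exiting and running one registered destructor.
    void thread_dtor_ran() {
        assert(pending_dtors_ > 0);
        --pending_dtors_;
        maybe_unload();
    }

    // Models dlclose(): drops one reference, unloads only if nothing is left.
    void close() {
        assert(open_count_ > 0);
        --open_count_;
        maybe_unload();
    }

    bool loaded() const { return loaded_; }

private:
    void maybe_unload() {
        if (open_count_ == 0 && pending_dtors_ == 0) {
            loaded_ = false;
        }
    }

    int open_count_ = 0;
    int pending_dtors_ = 0;
    bool loaded_ = false;
};
```

In this model a `close()` that arrives while a destructor is still pending leaves `loaded()` true, and the object is dropped only when `thread_dtor_ran()` releases the last reference.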

It would be nice if the FreeBSD version had the same behaviour, because
this is almost certainly expected in code written on other platforms.

agreed.

===========

So would this be a correct assumption: that what I see is because
the ceph project actually uses this feature of the Linux DL-implementation?

please refer to
http://sources.freebsd.org/HEAD/src/lib/libc/stdlib/cxa_thread_atexit_impl.c
.

Hi Kefu,

Yup, I got the text above from that file.

i think FreeBSD's libc is trying to avoid a crash when calling dtor of
"thread_local random_device" when a thread is exiting, but the
instance of "random_device" is living in a shared library
(libceph-common.so i guess).

This is what I concluded as well. Avoiding crashes is always nice when allowed/useful.

so after libceph-common is
dlclose()'ed, the dtor of the random_device's instance is called. and
afterwards, the thread exits and libc tries to call the dtors
registered with __cxa_thread_atexit(), and finds that some of the
dtor(s) have been called. hence it complains. i don't think it should
print this error message at all. as this use case is expected.

'mmmm, not quite I would think.
The dtors are registered to be called atexit, so libc is only doing what it was asked to do. And if for some reason it cannot, it complains about it. As I think it should: the alternative is to try to fulfill the contract, that is, try to execute and crash.

I think the order should be:
	call dtor
	dlclose()
Which I know is rather easy to say, but not always easy to guarantee.
Or if calling the dtor is not that important: don't register it for destruction atexit.
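One way to get that order is to make a single owner responsible for both the threads and the unload step. The following is only a sketch of the idea, not how Ceph does it: `PluginSession`, `Noisy` and `demo_order` are hypothetical names, and the "unload" action is a caller-supplied callback standing in for whatever ends in dlclose(). The owner joins every worker first, so their thread_local destructors have already run by the time the unload action is invoked.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical owner of worker threads plus an "unload" action.
class PluginSession {
public:
    explicit PluginSession(std::function<void()> unload)
        : unload_(std::move(unload)) {}

    PluginSession(const PluginSession&) = delete;
    PluginSession& operator=(const PluginSession&) = delete;

    // Starts a worker that may use code from the library.
    void spawn(std::function<void()> work) {
        assert(work);  // an empty callable would throw inside the thread
        workers_.emplace_back(std::move(work));
    }

    // Step 1: join, so every thread_local dtor has run.
    // Step 2: only then unload.
    ~PluginSession() {
        for (auto& t : workers_) {
            if (t.joinable()) t.join();
        }
        unload_();
    }

private:
    std::function<void()> unload_;
    std::vector<std::thread> workers_;
};

// Stand-in for a thread_local object whose dtor lives in the library.
struct Noisy {
    std::vector<std::string>* log;
    ~Noisy() { log->push_back("thread_local dtor"); }
};

// Returns the observed order of the two events.
std::vector<std::string> demo_order() {
    std::vector<std::string> log;
    {
        PluginSession session([&log] { log.push_back("unload"); });
        session.spawn([&log] {
            thread_local Noisy n{&log};
            (void)n;
        });
    }  // session destroyed here: join first, then unload
    return log;
}
```

This only covers threads the owner created itself. Threads started elsewhere that touched the library would still need some other arrangement, which is the "not always easy to guarantee" part.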

I have silenced the warning for myself, since it drives me crazy, that is how often it is printed. But it is not an option for the regular ceph user to start recompiling system libraries. ;-)

so, i don't think we are relying on a platform-dependent "feature". what
we rely on is a behavior that makes sense.

"Makes sense" as in: we are required to call a function, but if that code space no longer exists, don't worry about it? It is not a problem as long as the dtor is a rather simple thingy, but if it is the location where the prizes are collected then that would not be so good?

So does Linux have a counterpart of this behaviour, one that keeps DLs open until the last reference is gone, and only then really dlclose()s the region with the code?

Funny part is that I only started having these warnings recently. So before that time, the order of calling dtors and dlclose() was not causing any of the messages.

So my famous words "somewhere something has changed" are applicable.

--WjW


