On Thu, Nov 22, 2018 at 05:59:44PM +0100, Florian Weimer wrote:
> * Mathieu Desnoyers:
>
> > ----- On Nov 22, 2018, at 11:28 AM, Florian Weimer fweimer@xxxxxxxxxx wrote:
> >
> >> * Mathieu Desnoyers:
> >>
> >>> Here is one scenario: we have 2 early adopter libraries using rseq which
> >>> are deployed in an environment with an older glibc (which does not
> >>> support rseq).
> >>>
> >>> Of course, none of those libraries can be dlclose'd unless they somehow
> >>> track all registered threads.
> >>
> >> Well, you can always make them NODELETE so that dlclose is not an issue.
> >> If the library is small enough, that shouldn't be a problem.
> >
> > That's indeed what I do with lttng-ust, mainly due to use of pthread_key.
> >
> >>
> >>> But let's focus on how exactly those libraries can handle lazily
> >>> registering rseq. They can use pthread_key, and pthread_setspecific on
> >>> first use by the thread to setup a destructor function to be invoked
> >>> at thread exit. But each early adopter library is unaware of the
> >>> other, so if we just use a "is_initialized" flag, the first destructor
> >>> to run will unregister rseq while the second library may still be
> >>> using it.
> >>
> >> I don't think you need unregistering if the memory is initial-exec TLS
> >> memory. Initial-exec TLS memory is tied directly to the TCB and cannot
> >> be freed while the thread is running, so it should be safe to put the
> >> rseq area there even if glibc knows nothing about it.
> >
> > Is it true for user-supplied stacks as well ?
>
> I'm not entirely sure because the glibc terminology is confusing, but I
> think it places initial-exec TLS into the static TLS area (so that it has
> a fixed offset from the TCB). The static TLS area is placed on the
> user-supplied stack.

This is an implementation detail that should not leak to applications,
and I believe it's still considered a bug, in that, with large static
TLS, it could overflow or leave unusably little space on an
otherwise-plenty-large application-provided stack.

> > One issue here is that early adopter libraries cannot always use
> > the IE model. I tried using it for other TLS variables in lttng-ust, and
> > it ended up hanging our CI tests when tracing a sample application with
> > lttng-ust under a Java virtual machine: being dlopen'd in a process that
> > possibly already exhausts the number of available backup TLS IE entries
> > seems to have odd effects. This is why I'm worried about using the IE model
> > within lttng-ust.
>
> You can work around this by preloading the library. I'm not sure if
> this is a compelling reason not to use initial-exec TLS memory.

Use of the IE model from a .so file (except possibly libc.so or
something else that inherently needs to be present at program startup
for other reasons) should be considered a bug and unsupported usage.
Encouraging libraries to perpetuate this behavior is going backwards on
progress that's being made to end it.

> >>> The same problem arises if we have an application early adopter which
> >>> explicitly deal with rseq, with a library early adopter. The issue is
> >>> similar, except that the application will explicitly want to unregister
> >>> rseq before exiting the thread, which leaves a race window where rseq
> >>> is unregistered, but the library may still need to use it.
> >>>
> >>> The reference counter solves this: only the last rseq user for a thread
> >>> performs unregistration.
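
[For concreteness, here is a rough sketch of what such lazy per-thread
registration with a shared reference count could look like in each
early-adopter library. The symbol names __rseq_abi and __rseq_refcount,
the RSEQ_SIG value, and the error handling are purely illustrative and
not an agreed ABI; __NR_rseq and linux/rseq.h also require new enough
kernel headers.]

#include <pthread.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/rseq.h>		/* struct rseq, RSEQ_FLAG_UNREGISTER */

#define RSEQ_SIG	0x53053053	/* whatever signature the process agrees on */

/*
 * Per-thread state shared by every rseq user in the process.  The
 * defining library would give __rseq_abi the required 32-byte
 * alignment.  These names are illustrative only, not an agreed ABI.
 */
extern __thread volatile struct rseq __rseq_abi;
extern __thread int __rseq_refcount;

static pthread_key_t rseq_key;	/* pthread_key_create(&rseq_key, rseq_thread_exit) */

/* Runs at thread exit for each library that set its key on this thread. */
static void rseq_thread_exit(void *unused)
{
	(void)unused;
	/* Only the last rseq user on this thread unregisters. */
	if (--__rseq_refcount == 0)
		syscall(__NR_rseq, &__rseq_abi, sizeof(__rseq_abi),
			RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
}

/* Called by the library on first rseq use in each thread. */
static int rseq_thread_register(void)
{
	if (__rseq_refcount++ == 0 &&
	    syscall(__NR_rseq, &__rseq_abi, sizeof(__rseq_abi), 0, RSEQ_SIG)) {
		__rseq_refcount--;	/* e.g. older kernel without rseq */
		return -1;
	}
	/* Arrange for rseq_thread_exit() to run when this thread exits. */
	pthread_setspecific(rseq_key, (void *)1);
	return 0;
}
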
> >>
> >> If you do explicit unregistration, you will run into issues related to
> >> destructor ordering. You should really find a way to avoid that.
> >
> > The per-thread reference counter is a way to avoid issues that arise from
> > lack of destructor ordering. Is it an acceptable approach for you, or
> > you have something else in mind ?
>
> Only for the involved libraries. It will not help if other TLS
> destructors run and use these libraries.

Presumably they should have registered their need for rseq too, thereby
incrementing the reference count. I'm not sure this is a good idea, but
I think I understand it now.

Rich
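
[For comparison, the initial-exec placement Florian suggests above would
look roughly like the sketch below: the rseq area then lives in static
TLS at a fixed offset from the TCB, so it cannot go away before the
thread does and nothing ever needs to unregister it. The variable name
is made up, and the dlopen/static-TLS-exhaustion caveat discussed above
still applies.]

#include <linux/rseq.h>		/* struct rseq */

/*
 * Forcing the initial-exec TLS model puts the rseq area into static
 * TLS, at a fixed offset from the TCB, so it stays valid for the whole
 * lifetime of the thread and never has to be unregistered explicitly.
 * The cost, as discussed above, is that dlopen'ing a library built
 * this way can fail once the static TLS reserve is exhausted.
 */
static __thread volatile struct rseq my_rseq_area
	__attribute__((tls_model("initial-exec"), aligned(32)));
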