Re: Bringing rseq back into glibc

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Thu, 18 Nov 2021 11:32:59 -0500 (EST)

----- On Nov 18, 2021, at 5:17 AM, Florian Weimer fweimer@xxxxxxxxxx wrote:

> I would like to bring back rseq for glibc 2.35.

That's excellent news! Thanks for looking into this.

> I propose the following steps:
> 
> 1. Enable rseq registration in glibc, for internal use only.  This time,
>   put the rseq area into struct pthread, not into an initial-exec TLS
>   symbol.  (This helps to avoid initial-exec TLS bloat with dlopen
>   and simplifies initialization somewhat.)

That works for me.

> 
> 2. Add a tunable to disable rseq registration in glibc.  This way, if
>   there is already an rseq user, it can be made to work again by
>   setting the tunable.

Out of curiosity, how is the glibc tunable exposed? Can it be set from
within the application, or is it an environment variable which needs to
be set before running the application?

> 
> 3. Implement sched_getcpu on top of rseq.
> 
> 4. Add public symbols __rseq_abi_offset, __rseq_abi_size (currently 32
>   or 0), __rseq_abi_flags (currently 0).  __rseq_abi_offset is the
>   offset to add to the thread pointer (see __builtin_thread_pointer) to
>   get to the rseq area.  They will be public ABI symbols.  These
>   variables are initialized before user code runs, and changing them
>   results in undefined behavior.

Works for me. So if the Linux kernel eventually implements something along
the lines of an extensible kTLS, we could use that underneath.

Small bike-shedding comment: I wonder if we want those public glibc
symbols to be called "__rseq_abi_{offset,size,flags}", or if a name like
"__ktls_{offset,size,flags}" might be more appropriate and future-proof
from a glibc ABI standpoint?

> 
> Under this model, the rseq area offset is clearly constant across all
> threads.  (This was previously implied by using initial-exec TLS
> memory.)  rseq registration failure is indicated by __rseq_abi_size ==
> 0.  If the size is non-zero, rseq will be registered on all threads
> created by glibc, and all the time as far as user code is concerned.
> (This assumes that if rseq registration succeeds on the main thread, it
> will succeed on all other threads.  We will terminate the process if
> not.)  For example, if a JIT compiler sees __rseq_abi_size >= 32, in
> generated code, it can inline a version of sched_getcpu that
> materializes the thread pointer and loads the cpu_id field from the rseq
> area, without further checks.  Under the old TLS-based model, it was
> less clear that this was a valid optimization.

Sounds good.
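
To make the JIT case concrete, here is a rough sketch of the load such
generated code would perform. Everything prefixed with sketch_ is a
stand-in I made up so the example compiles on its own: the two variables
play the role of the proposed __rseq_abi_offset and __rseq_abi_size, the
struct mirrors the first fields of the kernel's struct rseq, and the
thread pointer is passed in as a parameter where real code would
materialize it (e.g. with __builtin_thread_pointer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 32-byte rseq area; field order follows the kernel's
   struct rseq from <linux/rseq.h>. */
struct rseq_sketch {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    uint64_t rseq_cs;
    uint32_t flags;
} __attribute__((aligned(32)));

static_assert(sizeof(struct rseq_sketch) == 32, "rseq area is 32 bytes");

/* Hypothetical stand-in for the per-thread block that the thread
   pointer refers to (struct pthread in the proposal). */
struct thread_block_sketch {
    void *self;
    struct rseq_sketch rseq_area;
};

/* Stand-ins for the proposed public symbols (assumed names). */
static ptrdiff_t sketch_rseq_abi_offset;
static unsigned int sketch_rseq_abi_size;

static struct thread_block_sketch sketch_block;

/* Fill in the fake block and the stand-in symbols, and return the fake
   thread pointer. In a real process the kernel writes cpu_id. */
static const void *sketch_setup(uint32_t cpu)
{
    sketch_block.rseq_area.cpu_id = cpu;
    sketch_rseq_abi_offset = offsetof(struct thread_block_sketch, rseq_area);
    sketch_rseq_abi_size = sizeof(struct rseq_sketch);
    return &sketch_block;
}

/* Return the current CPU number read from the rseq area, or -1 if the
   area is not usable and the caller should fall back to sched_getcpu(). */
static int sketch_getcpu(const void *thread_pointer)
{
    if (sketch_rseq_abi_size < 32)
        return -1;
    const struct rseq_sketch *area = (const struct rseq_sketch *)
        ((const char *)thread_pointer + sketch_rseq_abi_offset);
    /* The kernel updates cpu_id asynchronously, hence the volatile read. */
    return (int)*(const volatile uint32_t *)&area->cpu_id;
}
```

A JIT would do the size check once, at code generation time, and then
emit only the thread pointer load, the add and the 32-bit load.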

Note that multiple applications wishing to use rseq on a shared memory
area may find themselves in a situation where some applications support
rseq, and others don't. So it would be up to the application to negotiate
whether they can use rseq in a shared memory area or not.

> 
> Furthermore, I believe this approach will be more compatible with
> potential future kernel changes in this area.  If the kernel tells us
> some day through the auxiliary vector that we should register a 128-byte
> rseq area with 64-byte alignment, we can make that happen and change
> __rseq_abi_offset and __rseq_abi_size accordingly.

Yes, hence my question about __ktls_* naming for the glibc symbols.
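
If the area does grow, I would expect users to gate each field access on
the advertised size. A minimal sketch of that check follows; the helper
name is hypothetical and not part of any proposed ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if a field occupying field_size bytes at field_offset lies
   entirely within an area of area_size bytes. An area_size of 0 (rseq
   registration failed) makes every field unavailable. */
static bool sketch_field_available(size_t area_size, size_t field_offset,
                                   size_t field_size)
{
    /* Written this way to avoid overflow in field_offset + field_size. */
    return field_size <= area_size && field_offset <= area_size - field_size;
}
```

For instance, with the current 32-byte area, a 4-byte field at offset 4
(cpu_id) passes the check, while a hypothetical field at offset 32 would
only pass once the advertised size grows.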

> 
> Steps 1 to 3 are backportable to previous glibc versions, especially to
> 2.34 with its integrated libpthread.

So if we have an application or library already using rseq directly through
the system call, upgrading glibc may cause it to fail, because the kernel
allows only one rseq registration per thread and glibc will have claimed it
first. Arguably, no new symbols are exposed, so I guess it's OK with the
backport guidelines. My question here is: is it OK for a backported patch
to break an application which uses the Linux kernel system calls directly?

> 
> Comments?  As I said, I'd like to bring these changes into glibc 2.35,
> hopefully in early December.

I won't have time to do the implementation effort myself this time due to
other commitments, but I will try to free up some time for review. Feel
free to grab whatever code you feel is useful from my earlier rseq
integration patches (if any).

Thanks,

Mathieu

> 
> Thanks,
> Florian

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com