Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64

Andy Lutomirski <luto@xxxxxxxxxx> · Tue, 29 Sep 2020 11:01:29 -0700

On Mon, Sep 28, 2020 at 8:14 AM Florian Weimer <fweimer@xxxxxxxxxx> wrote:
>
> * Mathieu Desnoyers:
>
> > Upstreaming efforts aiming to integrate rseq support into glibc led to
> > interesting discussions, where we identified a clear need to extend the
> > size of the per-thread structure shared between kernel and user-space
> > (struct rseq).  This is something that is not possible with the current
> > rseq ABI.  The fact that the current non-extensible rseq kernel ABI
> > would also prevent glibc's ABI to be extended prevents its integration
> > into glibc.
> >
> > Discussions with glibc maintainers led to the following design, which we
> > are calling "Kernel Thread Local Storage" or KTLS:
> >
> > - at glibc library init:
> >   - glibc queries the size and alignment of the KTLS area supported by the
> >     kernel,
> >   - glibc reserves the memory area required by the kernel for main
> >     thread,
> >   - glibc registers the offset from thread pointer where the KTLS area
> >     will be placed for all threads belonging to the threads group which
> >     are created with clone3 CLONE_RSEQ_KTLS,
> > - at nptl thread creation:
> >   - glibc reserves the memory area required by the kernel,
> > - application/libraries can query glibc for the offset/size of the
> >   KTLS area, and offset from the thread pointer to access that area.
>
> One remaining challenge see is that we want to use vDSO functions to
> abstract away the exact layout of the KTLS area.  For example, there are
> various implementation strategies for getuid optimizations, some of them
> exposing a shared struct cred in a thread group, and others not doing
> that.
>
> The vDSO has access to the thread pointer because it's ABI (something
> that we recently (and quite conveniently) clarified for x86).  What it
> does not know is the offset of the KTLS area from the thread pointer.
> In the original rseq implementation, this offset could vary from thread
> to thread in a process, although the submitted glibc implementation did
> not use this level of flexibility and the offset is constant.  The vDSO
> is not relocated by the run-time dynamic loader, so it can't use ELF TLS
> data.

I assume that, by "thread pointer", you mean the pointer stored in
GSBASE on x86_32, FSBASE on x86_64, and elsewhere on other
architectures?

The vDSO has done pretty well so far having the vDSO not touch FS, GS,
or their bases at all.  If we want to change that, I would be very
nervous about doing so in existing vDSO functions.  Regardless of
anything an ABI document might say and anything that existing or
previous glibc versions may or may not have done, there are plenty of
bizarre programs out there that don't really respect the psABI
document.  Go and various not-ready-for-prime-time-but-released-anyway
Bionic branches come to mind.  So we would need to tread very, very
carefully.

One way to side-step much of this would be to make the interface explicit:

long __vdso_do_whatever(void *ktls_ptr, ...);

Sadly, on x86, actually generating the ktls ptr is bit nasty due to
the fact that lea %fs:(offset) doesn't do what one might have liked it
to do.  I suppose this could also be:

long __vdso_do_whatever(unsigned long ktls_offset);

which will generate quite nice code on x86_64.  I can't speak for the
asm capabilities of other architectures.

What I *don't* want to do is to accidentally repeat anything like the
%gs:0x28 mess we have with the stack cookie on x86_32.  (The stack
cookie is, in kernel code, in a completely nonsensical location.  I'm
quite surprised that any of the maintainers ever accepted the current
stack cookie implementation.  I assume there's some history there, but
I don't know it.  The end result is a festering mess in the x86_32
kernel code that only persists because no one cares quite enough about
x86_32 to fix it.)  We obviously won't end up with precisely the same
type of mistake here, but a mis-step here certainly does have the
possibility of promoting an unfortunate-in-hindsight design decision
in glibc and/or psABI to something that every other x86_64 Linux
software stack has to copy to be compatible with the vDSO.

As for errno itself, with all due respect to those who designed errno
before I was born, IMO it was a mistake.  Why exactly should the vDSO
know about errno?