On Mon, Sep 28, 2020 at 8:14 AM Florian Weimer <fweimer@xxxxxxxxxx> wrote:
>
> * Mathieu Desnoyers:
>
> > Upstreaming efforts aiming to integrate rseq support into glibc led to
> > interesting discussions, where we identified a clear need to extend the
> > size of the per-thread structure shared between kernel and user-space
> > (struct rseq). This is something that is not possible with the current
> > rseq ABI. The fact that the current non-extensible rseq kernel ABI
> > would also prevent glibc's ABI to be extended prevents its integration
> > into glibc.
> >
> > Discussions with glibc maintainers led to the following design, which we
> > are calling "Kernel Thread Local Storage" or KTLS:
> >
> > - at glibc library init:
> >   - glibc queries the size and alignment of the KTLS area supported by the
> >     kernel,
> >   - glibc reserves the memory area required by the kernel for main
> >     thread,
> >   - glibc registers the offset from thread pointer where the KTLS area
> >     will be placed for all threads belonging to the threads group which
> >     are created with clone3 CLONE_RSEQ_KTLS,
> > - at nptl thread creation:
> >   - glibc reserves the memory area required by the kernel,
> > - application/libraries can query glibc for the offset/size of the
> >   KTLS area, and offset from the thread pointer to access that area.
>
> One remaining challenge I see is that we want to use vDSO functions to
> abstract away the exact layout of the KTLS area. For example, there are
> various implementation strategies for getuid optimizations, some of them
> exposing a shared struct cred in a thread group, and others not doing
> that.
>
> The vDSO has access to the thread pointer because it's ABI (something
> that we recently (and quite conveniently) clarified for x86). What it
> does not know is the offset of the KTLS area from the thread pointer.
> In the original rseq implementation, this offset could vary from thread
> to thread in a process, although the submitted glibc implementation did
> not use this level of flexibility and the offset is constant. The vDSO
> is not relocated by the run-time dynamic loader, so it can't use ELF TLS
> data.

I assume that, by "thread pointer", you mean the pointer stored in
GSBASE on x86_32, FSBASE on x86_64, and elsewhere on other
architectures?

The vDSO has done pretty well so far by not touching FS, GS, or their
bases at all. If we want to change that, I would be very nervous about
doing so in existing vDSO functions. Regardless of anything an ABI
document might say and anything that existing or previous glibc
versions may or may not have done, there are plenty of bizarre programs
out there that don't really respect the psABI document. Go and various
not-ready-for-prime-time-but-released-anyway Bionic branches come to
mind. So we would need to tread very, very carefully.

One way to side-step much of this would be to make the interface explicit:

    long __vdso_do_whatever(void *ktls_ptr, ...);

Sadly, on x86, actually generating the ktls ptr is a bit nasty due to
the fact that lea %fs:(offset) doesn't do what one might have liked it
to do. I suppose this could also be:

    long __vdso_do_whatever(unsigned long ktls_offset);

which will generate quite nice code on x86_64. I can't speak for the
asm capabilities of other architectures.

What I *don't* want to do is to accidentally repeat anything like the
%gs:0x28 mess we have with the stack cookie on x86_32. (The stack
cookie is, in kernel code, in a completely nonsensical location. I'm
quite surprised that any of the maintainers ever accepted the current
stack cookie implementation. I assume there's some history there, but
I don't know it. The end result is a festering mess in the x86_32
kernel code that only persists because no one cares quite enough about
x86_32 to fix it.)

We obviously won't end up with precisely the same type of mistake
here, but a mis-step here certainly does have the possibility of
promoting an unfortunate-in-hindsight design decision in glibc and/or
psABI to something that every other x86_64 Linux software stack has to
copy to be compatible with the vDSO.

As for errno itself, with all due respect to those who designed errno
before I was born, IMO it was a mistake. Why exactly should the vDSO
know about errno?
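
To make the two explicit shapes a bit more concrete, here is a rough,
portable sketch. Everything in it is made up for illustration: struct
ktls_area, struct fake_thread_block, the fake_vdso_* functions and
libc_style_wrapper are hypothetical names, not a proposed ABI. The
thread pointer is passed in as a plain argument so the example runs
anywhere; a real offset-based vDSO entry point would presumably read it
from the segment base instead. The vDSO-like functions return a
negative error code, and only the libc-side wrapper touches errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical KTLS layout; the real one would be private to kernel + vDSO. */
struct ktls_area {
    uint32_t value;   /* some cached per-thread value */
    uint32_t valid;   /* nonzero once the kernel has filled in 'value' */
};

/* Stand-in for a thread's TLS block, with the KTLS area at some offset. */
struct fake_thread_block {
    char other_tls[64];
    struct ktls_area ktls;
};

/* Shape 1: the caller hands over a pointer to the KTLS area. */
long fake_vdso_do_whatever_ptr(const void *ktls_ptr)
{
    const struct ktls_area *k = ktls_ptr;

    if (k == NULL || !k->valid)
        return -ENOSYS;          /* error reported in the return value */
    return (long)k->value;
}

/*
 * Shape 2: the caller hands over the offset from the thread pointer.
 * Assumption: 'thread_ptr' stands in for what a real implementation
 * would obtain from FSBASE/GSBASE (or the equivalent elsewhere).
 */
long fake_vdso_do_whatever_off(const char *thread_ptr, unsigned long ktls_offset)
{
    assert(thread_ptr != NULL);
    return fake_vdso_do_whatever_ptr(thread_ptr + ktls_offset);
}

/* libc-side wrapper: the only place that knows about errno. */
long libc_style_wrapper(const char *thread_ptr, unsigned long ktls_offset)
{
    long ret = fake_vdso_do_whatever_off(thread_ptr, ktls_offset);

    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```

A caller would compute the offset once, e.g. with
offsetof(struct fake_thread_block, ktls), and pass it on every call.
Either way, nothing in the vDSO-like code needs to know where errno
lives or how the TLS block is laid out beyond the one offset it is
given.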