On Tue, Jul 14, 2020 at 2:33 PM Peter Oskolkov <posk@xxxxxxxxxx> wrote:
>
> On Tue, Jul 14, 2020 at 10:43 AM Mathieu Desnoyers
> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> >
> > ----- On Jul 14, 2020, at 1:24 PM, Peter Oskolkov posk@xxxxxxx wrote:
> >
> > > At Google, we actually extended struct rseq (I will post the patches
> > > here once they are fully deployed and we have specific
> > > benefits/improvements to report). We did this by adding several fields
> > > below __u32 flags (the last field currently), and correspondingly
> > > increasing rseq_len in rseq() syscall. If the kernel does not know of
> > > this extension, it will return -EINVAL due to an unexpected rseq_len;
> > > then the application can either fall-back to the standard/upstream
> > > rseq, or bail. If the kernel does know of this extension, it accepts
> > > it. If the application passes the old rseq_len (32), the kernel knows
> > > that this is an old application and treats it as such.
> > >
> > > I looked through the archives, but I did not find specifically why the
> > > pretty standard approach described above is considered inferior to the
> > > one taken in this patch (freeze rseq_len at 32, add additional length
> > > fields to struct rseq). Can these be summarized?
> >
> > I think you don't face the issues I'm facing with libc rseq integration
> > because you control the entire user-space software ecosystem at Google.
> >
> > The main issue we face is that the library responsible for registering
> > rseq (either glibc 2.32+, an early-adopter librseq library, or the
> > application) may very well not be the same library defining the __rseq_abi
> > symbol used in the global symbol table. Interposition with ld preload or
> > by defining the __rseq_abi in the program's executable are good examples
> > of this kind of scenario, and those use-cases are supported.

Does this work if/when we run out of bytes in the current
sizeof(__rseq_abi)? Which library provides the TLS symbol (and N bytes
of storage) seems sensitive to the choices the linker makes for us,
once the symbol sizes diverge.

> > So the size of the __rseq_abi structure may be larger than the struct
> > rseq known by glibc (and eventually smaller, if future glibc versions
> > extend their __rseq_abi size but is loaded with an older program/library
> > doing __rseq_abi interposition).

When glibc provides registration, is the anticipated use case that a
library would unregister and reregister each thread to "upgrade" it to
the most modern version of the interface it knows about provided by the
kernel?

> > So we need some way to allow code defining the __rseq_abi to let the kernel
> > know how much room is available, without necessarily requiring the code
> > responsible for rseq registration to be aware of that extended layout.
> > This is the purpose of the __rseq_abi.flags RSEQ_FLAG_TLS_SIZE and field
> > __rseq_abi.user_size.
> >
> > And we need some way to allow the kernel to let user-space rseq critical
> > sections (user code) know how much of those fields are actually populated
> > by the kernel. This is the purpose of __rseq_abi.flags RSEQ_FLAG_TLS_SIZE
> > with __rseq_abi.kernel_size.

I authored the userspace component
(https://github.com/google/tcmalloc/commit/ad136d45f75a273b934446699cef8b278c34ec6e)
that consumes the extensions Peter mentions and found that minimizing
the performance impact of their potential absence was a bit of a
challenge. There, I could assume an all-or-nothing registration of the
new feature--limited only by kernel availability for thread
homogeneity--but inconsistencies across early adopter libraries would
mean each thread would have to examine its own TLS to determine if a
feature were available.

Chris
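
For readers following the thread, the fallback flow quoted above (try the
extended rseq_len, and on -EINVAL fall back to the upstream length of 32)
might be sketched roughly as follows. This is only an illustration under
stated assumptions: rseq_register_fn, register_with_fallback and the mock_*
helpers are hypothetical names, the callback stands in for the rseq()
syscall rather than reproducing its real signature, and the extended length
of 64 used by the mock is invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Length of the upstream struct rseq, as stated in the thread. */
#define RSEQ_LEN_UPSTREAM 32u

/* Hypothetical stand-in for the rseq() syscall: returns 0 on success or a
 * negative errno value. This is NOT the real syscall signature. */
typedef int (*rseq_register_fn)(void *area, uint32_t len);

/* Try the extended length first; if the kernel rejects it with -EINVAL,
 * retry with the upstream length. Returns the length that was accepted,
 * or 0 if no registration succeeded (the caller then bails). */
static uint32_t register_with_fallback(rseq_register_fn reg, void *area,
                                       uint32_t extended_len)
{
    assert(extended_len >= RSEQ_LEN_UPSTREAM);

    int ret = reg(area, extended_len);
    if (ret == 0)
        return extended_len;      /* kernel knows the extension */
    if (ret != -EINVAL)
        return 0;                 /* unrelated failure: do not retry */
    if (reg(area, RSEQ_LEN_UPSTREAM) == 0)
        return RSEQ_LEN_UPSTREAM; /* old kernel: upstream rseq only */
    return 0;
}

/* Mock of a kernel that only accepts the upstream length. */
static int mock_old_kernel(void *area, uint32_t len)
{
    (void)area;
    return len == RSEQ_LEN_UPSTREAM ? 0 : -EINVAL;
}

/* Mock of a kernel that also accepts a made-up extended length of 64. */
static int mock_new_kernel(void *area, uint32_t len)
{
    (void)area;
    return (len == RSEQ_LEN_UPSTREAM || len == 64u) ? 0 : -EINVAL;
}

/* Mock of a kernel without rseq support at all. */
static int mock_no_rseq(void *area, uint32_t len)
{
    (void)area;
    (void)len;
    return -ENOSYS;
}
```

Note that the returned length describes only the calling thread's
registration. If different libraries register different threads, the result
is not uniform across the process, which is the per-thread check concern
raised above.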