On Fri, Jul 17, 2015 at 3:21 AM, Ondřej Bílka <neleai@xxxxxxxxx> wrote:
> On Thu, Jul 16, 2015 at 12:27:10PM -0700, Andy Lutomirski wrote:
>> On Thu, Jul 16, 2015 at 11:08 AM, Mathieu Desnoyers
>> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>> > ----- On Jul 14, 2015, at 5:34 AM, Ben Maurer bmaurer@xxxxxx wrote:
>> >>
>> >> That said, having the ability for the kernel to understand that TLS
>> >> implementations are laid out using the same offset on each thread seems like
>> >> something that could be valuable long term. Doing so makes it possible to build
>> >> other TLS-based features without forcing each thread to be registered.
>> >
>> > AFAIU, using a fixed hardcoded ABI between kernel and user-space might make
>> > the transition from the pre-existing ABI (where this memory area is not
>> > reserved) a bit tricky without registering the area, or getting a "feature"
>> > flag, through a system call.
>> >
>> > The related question then becomes: should we issue this system call once
>> > per process, or once per thread at thread creation? Issuing it once per
>> > thread is marginally more costly for thread creation, but seems to be
>> > easier to deal with internally within the kernel.
>> >
>> > We could, however, ensure that only a single system call is needed per newly
>> > created thread, rather than one system call per feature. One way to do this
>> > would be to register an area that may contain more than just the CPU id. It
>> > could consist of an expandable structure with fixed offsets. When registering
>> > it, we could pass the size of that structure as an argument to the system
>> > call, so the kernel knows which features are expected by user-space.
>>
>> If we actually bit the bullet and implemented per-cpu mappings, we
>> could have this be completely flexible because there would be no
>> format at all. Similarly, if we implemented per-cpu segments,
>> userspace would need to agree with *itself* how to arbitrate it, but
>> the kernel wouldn't need to be involved.
>>
>> With this kind of memory poking, it's definitely messier, which is unfortunate.
>>
> Could you recapitulate the thread? On the libc side we didn't read most of
> it, so it would be appreciated.
>
> Do per-cpu mappings mean that there is a single virtual page that is
> mapped to different virtual pages?

A single virtual page that's mapped to different physical pages on
different cpus. I believe that ARM has some hardware support for
this, but I'm not that familiar with ARM. x86 can fake it (at the
cost of some context switch overhead).

>
> I had improving TLS access on my todo list. This would help TLS
> implementations for older ARMs and, in general, architectures that don't
> store the TCB in a register.
>
> My proposal is, modulo a small constant, equivalent to userspace accessing
> the tid without syscall overhead: just use an array of TCBs for the first
> 32768 tids and do a syscall only when the tid exceeds that.
>
> On the userspace side, my proposal would be to map that at a fixed virtual
> address and store the TCB in the first eight bytes. On context switch, the
> kernel would save and restore these along with the registers. That would
> make TLS access cheap, as it would need only one extra load instruction
> compared with a static variable.
>

The problem is that having the kernel access userspace memory on
context switch, while doable, is a little bit unpleasant. We also
really need to get the ABI right the first time, because we don't
really get a second chance.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
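A minimal user-space simulation of Ondřej's proposal may make Andy's objection concrete. It assumes a plain global pointer stands in for the eight-byte word at the fixed virtual address, and a hypothetical simulated_context_switch() plays the kernel's part of saving and restoring that word per task; that save/restore step is the user-memory access on context switch that Andy calls unpleasant. The struct tcb and struct task layouts, tcb_for_tid() and slow_path_lookup() are illustrative names only, not kernel or glibc interfaces, and the slow path merely counts calls where a real one would issue a system call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical TCB layout for illustration; not glibc's actual struct. */
struct tcb {
    int tid;
    int errno_value;
};

/*
 * Stand-in for the word at the fixed virtual address (the "first eight
 * bytes" in the proposal). Here it is simply a global pointer.
 */
static struct tcb *fixed_tcb_slot;

/* Hypothetical per-task state the kernel would keep for each thread. */
struct task {
    struct tcb *saved_tcb_slot;
};

/* Kernel side: save the outgoing thread's value, restore the incoming one. */
static void simulated_context_switch(struct task *prev, struct task *next)
{
    assert(prev != NULL && next != NULL);
    prev->saved_tcb_slot = fixed_tcb_slot;
    fixed_tcb_slot = next->saved_tcb_slot;
}

/* User side: TLS access is a single load from the fixed location. */
static struct tcb *current_tcb(void)
{
    return fixed_tcb_slot;
}

/* Tid-indexed table covering the first 32768 tids. */
#define TCB_TABLE_SIZE 32768
static struct tcb *tcb_table[TCB_TABLE_SIZE];

/* Counts slow-path lookups so the fallback is observable. */
static int slow_path_calls;

/* Hypothetical slow path; a real one would make a system call. */
static struct tcb *slow_path_lookup(int tid)
{
    (void)tid;
    slow_path_calls++;
    return NULL;
}

/* Array load for tids inside the table, slow path for any other tid. */
static struct tcb *tcb_for_tid(int tid)
{
    if (tid >= 0 && tid < TCB_TABLE_SIZE)
        return tcb_table[tid];
    return slow_path_lookup(tid);
}
```

In this sketch current_tcb() costs one load from a known location, which is the cost Ondřej compares against a static variable; the price moves into simulated_context_switch(), which the kernel would have to run on every switch.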