The 10/14/2020 16:43, Peter Collingbourne wrote:
> On Fri, Sep 18, 2020 at 1:30 AM Will Deacon <will@xxxxxxxxxx> wrote:
> > I think so, yes. I'm hoping to queue it for 5.10, once I have an Ack from
> > the Android tools side on the per-thread ABI.
>
> Our main requirement on the Android side is to provide an API for
> changing the tag checking mode in all threads in a process while
> multiple threads are running. I think we've been able to accomplish
> this [1] by using a libc private real-time signal which is sent to all
> threads. The implementation has been tested on FVP via the included
> unit tests. The code has also been tested on real hardware in a
> multi-threaded app process (of course we don't have MTE-enabled
> hardware, so the implementation was tested on hardware by hacking it
> to disable the tagged address ABI instead of changing the tag checking
> mode, and then verifying via ptrace(PTRACE_GETREGSET) that the tagged
> address ABI was disabled in all threads).
>
> That being said, as with any code at the nexus of concurrency and
> POSIX signals, the implementation is quite tricky so I would say it
> falls more into the category of "no obvious problems" than "obviously
> no problems". It also relies on changes to the implementations of
> pthread APIs so it wouldn't catch threads created directly via clone()
> rather than via pthread_create(). I think we would be able to ignore
> such threads on Android without causing compatibility issues because
> we can require the process to not create threads via clone() before
> calling the function. I imagine this may not necessarily work for
> other libcs like glibc, though, but as I understand it glibc has no
> plan to offer such an API.

no immediate plans.

to make such api useful we would have to expose it to users
(e.g. custom allocators) which is tricky.

note that glibc has the necessary infrastructure to do the internal
signaling, but it had issues in the past.
i think it had problems with qemu-user and golang c ffi, and libc
internal issues around multi-threaded fork/vfork, or simply stack
overflow because of small thread stacks and growing signal frames
that are more likely to hit at the wrong time if libc uses more
internal signals.

so i think such per process operation is easier to handle correctly
in the kernel.

doing this outside of the libc (e.g. in a custom allocator) is not
possible (without relying on new libc apis) which i thought was a
reasonable use-case, but likely glibc will enable sync tag checks
early and leave it that way (the only tricky bit is to have an
opt-in/-out mechanism for binaries that are not compatible with the
tagged address abi and i don't know yet how that will work).

> [1] https://android-review.googlesource.com/c/platform/bionic/+/1427377

btw in the bionic implementation there are writes to globals
(g_tcf, g_arg, g_func) that are later read in signal handlers of
other threads without atomics. i'm not sure if that's enough
synchronization (can we assume that tgkill synchronizes with signal
handlers?).
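
to illustrate the concern about the globals, here is a minimal sketch of the same pattern with the shared value made atomic. it is not the bionic code: g_mode, on_signal, worker and broadcast_mode are hypothetical names, SIGUSR1 stands in for the libc private real-time signal, pthread_kill stands in for tgkill, and the "mode" is a plain integer rather than a real tag checking mode. it does not answer whether tgkill formally synchronizes with the handler; it only avoids the plain (non-atomic) cross-thread accesses, and in practice relies on signal delivery being ordered after the store.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>

#define MAX_THREADS 16

/* Hypothetical stand-ins for globals written by the sender and read
 * in the signal handlers of the other threads. */
static atomic_int g_mode;   /* value published to the handlers */
static atomic_int g_sum;    /* sum of the values the handlers observed */
static atomic_int g_acks;   /* number of handlers that have run */
static atomic_int g_ready;  /* number of workers that have started */
static atomic_int g_stop;   /* tells the workers to exit */

/* Runs in each target thread. Only lock-free atomic operations are
 * used, which are safe to call from a signal handler. */
static void on_signal(int sig)
{
    (void)sig;
    int mode = atomic_load_explicit(&g_mode, memory_order_acquire);
    atomic_fetch_add_explicit(&g_sum, mode, memory_order_relaxed);
    atomic_fetch_add_explicit(&g_acks, 1, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    atomic_fetch_add(&g_ready, 1);
    while (!atomic_load(&g_stop))
        sched_yield();
    return NULL;
}

/* Starts nthreads workers, publishes mode, signals every worker and
 * waits for all handlers to acknowledge. Returns the sum of the values
 * seen by the handlers, or -1 on error. */
int broadcast_mode(int mode, int nthreads)
{
    pthread_t tids[MAX_THREADS];
    struct sigaction sa = {0};
    int created = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    /* The handler is only signal-safe if the atomics are lock-free. */
    assert(atomic_is_lock_free(&g_mode));

    atomic_store(&g_sum, 0);
    atomic_store(&g_acks, 0);
    atomic_store(&g_ready, 0);
    atomic_store(&g_stop, 0);

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    for (; created < nthreads; created++)
        if (pthread_create(&tids[created], NULL, worker, NULL) != 0)
            break;

    int ok = (created == nthreads);
    if (ok) {
        while (atomic_load(&g_ready) < nthreads)
            sched_yield();
        /* Publish the value before any signal is sent. */
        atomic_store_explicit(&g_mode, mode, memory_order_release);
        for (int i = 0; i < nthreads; i++)
            pthread_kill(tids[i], SIGUSR1);
        /* Wait until every handler has run. */
        while (atomic_load_explicit(&g_acks, memory_order_acquire) < nthreads)
            sched_yield();
    }

    atomic_store(&g_stop, 1);
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return ok ? atomic_load(&g_sum) : -1;
}
```

calling broadcast_mode(3, 4) should return 12 if all four handlers saw the published value. the point of the sketch is only that the globals can be made atomic at little cost, so the code does not have to depend on an unstated guarantee for plain accesses.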