----- On Jul 7, 2020, at 3:29 AM, Florian Weimer fw@xxxxxxxxxxxxx wrote:

> * Mathieu Desnoyers:
>
>> commit 93b585c08d16 ("Fix: sched: unreliable rseq cpu_id for new tasks")
>> addresses an issue with cpu_id field of newly created processes. Expose
>> a flag which can be used by user-space to query whether the kernel
>> implements this fix.
>>
>> Considering that this issue can cause corruption of user-space per-cpu
>> data updated with rseq, it is recommended that user-space detects
>> availability of this fix by using the RSEQ_FLAG_RELIABLE_CPU_ID flag
>> either combined with registration or on its own before using rseq.
>
> Presumably, the intent is that glibc uses RSEQ_FLAG_RELIABLE_CPU_ID to
> register the rseq area. That will surely prevent glibc itself from
> activating rseq on broken kernels. But if another rseq library
> performs registration and has not been updated to use
> RSEQ_FLAG_RELIABLE_CPU_ID, we still end up with an active rseq area
> (and incorrect CPU IDs from sched_getcpu in glibc). So further glibc
> changes will be needed. I suppose we could block third-party rseq
> registration with a registration of a hidden rseq area (not
> __rseq_abi). But then the question is if any of the third-party rseq
> users are expecting the EINVAL error code from their failed
> registration.
>
> The rseq registration state machine is quite tricky already, and the
> need to use RSEQ_FLAG_RELIABLE_CPU_ID would make it even more
> complicated. Even if we implemented all the changes, it's all going
> to be essentially dead, untestable code in a few months, when the
> broken kernels are out of circulation. It does not appear to be good
> investment to me.

Those are very good points. One possibility we have would be to let glibc do
the rseq registration without the RSEQ_FLAG_RELIABLE_CPU_ID flag.
On kernels with the bug present, the cpu_id field is still good enough for
typical uses of sched_getcpu(), which does not appear to have a very strict
correctness requirement on returning the right cpu number. Then libraries
and applications which require a reliable cpu_id field could check this on
their own by calling rseq with the RSEQ_FLAG_RELIABLE_CPU_ID flag. This would
not make the state more complex in __rseq_abi, and would let each rseq user
decide about its own fate: whether it uses rseq or keeps using an rseq-free
fallback.

I am still tempted to allow combining RSEQ_FLAG_REGISTER |
RSEQ_FLAG_RELIABLE_CPU_ID for applications which would not be using glibc
and want to check this flag on thread registration.

Thoughts?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com