----- On Dec 13, 2021, at 1:47 PM, Florian Weimer fweimer@xxxxxxxxxx wrote:

> I've been studying Jann Horn's biased locking example:
>
> Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space
> <https://lore.kernel.org/linux-api/CAG48ez02UDn_yeLuLF4c=kX0=h2Qq8Fdb0cer1yN8atbXSNjkQ@xxxxxxxxxxxxxx/>
>
> It uses MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ as part of the biased lock
> revocation.
>
> How does this code know that the process has called
> MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ?

I won't speak for this code snippet in particular, but in general,
issuing MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ from a thread belonging
to a process that has not performed
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ results in membarrier
returning -EPERM. If the kernel is built without CONFIG_RSEQ support,
it returns -EINVAL:

membarrier_private_expedited():

	} else if (flags == MEMBARRIER_FLAG_RSEQ) {
		if (!IS_ENABLED(CONFIG_RSEQ))
			return -EINVAL;
		if (!(atomic_read(&mm->membarrier_state) &
		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ_READY))
			return -EPERM;

If you want to write code that optionally depends on the availability
of the EXPEDITED_RSEQ membarrier, I suspect you will want to perform
the registration from a library constructor, and keep track of
registration success/failure in a static variable within the library.

> Could it fall back to
> MEMBARRIER_CMD_GLOBAL instead?

No. CMD_GLOBAL does not issue the rseq fence required by the algorithm
discussed. CMD_GLOBAL also has quite a few other shortcomings: it takes
a while to execute, and it is incompatible with nohz_full kernels.

> Why is it that MEMBARRIER_CMD_GLOBAL
> does not require registration (the broader/more expensive barrier), but
> the more restricted versions do?

The more restricted versions (which require explicit registration) are
closely integrated with the Linux scheduler, and in some cases require
additional code to be executed when scheduling between threads which
belong to different processes, for instance for the "SYNC_CORE"
membarrier, which is useful for JITs:

static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
{
	if (current->mm != mm)
		return;
	if (likely(!(atomic_read(&mm->membarrier_state) &
		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
		return;
	sync_core_before_usermode();
}

Also, the "global-expedited" commands can generate IPIs which will
interrupt the flow of threads running on behalf of a registered
process. Therefore, in order to make sure we do not add delays to
real-time sensitive applications, we made this registration "opt-in".

In order to make sure the programming model is the same for the
expedited private/global plain/sync-core/rseq membarrier commands, we
require that each process perform a registration beforehand.

>
> Or put differently, why wouldn't we request
> MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ unconditionally at
> process start in glibc, once we start biased locking in a few places?

The registration of membarrier expedited can be performed either
immediately when the process starts, or later, possibly when there are
other threads running concurrently.

Note however that the registration scheme has been optimized for the
scenario where it is called while a single thread is running in the
process (see sync_runqueues_membarrier_state()). Otherwise we need to
use the more heavyweight synchronize_rcu().

So my advice would be to perform the membarrier expedited registration
while the process is still single-threaded if possible, rather than
postpone it and do it entirely lazily on first use, which may happen
while other threads are already running.

Thanks,

Mathieu

>
> Thanks,
> Florian

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
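
For illustration only, the suggestion above (register from a library
constructor and remember the outcome in a static variable) could be
sketched roughly as follows. This is a minimal sketch, not code from
the thread or from glibc: the names rseq_membarrier_status,
rseq_membarrier_available() and rseq_fence() are hypothetical, and the
two command values are copied from <linux/membarrier.h> (Linux 5.10 and
later) under local names so that the sketch also builds against older
headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values mirror <linux/membarrier.h>; the SKETCH_ names are hypothetical. */
enum {
	SKETCH_CMD_PRIVATE_EXPEDITED_RSEQ          = 1 << 7,
	SKETCH_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ = 1 << 8,
};

/* 0 once registration succeeded, otherwise a negative errno value. */
static int rseq_membarrier_status = -ENOSYS;

static int sys_membarrier(int cmd, unsigned int flags, int cpu_id)
{
	return (int) syscall(__NR_membarrier, cmd, flags, cpu_id);
}

/*
 * Runs when the library is loaded, which is normally while the process
 * is still single-threaded (the cheap registration path).
 */
__attribute__((constructor))
static void rseq_membarrier_register(void)
{
	if (sys_membarrier(SKETCH_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ, 0, 0) == 0)
		rseq_membarrier_status = 0;
	else
		rseq_membarrier_status = -errno;
}

/* Returns 1 if the rseq fence can be used, 0 otherwise. */
int rseq_membarrier_available(void)
{
	return rseq_membarrier_status == 0;
}

/*
 * Issues the rseq fence. Returns 0 on success, or a negative errno
 * value if registration failed or the command itself failed.
 */
int rseq_fence(void)
{
	if (rseq_membarrier_status != 0)
		return rseq_membarrier_status;
	if (sys_membarrier(SKETCH_CMD_PRIVATE_EXPEDITED_RSEQ, 0, 0) != 0)
		return -errno;
	return 0;
}
```

A caller would check rseq_membarrier_available() once and select a
code path that does not rely on the rseq fence when it returns 0,
since MEMBARRIER_CMD_GLOBAL is not a substitute for it.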