On Mon, Dec 13, 2021 at 7:48 PM Florian Weimer <fweimer@xxxxxxxxxx> wrote:
> I've been studying Jann Horn's biased locking example:
>
>   Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space
>   <https://lore.kernel.org/linux-api/CAG48ez02UDn_yeLuLF4c=kX0=h2Qq8Fdb0cer1yN8atbXSNjkQ@xxxxxxxxxxxxxx/>
>
> It uses MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ as part of the biased lock
> revocation.
>
> How does the this code know that the process has called
> MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ?  Could it fall back to
> MEMBARRIER_CMD_GLOBAL instead?

AFAIK no - MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ (which requires prior registration via MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ) specifically forces the other threads of the calling process to go through an RSEQ preemption. That only happens when this special membarrier command is used or when an actual task switch happens; other membarrier flavors don't guarantee that.

Also, MEMBARRIER_CMD_GLOBAL can take really long in terms of wall clock time - it's basically just synchronize_rcu(), and as the documentation at https://www.kernel.org/doc/html/latest/RCU/Design/Requirements/Requirements.html says:

"The synchronize_rcu() grace-period-wait primitive is optimized for throughput. It may therefore incur several milliseconds of latency in addition to the duration of the longest RCU read-side critical section."
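On the question of how the code can know whether registration happened, here is a minimal sketch (not taken from the biased-locking patch), assuming Linux 5.10+ UAPI headers and glibc. It relies on two documented membarrier(2) behaviours: MEMBARRIER_CMD_QUERY returns a bitmask of the commands the kernel supports, and MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ fails with EPERM if the process never registered. The helper names sys_membarrier(), rseq_membarrier_setup() and rseq_membarrier() are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/membarrier.h> /* assumes Linux >= 5.10 UAPI headers (RSEQ commands) */
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no membarrier() wrapper, so go through syscall(2). */
static int sys_membarrier(int cmd, unsigned int flags, int cpu_id)
{
	return (int)syscall(__NR_membarrier, cmd, flags, cpu_id);
}

/*
 * Hypothetical helper: register this process for
 * MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ. Returns 0 on success, or -1 with
 * errno set if the kernel lacks membarrier or RSEQ membarrier support.
 * There is deliberately no fallback to MEMBARRIER_CMD_GLOBAL, because that
 * command does not restart RSEQ critical sections.
 */
static int rseq_membarrier_setup(void)
{
	int supported = sys_membarrier(MEMBARRIER_CMD_QUERY, 0, 0);

	if (supported < 0)
		return -1; /* e.g. ENOSYS: no membarrier at all */
	if (!(supported & MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ)) {
		errno = ENOSYS; /* kernel too old, or built without CONFIG_RSEQ */
		return -1;
	}
	return sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ, 0, 0);
}

/*
 * Hypothetical helper: restart any RSEQ critical section that another
 * thread of this process is currently running. If the process never
 * registered, the kernel rejects the call with EPERM, which is how a
 * caller can detect missing registration.
 */
static int rseq_membarrier(void)
{
	return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ, 0, 0);
}
```

In this sketch, a program would call rseq_membarrier_setup() once at startup, before any biased lock can need revoking, and treat a failure as "RSEQ-based revocation unavailable" rather than substituting another membarrier command.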
You can see that synchronize_rcu() indeed takes quite long in terms of wall clock time (but not in terms of CPU time - as the documentation says, it's optimized for throughput in a parallel context) with a simple test program:

jannh@laptop:~/test/rcu$ cat rcu_membarrier.c
#define _GNU_SOURCE
#include <stdio.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <time.h>
#include <err.h>

int main(void) {
  for (int i=0; i<20; i++) {
    struct timespec ts1;
    if (clock_gettime(CLOCK_MONOTONIC, &ts1))
      err(1, "time");

    if (syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0, 0))
      err(1, "membarrier");

    struct timespec ts2;
    if (clock_gettime(CLOCK_MONOTONIC, &ts2))
      err(1, "time");

    unsigned long delta_ns = (ts2.tv_nsec - ts1.tv_nsec) + (1000UL*1000*1000) * (ts2.tv_sec - ts1.tv_sec);
    printf("MEMBARRIER_CMD_GLOBAL took %lu nanoseconds\n", delta_ns);
  }
}
jannh@laptop:~/test/rcu$ gcc -o rcu_membarrier rcu_membarrier.c -Wall
jannh@laptop:~/test/rcu$ time ./rcu_membarrier
MEMBARRIER_CMD_GLOBAL took 17155142 nanoseconds
MEMBARRIER_CMD_GLOBAL took 19207001 nanoseconds
MEMBARRIER_CMD_GLOBAL took 16087350 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15963711 nanoseconds
MEMBARRIER_CMD_GLOBAL took 16336149 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15931331 nanoseconds
MEMBARRIER_CMD_GLOBAL took 16020315 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15873814 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15945667 nanoseconds
MEMBARRIER_CMD_GLOBAL took 23815452 nanoseconds
MEMBARRIER_CMD_GLOBAL took 23626444 nanoseconds
MEMBARRIER_CMD_GLOBAL took 19911435 nanoseconds
MEMBARRIER_CMD_GLOBAL took 23967343 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15943147 nanoseconds
MEMBARRIER_CMD_GLOBAL took 23914809 nanoseconds
MEMBARRIER_CMD_GLOBAL took 32498986 nanoseconds
MEMBARRIER_CMD_GLOBAL took 19450932 nanoseconds
MEMBARRIER_CMD_GLOBAL took 16281308 nanoseconds
MEMBARRIER_CMD_GLOBAL took 24045168 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15406698 nanoseconds

real	0m0.458s
user	0m0.058s
sys	0m0.031s
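jannh@laptop:~/test/rcu$

Every invocation of MEMBARRIER_CMD_GLOBAL on my laptop took more than 10 ms (between about 15 and 32 ms), even though the whole run used only about 89 ms of CPU time (user + sys) out of 458 ms of wall clock time.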
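As a cross-check of that claim, a small sketch that recomputes the spread and the sum of the 20 samples printed by rcu_membarrier, for comparison with the real/user/sys totals. It also includes a signed timespec difference as an alternative to the unsigned arithmetic in rcu_membarrier.c (which yields the same value for forward time differences, because the unsigned wraparound cancels out). The names timespec_diff_ns(), summarize() and struct latency_summary are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: signed difference end - start in nanoseconds. */
static int64_t timespec_diff_ns(struct timespec start, struct timespec end)
{
	return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
	       + (int64_t)(end.tv_nsec - start.tv_nsec);
}

/* The 20 per-call latencies printed by rcu_membarrier, in nanoseconds. */
static const int64_t samples_ns[] = {
	17155142, 19207001, 16087350, 15963711, 16336149,
	15931331, 16020315, 15873814, 15945667, 23815452,
	23626444, 19911435, 23967343, 15943147, 23914809,
	32498986, 19450932, 16281308, 24045168, 15406698,
};

struct latency_summary {
	int64_t min_ns;
	int64_t max_ns;
	int64_t total_ns;
};

/* Hypothetical helper: min, max and sum of n > 0 samples. */
static struct latency_summary summarize(const int64_t *v, size_t n)
{
	struct latency_summary s = { v[0], v[0], 0 };

	for (size_t i = 0; i < n; i++) {
		if (v[i] < s.min_ns)
			s.min_ns = v[i];
		if (v[i] > s.max_ns)
			s.max_ns = v[i];
		s.total_ns += v[i];
	}
	return s;
}
```

With the data above, summarize(samples_ns, 20) gives a minimum of 15406698 ns, a maximum of 32498986 ns and a total of 387382202 ns, i.e. roughly 387 ms of the 458 ms of real time spent waiting in MEMBARRIER_CMD_GLOBAL, against about 89 ms of user + sys CPU time for the whole process.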