----- On May 30, 2015, at 12:40 AM, Andrew Morton akpm@xxxxxxxxxxxxxxxxxxxx wrote:

> On Sat, 16 May 2015 19:48:18 -0400 Mathieu Desnoyers
> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
>> Here is an implementation of a new system call, sys_membarrier(), which
>> executes a memory barrier on all threads running on the system. It is
>> implemented by calling synchronize_sched(). It can be used to distribute
>> the cost of user-space memory barriers asymmetrically by transforming
>> pairs of memory barriers into pairs consisting of sys_membarrier() and a
>> compiler barrier. For synchronization primitives that distinguish
>> between read-side and write-side (e.g. userspace RCU [1], rwlocks), the
>> read-side can be accelerated significantly by moving the bulk of the
>> memory barrier overhead to the write-side.
>>
>> ...
>>
>
> It would be nice to hear about the real world value of this syscall to
> our users. I'm seeing test results for a microbenchmark but so what.
> What actual applications or application classes are calling for this and
> what results can they expect to see?

AFAIK, the existing open source applications that would be improved
by this system call are as follows:

* Through Userspace RCU library (http://urcu.so)
  - DNS server (Knot DNS) https://www.knot-dns.cz/
  - Network sniffer (http://netsniff-ng.org/)
  - Distributed object storage (https://sheepdog.github.io/sheepdog/)
  - User-space tracing (http://lttng.org)
  - Network storage system (https://www.gluster.org/)

  Those projects use RCU in userspace to increase read-side speed and
  scalability compared to locking. Especially in the case of RCU used
  by libraries, sys_membarrier can speed up the read-side by moving the
  bulk of the memory barrier cost to synchronize_rcu().

* Direct users of sys_membarrier
  - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)

  Microsoft core dotnet GC developers are planning to use the mprotect()
  side-effect of issuing memory barriers through IPIs as a way to
  implement Windows FlushProcessWriteBuffers() on Linux. They are
  referring to sys_membarrier in their github thread, specifically
  stating that sys_membarrier() is what they are looking for.

>
>>
>> membarrier(2) man page:
>> --------------- snip -------------------
>> MEMBARRIER(2)          Linux Programmer's Manual         MEMBARRIER(2)
>>
>> NAME
>>        membarrier - issue memory barriers on a set of threads
>>
>> SYNOPSIS
>>        #include <linux/membarrier.h>
>>
>>        int membarrier(int cmd, int flags);
>>
>> DESCRIPTION
>>        The cmd argument is one of the following:
>>
>>        MEMBARRIER_CMD_QUERY
>>               Query the set of supported commands. It returns a bitmask of
>>               supported commands.
>>
>>        MEMBARRIER_CMD_SHARED
>>               Execute a memory barrier on all threads running on the system.
>>               Upon return from system call, the caller thread is ensured that
>>               all running threads have passed through a state where all memory
>>               accesses to user-space addresses match program order between
>>               entry to and return from the system call (non-running threads
>>               are de facto in such a state). This covers threads from all
>>               processes running on the system. This command returns 0.
>>
>>        The flags argument needs to be 0. For future extensions.
>>
>>        All memory accesses performed in program order from each targeted
>>        thread is guaranteed to be ordered with respect to sys_membarrier().
>>        If we use the semantic "barrier()" to represent a compiler barrier forcing
>>        memory accesses to be performed in program order across the barrier,
>>        and smp_mb() to represent explicit memory barriers forcing full memory
>>        ordering across the barrier, we have the following ordering table for
>>        each pair of barrier(), sys_membarrier() and smp_mb():
>>
>>        The pair ordering is detailed as (O: ordered, X: not ordered):
>>
>>                               barrier()   smp_mb()   sys_membarrier()
>>               barrier()          X           X            O
>>               smp_mb()           X           O            O
>>               sys_membarrier()   O           O            O
>>
>> RETURN VALUE
>>        On success, these system calls return zero. On error, -1 is returned,
>>        and errno is set appropriately. For a given command, with flags
>>        argument set to 0, this system call is guaranteed to always return the
>>        same value until reboot.
>
> I suggest "with flags argument set to MEMBARRIER_CMD_QUERY" here.

No, the enum is for the "cmd" argument (see above), not the flags
argument. We really mean flags = 0 (the value) here.

>
>>
>> ERRORS
>>        ENOSYS System call is not implemented.
>>
>>        EINVAL Invalid arguments.
>>
>> ...
>>
>> +SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
>> +{
>> +	if (flags)
>> +		return -EINVAL;
>
> I'm not a huge fan of this "add a flags arg to syscalls" rule. Is
> there any realistic expectation that we'll ever *use* this thing? If
> not, why add it?

I can see this system call evolving in a few ways in the future, such as
having an expedited version (using IPIs), targeting the local thread
group, and targeting all threads mapping a specific shared memory
mapping. I guess that the cmd argument should be enough to cover that,
but when in doubt, it might be better to keep a flags argument there for
future needs we might be overlooking right now, so we never end up
needing a sys_membarrier2 system call.

>
> You may as well put an unlikely() in there btw.

Will do.

Thanks!

Mathieu

>
>> +	switch (cmd) {
>> +	case MEMBARRIER_CMD_QUERY:
>> +		return MEMBARRIER_CMD_BITMASK;
>> +	case MEMBARRIER_CMD_SHARED:
>> +		if (num_online_cpus() > 1)
>> +			synchronize_sched();
>> +		return 0;
>> +	default:
>> +		return -EINVAL;
>> +	}
>
> +}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
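
For anyone who wants to try the interface described in the man page quoted above, here is a minimal user-space sketch in C. It is not part of the patch. It assumes no libc wrapper is available, so it goes through syscall(2). The sketch_* names are hypothetical, and the command values (QUERY = 0, SHARED = 1 << 0) are an assumption about the proposed <linux/membarrier.h>, not something quoted in this thread. Where the headers do not define the system call number, the wrapper simply fails with ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ASSUMPTION: values mirror the proposed <linux/membarrier.h>. */
enum {
	SKETCH_MEMBARRIER_CMD_QUERY  = 0,
	SKETCH_MEMBARRIER_CMD_SHARED = 1 << 0,
};

/*
 * Hypothetical raw wrapper around the system call.
 * Returns the kernel's result, or -1 with errno set on failure.
 */
static int sketch_membarrier(int cmd, int flags)
{
#ifdef __NR_membarrier
	return (int)syscall(__NR_membarrier, cmd, flags);
#else
	/* Headers predate the system call: report it as unimplemented. */
	(void)cmd;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Query the supported-command bitmask once (flags must be 0) and
 * report whether the SHARED command can be used.
 * Returns 1 if supported, 0 otherwise (including ENOSYS).
 */
static int sketch_membarrier_shared_supported(void)
{
	int mask = sketch_membarrier(SKETCH_MEMBARRIER_CMD_QUERY, 0);

	if (mask < 0)
		return 0;
	return (mask & SKETCH_MEMBARRIER_CMD_SHARED) != 0;
}
```

A caller would typically run sketch_membarrier_shared_supported() once at startup. Only if it returns 1 would the read side drop its smp_mb() in favour of a compiler barrier, with the write side calling sketch_membarrier(SKETCH_MEMBARRIER_CMD_SHARED, 0) in place of its own smp_mb(). If it returns 0, both sides keep their explicit memory barriers.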