On Fri, 14 Dec 2018, Paul E. McKenney wrote: > I would say that sys_membarrier() has zero-sized read-side critical > sections, either comprising a single instruction (as is the case for > synchronize_sched(), actually), preempt-disable regions of code > (which are irrelevant to userspace execution), or the spaces between > consecutive pairs of instructions (as is the case for the newer > IPI-based implementation). > > The model picks the single-instruction option, and I haven't yet found > a problem with this -- which is no surprise given that, as you say, > an actual implementation makes this same choice. I believe that for RCU tests the LKMM gives the same results for length-zero critical sections interspersed between all the instructions and length-one critical sections surrounding all instructions (except synchronize_rcu). But the proof is tricky and I haven't checked it carefully. > > > The other thing that took some time to get used to is the possibility > > > of long delays during sys_membarrier() execution, allowing significant > > > execution and reordering between different CPUs' IPIs. This was key > > > to my understanding of the six-process example, and probably needs to > > > be clearly called out, including in an example or two. > > > > In all the examples I'm aware of, no more than one of the IPIs > > generated by each sys_membarrier call really matters. (Of course, > > there's no way to know in advance which one it will be, so you have to > > send an IPI to every CPU.) The execution delays and reordering > > between different CPUs' IPIs don't appear to be significant. > > Well, there are litmus tests that are allowed in which the allowed > execution is more easily explained in terms of delays between different > CPUs' IPIs, so it seems worth keeping track of. > > There might be a litmus test that can tell the difference between > simultaneous and non-simultaneous IPIs, but I cannot immediately think of > one that matters. Might be a failure of imagination on my part, though. P0 P1 P2 Wc=1 [mb01] Rb=1 memb Wa=1 Rc=0 Ra=0 Wb=1 [mb02] The IPIs have to appear in the positions shown, which means they cannot be simultaneous. The test is allowed because P2's reads can be reordered. Alan