On Fri, Aug 31, 2018 at 08:28:46PM +0200, Andrea Parri wrote:
> > > Yes, it's true that implementing locks with atomic_cmpxchg_acquire
> > > should be correct on all existing architectures.  And Paul has invited
> > > a patch to modify the LKMM accordingly.  If you feel that such a change
> > > would be a useful enhancement to the LKMM's applicability, please write
> > > it.
> >
> > Yes, please! That would be the "RmW" discussion which Andrea partially
> > quoted earlier on, so getting that going independently from this patch
> > sounds like a great idea to me.
>
> That was indeed one of the proposals we discussed.  As you recalled, that
> proposal only covered RmWs load-acquire (and ordinary store-release); in
> particular, I realized that comments such as:
>
>   "The atomic_cond_read_acquire() call above has provided the
>    necessary acquire semantics required for locking."
>
>     [from kernel/locking/qspinlock.c]
>
> (for example) would still _not_ have "generic validity" _if_ we added the
> above po-unlock-rf-lock-po term...  (which, again, makes me somehow
> uncomfortable); Would having _all_ the acquires be admissible for you?

In Cat speak,

diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
index 59b5cbe6b6240..fd9c0831adf0a 100644
--- a/tools/memory-model/linux-kernel.cat
+++ b/tools/memory-model/linux-kernel.cat
@@ -38,7 +38,7 @@ let strong-fence = mb | gp
 (* Release Acquire *)
 let acq-po = [Acquire] ; po ; [M]
 let po-rel = [M] ; po ; [Release]
-let rfi-rel-acq = [Release] ; rfi ; [Acquire]
+let po-rel-rf-acq-po = po ; [Release] ; rf ; [Acquire] ; po
 
 (**********************************)
 (* Fundamental coherence ordering *)
@@ -60,13 +60,13 @@ let dep = addr | data
 let rwdep = (dep | ctrl) ; [W]
 let overwrite = co | fr
 let to-w = rwdep | (overwrite & int)
-let to-r = addr | (dep ; rfi) | rfi-rel-acq
+let to-r = addr | (dep ; rfi)
 let fence = strong-fence | wmb | po-rel | rmb | acq-po
-let ppo = to-r | to-w | fence
+let ppo = to-r | to-w | fence | (po-rel-rf-acq-po & int)
 
 (* Propagation: Ordering from release operations and strong fences. *)
 let A-cumul(r) = rfe? ; r
-let cumul-fence = A-cumul(strong-fence | po-rel) | wmb
+let cumul-fence = A-cumul(strong-fence | po-rel) | wmb | po-rel-rf-acq-po
 let prop = (overwrite & ext)? ; cumul-fence* ; rfe?
 
 (*

I take this opportunity to summarize my viewpoint on these matters:

Someone would have to write the commit message for the above diff ...
that is, to describe -why- we should go RCtso (and update the
documentation accordingly); so far, the only argument for this appears
to be:

  "(most) people expect strong ordering" _and_ they will be "lazy
   enough" to not check their expectations by using the LKMM tool
   (paraphrasing from [1]); IAC, Linux "might work" better if we add
   this ordering to the LKMM.

Agreeing on such an approach would mean agreeing that this argument
"wins" over:

  "We want new architectures to implement acquire/release efficiently,
   and it's not unlikely that they will have acquire loads that are
   similar in semantics to LDAPR."  [2]

  "RISC-V probably would have been RCpc [...]
   it takes extra fences to go from RCpc to either "RCtso" or RCsc."  [3]

(or similar instances) since, of course, there is no such thing as a
"free strong ordering"; and I'm not only talking about "efficiency",
I'm also thinking of the fact that someone will have to maintain that
ordering across all the architectures and in the LKMM.

If, OTOH, we agree that the above "win"/assumption is valid only for
locks or, in other/better words, if we agree that we should maintain
_two_ distinct release-acquire orderings (a first one for unlock-lock
sequences and a second one for ordinary/atomic release-acquire, say,
as proposed in the patch under RFC), I ask that we audit and modify
the generic code accordingly/as suggested in other posts _before_ we
upstream the changes for the LKMM: we should identify those places
where (the newly introduced) _gap_ between unlock-lock and the other
release-acquire is not admissible and fix those places (notice that
this entails, in particular, agreeing on what/where the generic code
is).

Finally, if we don't agree with the above assumption at all (that is,
no matter if we are considering unlock-lock or other release-acquire
sequences), then we should go RCpc [4].

I described three different approaches (which are NOT "independent",
clearly; let us find an agreement...); even though some of them look
insane to me, I'm currently open to all of them: thoughts?

  Andrea

[1] http://lkml.kernel.org/r/20180712134821.GT2494@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    http://lkml.kernel.org/r/CA+55aFwKpkU5C23OYt1HCiD3X5bJHVh1jz5G2dSnF1+kVrOCTA@xxxxxxxxxxxxxx
[2] http://lkml.kernel.org/r/20180622183007.GD1802@xxxxxxx
[3] http://lkml.kernel.org/r/11b27d32-4a8a-3f84-0f25-723095ef1076@xxxxxxxxxx
[4] http://lkml.kernel.org/r/20180711123421.GA9673@andrea
    http://lkml.kernel.org/r/Pine.LNX.4.44L0.1807132133330.26947-100000@xxxxxxxxxxxxxxxxxxxx
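
As a toy illustration of the po-rel-rf-acq-po term from the diff (this
is not part of the LKMM tooling, and it is no substitute for running
herd7), the relation can be computed by hand over a tiny execution.
The sketch below is in Python; the helpers seq(), ident() and
internal(), the event names a..d, and both example executions are
hypothetical and chosen only for illustration.  It shows that a
cross-thread release/acquire pair contributes to cumul-fence only,
while the "& int" restriction keeps just the same-thread case in ppo.

```python
def seq(r, s):
    """Relational composition, written `r ; s` in cat."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def ident(events):
    """Identity relation on a set of events, written `[S]` in cat."""
    return {(e, e) for e in events}


def po_rel_rf_acq_po(po, rf, release, acquire):
    """po ; [Release] ; rf ; [Acquire] ; po, as defined in the diff."""
    r = seq(po, ident(release))
    r = seq(r, rf)
    r = seq(r, ident(acquire))
    return seq(r, po)


def internal(rel, thread):
    """`rel & int`: keep only pairs whose events are on the same thread."""
    return {(a, b) for (a, b) in rel if thread[a] == thread[b]}


# Hypothetical execution 1: a release on P0 is read by an acquire on P1.
#   P0: a: W x      b: W-release s
#   P1: c: R-acquire s (reads from b)      d: R y
thread_1 = {"a": 0, "b": 0, "c": 1, "d": 1}
cross = po_rel_rf_acq_po(
    po={("a", "b"), ("c", "d")},
    rf={("b", "c")},
    release={"b"},
    acquire={"c"},
)

# Hypothetical execution 2: the same four events on a single thread,
# so the acquire reads from the release internally (rfi).
thread_2 = {"a": 0, "b": 0, "c": 0, "d": 0}
same = po_rel_rf_acq_po(
    po={("a", "b"), ("a", "c"), ("a", "d"),
        ("b", "c"), ("b", "d"), ("c", "d")},
    rf={("b", "c")},
    release={"b"},
    acquire={"c"},
)

print("cross-thread:", cross, "& int:", internal(cross, thread_1))
print("same-thread: ", same, "& int:", internal(same, thread_2))
```

In both executions the term relates a to d.  In the first one the pair
is external, so under the diff it would feed cumul-fence but not ppo;
in the second one it survives "& int" and would feed ppo as well.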