Re: "Verifying and Optimizing Compact NUMA-Aware Locks on Weak Memory Models"

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Mon, 12 Sep 2022 08:01:03 -0400

On Mon, Sep 12, 2022 at 10:13:33AM +0000, Jonas Oberhauser wrote:
> As I tried to explain before, this problem has nothing to do with 
> stores propagating within a given time to another core. Rather it is 
> due to two stores to the same location happening in a surprising 
> order. I.e., both stores propagate quickly to other cores, but in a 
> surprising coherence order. And if a wmb in the code is replaced by an 
> mb, then this co will create a pb cycle and become forbidden.
> 
> Therefore this hang should be observable on a hypothetical LKMM 
> processor which makes use of all the relaxed liberty the LKMM allows. 
> However according to the authors of that paper (who are my colleagues 
> but I haven't been involved deeply in that work), not even Power+gcc 
> allow this reordering to happen, and if that's true it is probably 
> because the wmb is mapped to lwsync which is fully cumulative in Power 
> but not in LKMM.

Yes, that's right.  On ARM64 the reordering is forbidden by 
other-multi-copy atomicity, and on PPC it is forbidden by 
B-cumulativity (neither of which is part of the LKMM).

If I'm not mistaken, another way to forbid it is to replace one of the 
relaxed atomic accesses with an atomic access having release semantics.  
Perhaps this change will find its way into the kernel source, since it 
has less overhead than replacing the wmb with an mb.

Alan