On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
> On 5/6/2024 at 12:05 PM, Jonas Oberhauser wrote:
> > On 5/2/2024 at 1:21 AM, Paul E. McKenney wrote:
> > > This commit adds four litmus tests showing that a failing cmpxchg()
> > > operation is unordered unless followed by an smp_mb__after_atomic()
> > > operation.
> >
> > So far, my understanding was that all RMW operations without suffix
> > (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
> >
> > I guess this shows again how important it is to model these full
> > barriers explicitly inside the cat model, instead of relying on implicit
> > conversions internal to herd.
> >
> > I'd like to propose a patch to this effect.
> >
> > What is the intended behavior of a failed cmpxchg()? Is it the same as a
> > relaxed one?

Yes, and unless I am too confused, LKMM currently does implement this.
Please let me know if I am missing something.

> > My suggestion would be in the direction of marking read and write events
> > of these operations as Mb, and then defining
> >
> > (* full barrier events that appear in non-failing RMW *)
> > let RMW_MB = Mb & (dom(rmw) | range(rmw))
> >
> >
> > let mb =
> >       [M] ; fencerel(Mb) ; [M]
> >     | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
> >     | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
> >     | ...
> >
> > The po \ rmw is because ordering is not provided internal to the rmw
> > (removed the unnecessary si since LKMM is still non-mixed-accesses)

Addition of mixed-access support would be quite welcome!

> This could also be written with a single rule:
>
>     | [M] ; (po \ rmw) & (po^? ; [RMW_MB] ; po^?) ; [M]
>
> > I suspect that after we added [rmw] sequences it could perhaps be
> > simplified [...]
>
> No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
> act like strong fences when they appear in an rmw sequence.
>
>     if (z==1)   ||  x = 2;       ||  xchg(&y,2)  ||  if (y==2)
>       x = 1;    ||  y =_rel 1;   ||              ||    z=1;
>
> Right now, we allow x=2 overwriting x=1 (in case the last thread does not
> propagate x=2 along with z=1) because on Power, the xchg might be
> implemented with a sync that doesn't get executed until the very end
> of the program run.
>
> Instead of its negative form (everything other than inside the rmw),
> it could also be rewritten positively. Here's a somewhat short form:
>
> let mb =
>       [M] ; fencerel(Mb) ; [M]
>     (* everything across a full barrier RMW is ordered. This includes up to
>        one event inside the RMW. *)
>     | [M] ; po ; [RMW_MB] ; po ; [M]
>     (* full barrier RMW writes are ordered with everything behind the RMW *)
>     | [W & RMW_MB] ; po ; [M]
>     (* full barrier RMW reads are ordered with everything before the RMW *)
>     | [M] ; po ; [R & RMW_MB]
>     | ...

Does this produce the results expected by the litmus tests in the Linux
kernel source tree and also those at https://github.com/paulmckrcu/litmus?

							Thanx, Paul
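
For concreteness, the kind of test being discussed might look as follows.
This is only a sketch: the name is made up and it need not match any of
the four tests added by the commit. Per the commit description, the
"exists" clause should be forbidden as written, and should become
reachable if the smp_mb__after_atomic() is removed.

C cmpxchg-fail-ordered-sketch

(*
 * Expected result: Never (per the commit description), and Sometimes
 * once the smp_mb__after_atomic() is deleted, because the failing
 * cmpxchg() by itself provides no ordering.
 *)

{}

P0(int *x, int *y, int *z)
{
	int r0;

	WRITE_ONCE(*x, 1);
	r0 = cmpxchg(z, 1, 2); /* Always fails: *z stays 0. */
	smp_mb__after_atomic();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r1;
	int r2;

	r1 = READ_ONCE(*y);
	smp_rmb();
	r2 = READ_ONCE(*x);
}

exists (1:r1=1 /\ 1:r2=0)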
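
The four-process example above could likewise be written out as a full
litmus test. The following is a sketch in which the name, the exact
encoding, and the mapping of "y =_rel 1" to smp_store_release() are
assumptions on my part; per the scenario described above, current LKMM
is expected to allow the "exists" outcome (x=2 overwriting x=1).

C xchg-rmw-sequence-sketch

(*
 * Expected result: Sometimes under current LKMM, per the discussion
 * above of the xchg() being implemented on Power with a late sync.
 *)

{}

P0(int *x, int *z)
{
	int r0;

	r0 = READ_ONCE(*z);
	if (r0 == 1)
		WRITE_ONCE(*x, 1);
}

P1(int *x, int *y)
{
	WRITE_ONCE(*x, 2);
	smp_store_release(y, 1);
}

P2(int *y)
{
	int r2;

	r2 = xchg(y, 2);
}

P3(int *y, int *z)
{
	int r3;

	r3 = READ_ONCE(*y);
	if (r3 == 2)
		WRITE_ONCE(*z, 1);
}

exists (0:r0=1 /\ 2:r2=1 /\ 3:r3=2 /\ x=2)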