On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote: > Am 5/22/2024 um 4:20 PM schrieb Alan Stern: > > On Wed, May 22, 2024 at 11:20:47AM +0200, Jonas Oberhauser wrote: > > > Am 5/21/2024 um 5:36 PM schrieb Alan Stern: > > > > On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote: > > > > > Am 5/18/2024 um 2:31 AM schrieb Alan Stern: > > > > > > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote: > > > > > > > On 5/16/2024 10:31 AM, Jonas Oberhauser wrote: > > > > > > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern: > > > > > > > > > Hernan and Jonas: > > > > > > > > > > > > > > > > > > Can you explain more fully the changes you want to make to herd7 and/or > > > > > > > > > the LKMM? The goal is to make the memory barriers currently implicit in > > > > > > > > > RMW operations explicit, but I couldn't understand how you propose to do > > > > > > > > > this. > > > > > > > > > > > > > > > > > > Are you going to change herd7 somehow, and if so, how? It seems like > > > > > > > > > you should want to provide sufficient information so that the .bell > > > > > > > > > and .cat files can implement the appropriate memory barriers associated > > > > > > > > > with each RMW operation. What additional information is needed? And > > > > > > > > > how (explained in English, not by quoting source code) will the .bell > > > > > > > > > and .cat files make use of this information? > > > > > > > > > > > > > > > > > > Alan > > > > > > > > > > > > > > > > > > > > > > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the > > > > > > > > following: > > > > > > > > - if a tag called Mb appears on an rmw instruction (by instruction I > > > > > > > > mean things like xchg(), atomic_inc_return_relaxed()), replace it with > > > > > > > > one of those things: > > > > > > > > * full mb ; once (the rmw) ; full mb, if a value returning > > > > > > > > (successful) rmw > > > > > > > > * once (the rmw) otherwise > > > > > > > > - everything else gets translated 1:1 into some internal representation > > > > > > > > > > > > > > This is my understanding from reading the source code of CSem.ml in herd7's > > > > > > > repo. > > > > > > > > > > > > > > Also, this is exactly what dartagnan is currently doing. > > > > > > > > > > > > > > > > > > > > > > > What I'm proposing is: > > > > > > > > 1. remove this transpilation step, > > > > > > > > 2. and instead allow the Mb tag to actually appear on RMW instructions > > > > > > > > 3. change the cat file to explicitly define the behavior of the Mb tag > > > > > > > > on RMW instructions > > > > > > > > > > > > > > These are the exact 3 things I changed in dartagnan for testing what Jonas > > > > > > > proposed. > > > > > > > > > > > > > > I am not sure if further changes are needed for herd7. > > > > > > > > What about failed RMW instructions? IIRC, herd7 generates just an R for > > > > these, not both R and W, but won't it still be annotated with an mb tag? > > > > And wasn't this matter of failed RMWs one of the issues that the two of > > > > you specifically wanted to make explicit in the memory model, rather > > > > than implicit in the operation of herd7? > > > > > > That's why we use the RMW_MB tag. I should have copied that definition too: > > > > > > > > > (* full barrier events that appear in non-failing RMW *) > > > let RMW_MB = Mb & (dom(rmw) | range(rmw)) > > > > > > > > > This ensures that the only successful rmw instructions have an RMW_MB tag. > > > > It would be better if there was a way to tell herd7 not to add the 'mb > > tag to failed instructions in the first place. This approach is > > brittle; see below. > > Hernan told me that in fact that is actually currently the case in herd7. > Failing RMW get assigned the Once tag implicitly. > Another thing that I'd suggest to change. Let's please be careful, though. There is a lot out there that depends on this semantic, odd though it might seem at first glance. ;-) Thanx, Paul > > An alternative would be to have a way for the .cat file to remove the > > 'mb tag from a failed RMW instruction. But I don't know if this is > > feasible. > > For Mb it's feasible, as there is no Mb read or Mb store. > > Mb = Mb & (~M | dom(rmw) | range(rmw)) > > However one would want to do the same for Acq and Rel. > > For that one would need to distinguish e.g. between a read that comes from a > failed rmw instruction, and where the tag would disappear, or a normal > standalone read. > > For example, by using two different acquire tags, 'acquire and 'rmw-acquire, > and defining > > Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw)) > > Anyways we can do this change independently. So for now, we don't need > RMW_MB. > > > > > > > > And wasn't another one of these issues the difference between > > > > value-returning and non-value-returning RMWs? As far as I can, nothing > > > > in the .def file specifically mentions this. There's the noreturn tag > > > > in the .bell file, but nothing in the .def file says which instructions > > > > it applies to. Or are we supposed to know that it automatically applies > > > > to all __atomic_op() instances? > > > > > > Ah, now you're already forestalling my next suggestion :)) > > > > > > I would say let's fix one issue at a time (learned this from Andrea). > > > > > > For the other issue, do noreturn rmws always have the same ordering as once? > > > > If they aren't annotated with _acquire or _release then yes... And I > > don't know whether there _are_ any annotated no-return RMWs. If > > somebody wanted such a thing, they probably would just use a > > value-returning RMW instead. > > > > > I suspect we need to extend herd slightly more to support the second one, I > > > don't know if there's syntax for passing tags to __atomic_op. > > > > This may not be be needed. But still, it would nice to be explicit (in > > a comment in the .def file if nowhere else) that __atomic_op always adds > > a 'noreturn tag. > > > > > > > instructions RMW[{'once,'acquire,'release,'mb}] > > > > > > > > > > then the Mb tags would appear in the graph. And then I'd define the ordering > > > > > explicitly. One way is to say that an Mb tag orders all memory accesses > > > > > before(or at) the tag with all memory accesses after(or at) the tag, except > > > > > the accesses of the rmw with each other. > > > > > > > > I don't think you need to add very much. The .cat file already defines > > > > the mb relation as including the term: > > > > > > > > ([M] ; fencerel(Mb) ; [M]) > > > > > > > > All that's needed is to replace the fencerel(Mb) with something more > > > > general... > > > > And this is why I said the RMW_MB mechanism is brittle. With the 'mb > > tag still added to failed RMW events, the term above will cause the > > memory model to think there is ordering even though the event isn't in > > the RMW_MB class. > > > > Huh, I thought that fencerel(Mb) is something like po ; [F & Mb] ; po (where > F includes standalone fences). But looking into the stdlib.cat, you're > right. > > > > > > Also, is the "\ rmw" part really necessary? I don't think it makes any > > > > difference; the memory model already knows that the read part of an RMW > > > > has to happen before the write part. > > > > > > It unfortunately does make a difference, because mb is a strong fence. > > > This means that an Mb in an rmw sequence would create additional pb edges. > > > > > > prop;(rfe ; [Mb];rmw;[Mb]) > > > > > > I believe this is might give wrong results on powerpc, but I'd need to > > > double check. > > > > PowerPC uses a load-reserve/store-conditional approach for RMW, so it's > > tricky. However, you're right that having a fictitious mb between the > > read and the write of an RMW would mean that the preceding (in coherence > > order) write would have to be visible to all CPUs before the RMW write > > could execute, and I don't believe we want to assert this. > > > > > > > One could also split it into two rules to keep with the "two full fences" > > > > > analogy. Maybe a good way would be like this: > > > > > > > > > > [M] ; po; [RMW_MB & R] ; po^? ; [M] > > > > > > > > > > [M] ; po^?; [RMW_MB & W] ; po ; [M] > > > > > > > > My preference is for the first approach. > > > > > > That was also my early preference, but to be honest I expected that you'd > > > prefer the second approach. > > > Actually, I also started warming up to it. > > > > If you do want to use this approach, it should be simplified. All you > > need is: > > > > [M] ; po ; [RMW_MB] > > > > [RMW_MB] ; po ; [M] > > > > This is because events tagged with RMW_MB always are memory accesses, > > and accesses that aren't part of the RMW are already covered by the > > fencerel(Mb) thing above. > > This has exactly the issue mentioned above - it will cause the rmw to have > an internal strong fence that on powerpc probably isn't there. > > We could do (with the assumption that Mb applies only to successful rmw): > > [M] ; po ; [Mb & R] > [Mb & W] ; po ; [M] > > > Kind regards, jonas >