> But the subset of the LKMM which deals with "strong fences" and Acq & > Rel (limited to so called marked accesses) seems relatively contained > /simple: its analysis could be useful, if not determining, in trying > to resolve the above issues. Elaborating on the previous suggestion/comparison with the LKMM, the "subset" in question can take the following form (modulo my typos): "LKMM with acquire/release, strong fences, and marked accesses only" [...] let acq-po = [Acq] ; po ; [M] let po-rel = [M] ; po ; [Rel] let strong-fence = [M] ; fencerel(Mb) ; [M] let ppo = acq-po | po-rel | strong-fence let A-cumul(r) = rfe? ; r let cumul-fence = A-cumul(strong-fence | po-rel) let overwrite = co | fr let prop = (overwrite & ext)? ; cumul-fence* ; rfe? let hb = ppo | rfe | ((prop \ id) & int) acyclic hb as Hb let pb = prop ; strong-fence ; hb* acyclic pb as Pb For BPF, we'd want to replace acq-po, po-rel and strong-fence with load_acquire, store_release and po_amo_fetch respectively: Unless I'm missing something, this should restore the intended behaviors for the R and Z6.3 tests discussed earlier. A couple of other remarks: - Notice how the above formalization is completely symmetrical wrt. co <-> fr, IOW, co links are considered "on par with" fr links. In particular, the following test is satisfiable in the above formalization, as is the corresponding C test in the LKMM: BPF 2+2W+release+fence { 0:r2=x; 0:r4=y; 1:r2=y; 1:r4=x; 1:r6=l; } P0 | P1 ; r1 = 1 | r1 = 1 ; *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; r3 = 2 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; store_release((u32 *)(r4 + 0), r3) | r3 = 2 ; | store_release((u32 *)(r4 + 0), r3) ; exists ([x]=1 /\ [y]=1) (On an historical note, this wasn't always the case in the LKMM, cf. e.g. [1], but be alerted that the formalization in [1] is decisively more involved and less intuitive than today's / what the LKMM community has converged to. ;-) ) - The above formalization merges the so called "Observation" axiom in the "Happens-before" axiom. In the LKMM, this followed the removal of B-cumulativity for smp_wmb() and smp_store_release() and a consequent "great simplification" of the hb relation: link [2] can provide more details and some examples related to those changes. For completeness, here is the BPF analogue of test "C-release-acquire-is-B-cumulative.litmus" from that article: BPF ISA2+release+acquire+acquire { 0:r2=x; 0:r4=y; 1:r2=y; 1:r4=z; 2:r2=z; 2:r4=x; } P0 | P1 | P2 ; r1 = 1 | r1 = load_acquire((u32 *)(r2 + 0)) | r1 = load_acquire((u32 *)(r2 + 0)) ; *(u32 *)(r2 + 0) = r1 | r3 = 1 | r3 = *(u32 *)(r4 + 0) ; r3 = 1 | *(u32 *)(r4 + 0) = r3 | ; store_release((u32 *)(r4 + 0), r3) | | ; exists (1:r1=1 /\ 2:r1=1 /\ 2:r3=0) The formalization sketched above allows this behavior. Notice, however, that the behavior is forbidden after "completion" of the release/acquire chain, i.e. by making the store from P1 a store-release (a property also known as A-cumulativy of the release operation). I guess the next question (once clarified the intentions for the R and Z6.3 tests seen earlier) is "Does BPF really care about 2+2W and B-cumulativity for store-release?"; I mentioned some tradeoff, but in the end this is a call for the BPF community. Andrea [1] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/StrongModel.html [2] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/WeakModel.html#Cumulativity