On Wed, Oct 23, 2024 at 09:25:40PM -0700, Paul E. McKenney wrote: > On Wed, Oct 23, 2024 at 08:47:44PM +0300, Andrea Parri wrote: > > Hi Puranjay and Paul, > > > > I'm running some experiment on the (experimental) formalization of BPF > > acquire and release available from [1] and wanted to report about some > > (initial) observations for discussion and possibly future developments; > > apologies in advance for the relatively long email and any repetition. > > > > > > A first and probably most important observation is that the (current) > > formalization of acquire and release appears to be "too strong": IIUC, > > the simplest example/illustration for this is given by the following > > > > BPF R+release+fence > > { > > 0:r2=x; 0:r4=y; > > 1:r2=y; 1:r4=x; 1:r6=l; > > } > > P0 | P1 ; > > r1 = 1 | r1 = 2 ; > > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; > > r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; > > store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ; > > exists ([y]=2 /\ 1:r3=0) > > > > This "exists" condition is not satisfiable according to the BPF model; > > however, if we adopt the "natural"/intended(?) PowerPC implementations > > of the synchronization primitives above (aka, with store_release() --> > > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the > > condition in question becomes (architecturally) satisfiable on PowerPC > > (although I'm not aware of actual observations on PowerPC hardware). > > Yes, you are quite right, for efficient use on PowerPC, we need the BPF > memory model to allow the above cycle in the R litmus test. My bad, > as I put too much emphasis on ARM64. > > > At first, the previous observation (validated via simulations and later > > extended to similar but more complex scenarios ) made me believe that > > the BPF formalization of acquire and release could be strictly stronger > > than the corresponding LKMM formalization; but that is _not_ the case: > > > > The following "exists" condition is satisfiable according to the BPF > > model (and it remains satisfiable even if the load_acquire() in P2 is > > paired with an additional store_release() in P1). In contrast, the > > corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire() > > and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the > > same conclusion holds even if the putative smp_load_acquire() in P2 is > > "replaced" with an smp_rmb() or with an address dependency). > > > > BPF Z6.3+fence+fence+acquire > > { > > 0:r2=x; 0:r4=y; 0:r6=l; > > 1:r2=y; 1:r4=z; 1:r6=m; > > 2:r2=z; 2:r4=x; > > } > > P0 | P1 | P2 ; > > r1 = 1 | r1 = 2 | r1 = load_acquire((u32 *)(r2 + 0)) ; > > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 | r3 = *(u32 *)(r4 + 0) ; > > r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | ; > > r3 = 1 | r3 = 1 | ; > > *(u32 *)(r4 + 0) = r3 | *(u32 *)(r4 + 0) = r3 | ; > > exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0) > > And again agreed, we do want to forbid Z6.3. > > > These remarks show that the proposed BPF formalization of acquire and > > release somehow, but substantially, diverged from the corresponding > > LKMM formalization. My guess is that the divergences mentioned above > > were not (fully) intentional, or I'm wondering -- why not follow the > > latter (the LKMM's) more closely? - This is probably the first question > > I would need to clarify before trying/suggesting modifications to the > > present formalizations. ;-) Thoughts? > > Thank you for digging into this! > > I clearly need to get my validation work going again, but I very much > welcome any further help you would be willing to provide. Thanks for the confirmation. The BPF tests above (and other I have) were all hand-written, but I'm working towards improving such automation/validation; I won't keep it a secret should I find something relevant/interesting. :-) But the subset of the LKMM which deals with "strong fences" and Acq & Rel (limited to so called marked accesses) seems relatively contained /simple: its analysis could be useful, if not determining, in trying to resolve the above issues. Andrea