On Wed, Oct 23, 2024 at 08:47:44PM +0300, Andrea Parri wrote: > Hi Puranjay and Paul, > > I'm running some experiment on the (experimental) formalization of BPF > acquire and release available from [1] and wanted to report about some > (initial) observations for discussion and possibly future developments; > apologies in advance for the relatively long email and any repetition. > > > A first and probably most important observation is that the (current) > formalization of acquire and release appears to be "too strong": IIUC, > the simplest example/illustration for this is given by the following > > BPF R+release+fence > { > 0:r2=x; 0:r4=y; > 1:r2=y; 1:r4=x; 1:r6=l; > } > P0 | P1 ; > r1 = 1 | r1 = 2 ; > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ; > r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ; > store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ; > exists ([y]=2 /\ 1:r3=0) > > This "exists" condition is not satisfiable according to the BPF model; > however, if we adopt the "natural"/intended(?) PowerPC implementations > of the synchronization primitives above (aka, with store_release() --> > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the > condition in question becomes (architecturally) satisfiable on PowerPC > (although I'm not aware of actual observations on PowerPC hardware). Yes, you are quite right, for efficient use on PowerPC, we need the BPF memory model to allow the above cycle in the R litmus test. My bad, as I put too much emphasis on ARM64. > At first, the previous observation (validated via simulations and later > extended to similar but more complex scenarios ) made me believe that > the BPF formalization of acquire and release could be strictly stronger > than the corresponding LKMM formalization; but that is _not_ the case: > > The following "exists" condition is satisfiable according to the BPF > model (and it remains satisfiable even if the load_acquire() in P2 is > paired with an additional store_release() in P1). In contrast, the > corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire() > and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the > same conclusion holds even if the putative smp_load_acquire() in P2 is > "replaced" with an smp_rmb() or with an address dependency). > > BPF Z6.3+fence+fence+acquire > { > 0:r2=x; 0:r4=y; 0:r6=l; > 1:r2=y; 1:r4=z; 1:r6=m; > 2:r2=z; 2:r4=x; > } > P0 | P1 | P2 ; > r1 = 1 | r1 = 2 | r1 = load_acquire((u32 *)(r2 + 0)) ; > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 | r3 = *(u32 *)(r4 + 0) ; > r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | ; > r3 = 1 | r3 = 1 | ; > *(u32 *)(r4 + 0) = r3 | *(u32 *)(r4 + 0) = r3 | ; > exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0) And again agreed, we do want to forbid Z6.3. > These remarks show that the proposed BPF formalization of acquire and > release somehow, but substantially, diverged from the corresponding > LKMM formalization. My guess is that the divergences mentioned above > were not (fully) intentional, or I'm wondering -- why not follow the > latter (the LKMM's) more closely? - This is probably the first question > I would need to clarify before trying/suggesting modifications to the > present formalizations. ;-) Thoughts? Thank you for digging into this! I clearly need to get my validation work going again, but I very much welcome any further help you would be willing to provide. Thanx, Paul > Andrea > > > [1] https://github.com/puranjaymohan/herdtools7/commits/bpf_acquire_release/