On Sat, Feb 8, 2025 at 6:21 PM Peilin Ye <yepeilin@xxxxxxxxxx> wrote: > > Hi Alexei, > > On Sat, Feb 08, 2025 at 01:30:46PM -0800, Alexei Starovoitov wrote: > > > Introduce BPF instructions with load-acquire and store-release > > > semantics, as discussed in [1]. The following new flags are defined: > > > > > > BPF_ATOMIC_LOAD 0x10 > > > BPF_ATOMIC_STORE 0x20 > > > BPF_ATOMIC_TYPE(imm) ((imm) & 0xf0) > > > > > > BPF_RELAXED 0x0 > > > BPF_ACQUIRE 0x1 > > > BPF_RELEASE 0x2 > > > BPF_ACQ_REL 0x3 > > > BPF_SEQ_CST 0x4 > > > > I still don't like this. > > > > Earlier you said: > > > > > If yes, I think we either: > > > > > > (a) add more flags to imm<4-7>: maybe LOAD_SEQ_CST (0x3) and > > > STORE_SEQ_CST (0x6); need to skip OR (0x4) and AND (0x5) used by > > > RMW atomics > > > (b) specify memorder in imm<0-3> > > > > > > I chose (b) for fewer "What would be a good numerical value so that RMW > > > atomics won't need to use it in imm<4-7>?" questions to answer. > > > > > > If we're having dedicated fields for memorder, I think it's better to > > > define all possible values once and for all, just so that e.g. 0x2 will > > > always mean RELEASE in a memorder field. Initially I defined all six of > > > them [2], then Yonghong suggested dropping CONSUME [3]. > > > > I don't think we should be defining "all possible values", > > since these are the values that llvm and C model supports, > > but do we have any plans to support anything bug ld_acq/st_rel ? > > I haven't heard anything. > > What even the meaning of BPF_ATOMIC_LOAD | BPF_ACQ_REL ? > > > > What does the verifier suppose to do? reject for now? and then what? > > Map to what insn? > > > > These values might imply that bpf infra is supposed to map all the values > > to cpu instructions, but that's not what we're doing here. > > We're only dealing with two specific instructions. > > We're not defining a memory model for all future new instructions. > > Got it! In v3, I'll change it back to: > > #define BPF_LOAD_ACQ 0x10 > #define BPF_STORE_REL 0x20 why not 1 and 2 ? All other bits are reserved and the verifier will make sure they're zero, so when/if we need to extend it then it wouldn't matter whether lower 4 bits are reserved or other bits. Say, we decide to support cmpwait_relaxed as a new insn. It can take the value 3 and arm64 JIT will map it to ldxr+wfe+... Then with this new load_acq and cmpwait_relaxed we can efficiently implement both smp_cond_load_relaxed and smp_cond_load_acquire.