On Fri, 2025-02-07 at 02:05 +0000, Peilin Ye wrote:
> Introduce BPF instructions with load-acquire and store-release
> semantics, as discussed in [1]. The following new flags are defined:
>
>     BPF_ATOMIC_LOAD         0x10
>     BPF_ATOMIC_STORE        0x20
>     BPF_ATOMIC_TYPE(imm)    ((imm) & 0xf0)
>
>     BPF_RELAXED        0x0
>     BPF_ACQUIRE        0x1
>     BPF_RELEASE        0x2
>     BPF_ACQ_REL        0x3
>     BPF_SEQ_CST        0x4
>
>     BPF_LOAD_ACQ       (BPF_ATOMIC_LOAD | BPF_ACQUIRE)
>     BPF_STORE_REL      (BPF_ATOMIC_STORE | BPF_RELEASE)
>
> A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
> field set to BPF_LOAD_ACQ (0x11).
>
> Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
> the 'imm' field set to BPF_STORE_REL (0x22).
>
> Unlike existing atomic operations that only support BPF_W (32-bit) and
> BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
> support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load-acquire
> zero-extends the value before writing it to a 32-bit register, just like
> ARM64 instruction LDARH and friends.
>
> As an example, consider the following 64-bit load-acquire BPF
> instruction:
>
>     db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
>
>     opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
>     imm (0x00000011): BPF_LOAD_ACQ
>
> Similarly, a 16-bit BPF store-release:
>
>     cb 21 00 00 22 00 00 00  store_release((u16 *)(r1 + 0x0), w2)
>
>     opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
>     imm (0x00000022): BPF_STORE_REL
>
> In arch/{arm64,s390,x86}/net/bpf_jit_comp.c, have
> bpf_jit_supports_insn(..., /*in_arena=*/true) return false for the new
> instructions, until the corresponding JIT compiler supports them.
>
> [1] https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@xxxxxxxxxx/
>
> Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> Signed-off-by: Peilin Ye <yepeilin@xxxxxxxxxx>
> ---
>  arch/arm64/net/bpf_jit_comp.c  |  4 +++
>  arch/s390/net/bpf_jit_comp.c   | 14 +++++---
>  arch/x86/net/bpf_jit_comp.c    |  4 +++
>  include/linux/bpf.h            | 11 ++++++
>  include/linux/filter.h         |  2 ++
>  include/uapi/linux/bpf.h       | 13 +++++++
>  kernel/bpf/core.c              | 63 ++++++++++++++++++++++++++++----
>  kernel/bpf/disasm.c            | 12 +++++++
>  kernel/bpf/verifier.c          | 45 ++++++++++++++++++++++--
>  tools/include/uapi/linux/bpf.h | 13 +++++++
>  10 files changed, 168 insertions(+), 13 deletions(-)

Acked-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>

s390x has a strong memory model, and the regular load and store
instructions are atomic as long as operand addresses are aligned.
IIUC the verifier already enforces this unless BPF_F_ANY_ALIGNMENT is
set, in which case whoever loaded the program is responsible for the
consequences: memory accesses that happen to be unaligned would not
trigger an exception, but they would not be atomic either.

So I can implement the new instructions as normal loads/stores after
this series is merged.
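
To make that concrete, here is a rough, untested sketch of the shape I
have in mind for arch/s390/net/bpf_jit_comp.c. The emit_load() and
emit_store() helpers below are made up for illustration only; the real
patch would reuse the existing load/store emission paths (LLGC/LLGH/
LLGF/LG for loads, which also give the required zero-extension, and
STCY/STHY/STY/STG for stores):

	/* Sketch only: emit_load()/emit_store() are hypothetical
	 * stand-ins for the existing s390 load/store emission code.
	 * Aligned loads/stores are already atomic on s390x and the
	 * memory model is strong, so no barriers are needed.
	 */
	case BPF_STX | BPF_ATOMIC | BPF_B:
	case BPF_STX | BPF_ATOMIC | BPF_H:
	case BPF_STX | BPF_ATOMIC | BPF_W:
	case BPF_STX | BPF_ATOMIC | BPF_DW:
		switch (insn->imm) {
		case BPF_LOAD_ACQ:
			/* dst_reg = *(size *)(src_reg + off),
			 * zero-extended for BPF_B/BPF_H
			 */
			emit_load(jit, insn);
			break;
		case BPF_STORE_REL:
			/* *(size *)(dst_reg + off) = src_reg */
			emit_store(jit, insn);
			break;
		default:
			/* fall through to the existing handling of
			 * BPF_ADD/BPF_XCHG/BPF_CMPXCHG/...
			 */
			break;
		}
		break;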
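
As an aside for anyone wiring these up in selftests: assuming the
BPF_LOAD_ACQ/BPF_STORE_REL values from this series are in scope, the
two example encodings quoted above can be built with the existing
BPF_RAW_INSN() helper from tools/include/linux/filter.h, e.g.:

	/* r0 = load_acquire((u64 *)(r1 + 0x0))
	 * bytes: db 10 00 00 11 00 00 00
	 */
	BPF_RAW_INSN(BPF_STX | BPF_ATOMIC | BPF_DW,	/* opcode 0xdb */
		     BPF_REG_0,		/* dst_reg: destination value */
		     BPF_REG_1,		/* src_reg: address base      */
		     0,			/* off                        */
		     BPF_LOAD_ACQ),	/* imm = 0x11                 */

	/* store_release((u16 *)(r1 + 0x0), w2)
	 * bytes: cb 21 00 00 22 00 00 00
	 */
	BPF_RAW_INSN(BPF_STX | BPF_ATOMIC | BPF_H,	/* opcode 0xcb */
		     BPF_REG_1,		/* dst_reg: address base      */
		     BPF_REG_2,		/* src_reg: value to store    */
		     0,			/* off                        */
		     BPF_STORE_REL),	/* imm = 0x22                 */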