On Tue, Aug 6, 2024 at 12:22 PM Peilin Ye <yepeilin@xxxxxxxxxx> wrote: > > Hi Alexei, > > Thanks for all the suggestions! Some questions: > > On Mon, Jul 29, 2024 at 06:28:16PM -0700, Alexei Starovoitov wrote: > > On Mon, Jul 29, 2024 at 11:33 AM Peilin Ye <yepeilin@xxxxxxxxxx> wrote: > > > We need more. During offline discussion with Paul, we agreed we can start > > > from: > > > > > > * load-acquire: __atomic_load_n(... memorder=__ATOMIC_ACQUIRE); > > > * store-release: __atomic_store_n(... memorder=__ATOMIC_RELEASE); > > > > we would need inline asm equivalent too. Similar to kernel > > smp_load_acquire() macro. > > I see, so something like: > > asm volatile("%0 = load_acquire((u64 *)(%1 + 0x0))" : > "=r"(ret) : "r"(&foo) : "memory"); > > and e.g. this in disassembly: > > r0 = load_acquire((u64 *)(r1 + 0x0)) yes. > I agree that we'd better not put the entire e.g. > "r0 = __atomic_load_n((u64 *)(r1 + 0x0), __ATOMIC_ACQUIRE)" into > disassembly. exactly. That's too verbose. > > > Theoretically, the BPF JIT compiler could also reorder instructions just like > > > Clang or GCC, though it might not currently do so. If we ever developed a more > > > optimizing BPF JIT compiler, it would also be nice to have an optimization > > > barrier for it. However, Alexei Starovoitov has expressed that defining a BPF > > > instruction with 'asm volatile ("" ::: "memory");' semantics might be tricky. > > > > It can be a standalone insn that is a compiler barrier only but that feels like > > a waste of an instruction. So depending how we end up encoding various > > real barriers > > there may be a bit to spend in such a barrier insn that is only a > > compiler barrier. > > In this case optimizing JIT barrier. > > [...] > > > > Roughly, the scope of this work includes: > > > > > > * decide how to extend the BPF ISA (add new instructions and/or extend > > > current ones) > > > > ldx/stx insns support MEM and MEMSX modifiers. > > Adding MEM_ACQ_REL feels like a natural fit. Better name? > > Do we allow aliases? E.g., can we have "MEMACQ" for LDX and "MEMREL" > for STX, but let them share the same numeric value? yes. See #define BPF_ATOMIC 0xc0 /* atomic memory ops - op type in immediate */ #define BPF_XADD 0xc0 /* exclusive add - legacy name */ but it has to be backward compatible. > Speaking of numeric value, out of curiosity: > > IMM 0 > ABS 1 > IND 2 > MEM 3 > MEMSX 4 > ATOMIC 6 > > Was there a reason that we skipped 5? Is 5 reserved? See /* unused opcode to mark special load instruction. Same as BPF_ABS */ #define BPF_PROBE_MEM 0x20 /* unused opcode to mark special ldsx instruction. Same as BPF_IND */ #define BPF_PROBE_MEMSX 0x40 /* unused opcode to mark special load instruction. Same as BPF_MSH */ #define BPF_PROBE_MEM32 0xa0 it's used by the verifier when it remaps opcode to tell JIT. It can be used, but then the internal opcode needs to change too. > > For barriers we would need a new insn. Not sure which class would fit the best. > > Maybe BPF_LD ? > > > > Another alternative for barriers is to use nocsr kfuncs. > > Then we have the freedom to make mistakes and fix them later. > > One kfunc per barrier would do. > > JITs would inline them into appropriate insns. > > In bpf progs they will be used just like in the kernel code smp_mb(), > > smp_rmb(), etc. > > > > I don't think compilers have to emit barriers from C code, so my > > preference is kfuncs atm. > > Ah, I see; we recently supported [1] nocsr BPF helper functions. The > cover letter says: > > """ > This patch-set seeks to allow using no_caller_saved_registers > gcc/clang attribute with some BPF helper functions (and kfuncs in the > future). > """ > > It seems that nocsr BPF kfuncs are not supported yet. Do we have a > schedule for it? Support for nocsr for kfuncs is being added. Assume it's already available :) It's not a blocker to add barrier kfuncs.