On Mon, Dec 07, 2020 at 07:58:09AM -0800, Yonghong Song wrote:
>
>
> On 12/7/20 3:28 AM, Brendan Jackman wrote:
> > On Fri, Dec 04, 2020 at 07:21:22AM -0800, Yonghong Song wrote:
> > >
> > >
> > > On 12/4/20 1:36 AM, Brendan Jackman wrote:
> > > > On Thu, Dec 03, 2020 at 10:42:19PM -0800, Yonghong Song wrote:
> > > > >
> > > > >
> > > > > On 12/3/20 8:02 AM, Brendan Jackman wrote:
> > > > > > This adds instructions for
> > > > > >
> > > > > > atomic[64]_[fetch_]and
> > > > > > atomic[64]_[fetch_]or
> > > > > > atomic[64]_[fetch_]xor
> > > > > >
> > > > > > All these operations are isomorphic enough to implement with the same
> > > > > > verifier, interpreter, and x86 JIT code, hence being a single commit.
> > > > > >
> > > > > > The main interesting thing here is that x86 doesn't directly support
> > > > > > the fetch_ version of these operations, so we need to generate a CMPXCHG
> > > > > > loop in the JIT. This requires the use of two temporary registers;
> > > > > > IIUC it's safe to use BPF_REG_AX and x86's AUX_REG for this purpose.
> > > > > >
> > > > > > Change-Id: I340b10cecebea8cb8a52e3606010cde547a10ed4
> > > > > > Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
> > > > > > ---
> > > > > >  arch/x86/net/bpf_jit_comp.c  | 50 +++++++++++++++++++++++++++++-
> > > > > >  include/linux/filter.h       | 60 ++++++++++++++++++++++++++++++++++++
> > > > > >  kernel/bpf/core.c            |  5 ++-
> > > > > >  kernel/bpf/disasm.c          | 21 ++++++++++---
> > > > > >  kernel/bpf/verifier.c        |  6 ++++
> > > > > >  tools/include/linux/filter.h | 60 ++++++++++++++++++++++++++++++++++++
> > > > > >  6 files changed, 196 insertions(+), 6 deletions(-)
> > > > >
> > > > > [...]
> > > > > > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > > > > > index 6186280715ed..698f82897b0d 100644
> > > > > > --- a/include/linux/filter.h
> > > > > > +++ b/include/linux/filter.h
> > > > > > @@ -280,6 +280,66 @@ static inline bool insn_is_zext(const struct bpf_insn *insn)
> > > > [...]
> > > > > > +#define BPF_ATOMIC_FETCH_XOR(SIZE, DST, SRC, OFF)            \
> > > > > > +    ((struct bpf_insn) {                                     \
> > > > > > +        .code  = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC,      \
> > > > > > +        .dst_reg = DST,                                      \
> > > > > > +        .src_reg = SRC,                                      \
> > > > > > +        .off   = OFF,                                        \
> > > > > > +        .imm   = BPF_XOR | BPF_FETCH })
> > > > > > +
> > > > > >  /* Atomic exchange, src_reg = atomic_xchg((dst_reg + off), src_reg) */
> > > > >
> > > > > Looks like BPF_ATOMIC_XOR/OR/AND/... are all similar to each other.
> > > > > The same is true for BPF_ATOMIC_FETCH_XOR/OR/AND/...
> > > > >
> > > > > I am wondering whether it makes sense to have
> > > > >    BPF_ATOMIC_BOP(BOP, SIZE, DST, SRC, OFF) and
> > > > >    BPF_ATOMIC_FETCH_BOP(BOP, SIZE, DST, SRC, OFF)
> > > > > so we have a smaller number of macros?
> > > >
> > > > Hmm yeah I think that's probably a good idea, it would be consistent
> > > > with the macros for non-atomic ALU ops.
> > > >
> > > > I don't think 'BOP' would be very clear though, 'ALU' might be more
> > > > obvious.
> > >
> > > BPF_ATOMIC_ALU and BPF_ATOMIC_FETCH_ALU are indeed better.
> >
> > On second thoughts I think it feels right (i.e. it would be roughly
> > consistent with the level of abstraction of the rest of this macro API)
> > to go further and just have two macros, BPF_ATOMIC64 and BPF_ATOMIC32:
> >
> > /*
> >  * Atomic ALU ops:
> >  *
> >  *   BPF_ADD              *(uint *) (dst_reg + off16) += src_reg
> >  *   BPF_AND              *(uint *) (dst_reg + off16) &= src_reg
> >  *   BPF_OR               *(uint *) (dst_reg + off16) |= src_reg
> >  *   BPF_XOR              *(uint *) (dst_reg + off16) ^= src_reg
>
> "uint *" => "size_type *"?
> and give an explanation that "size_type" is either "u32" or "u64"?

"uint *" is already used in the file so I'll follow the precedent there.

> >  *   BPF_ADD | BPF_FETCH  src_reg = atomic_fetch_add(dst_reg + off16, src_reg);
> >  *   BPF_AND | BPF_FETCH  src_reg = atomic_fetch_and(dst_reg + off16, src_reg);
> >  *   BPF_OR | BPF_FETCH   src_reg = atomic_fetch_or(dst_reg + off16, src_reg);
> >  *   BPF_XOR | BPF_FETCH  src_reg = atomic_fetch_xor(dst_reg + off16, src_reg);
> >  *   BPF_XCHG             src_reg = atomic_xchg(dst_reg + off16, src_reg)
> >  *   BPF_CMPXCHG          r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg)
> >  */
> >
> > #define BPF_ATOMIC64(OP, DST, SRC, OFF)                  \
> >     ((struct bpf_insn) {                                 \
> >         .code  = BPF_STX | BPF_DW | BPF_ATOMIC,          \
> >         .dst_reg = DST,                                  \
> >         .src_reg = SRC,                                  \
> >         .off   = OFF,                                    \
> >         .imm   = OP })
> >
> > #define BPF_ATOMIC32(OP, DST, SRC, OFF)                  \
> >     ((struct bpf_insn) {                                 \
> >         .code  = BPF_STX | BPF_W | BPF_ATOMIC,           \
> >         .dst_reg = DST,                                  \
> >         .src_reg = SRC,                                  \
> >         .off   = OFF,                                    \
> >         .imm   = OP })
>
> You could have
>    BPF_ATOMIC(OP, SIZE, DST, SRC, OFF)
> where SIZE is BPF_DW or BPF_W.

Ah sorry, I didn't see this mail and have just posted v4 with the 2
separate macros. Let's see if anyone else has an opinion on this point.
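For reference, I take it the combined version would look roughly like
this (an untested sketch mirroring the BPF_ATOMIC_FETCH_XOR definition
above, not what I posted in v4):

/* Atomic ALU op with optional fetch; SIZE is BPF_DW or BPF_W, and
 * OP is e.g. BPF_ADD, BPF_XOR | BPF_FETCH, BPF_XCHG, BPF_CMPXCHG.
 */
#define BPF_ATOMIC(OP, SIZE, DST, SRC, OFF)                  \
    ((struct bpf_insn) {                                     \
        .code  = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC,      \
        .dst_reg = DST,                                      \
        .src_reg = SRC,                                      \
        .off   = OFF,                                        \
        .imm   = OP })

It encodes exactly the same instruction either way; the question is
just how much gets spelled out at the call site.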
> > The downside compared to what's currently in the patchset is that the
> > user can write e.g. BPF_ATOMIC64(BPF_SUB, BPF_REG_1, BPF_REG_2, 0) and
> > it will compile. On the other hand they'll get a pretty clear
> > "BPF_ATOMIC uses invalid atomic opcode 10" when they try to load the
> > prog, and the valid atomic ops are clearly listed in Documentation as
> > well as the comments here.
>
> This should be fine. As you mentioned, the documentation lists
> what is supported and what is not...
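FWIW, call sites with the macros as posted in v4 would look something
like this (sketch only, register choices arbitrary; semantics per the
comment block above):

    /* *(u64 *)(r1 + 8) &= r3, no fetch */
    BPF_ATOMIC64(BPF_AND, BPF_REG_1, BPF_REG_3, 8),
    /* r2 = atomic_fetch_xor(r1 + 0, r2): old value lands in r2 */
    BPF_ATOMIC64(BPF_XOR | BPF_FETCH, BPF_REG_1, BPF_REG_2, 0),
    /* BPF_ATOMIC64(BPF_SUB, BPF_REG_1, BPF_REG_2, 0) would compile
     * but be rejected by the verifier at load time, as discussed.
     */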