Re: [PATCH bpf-next v3 00/14] Atomics for eBPF

Yonghong Song <yhs@xxxxxx> · Thu, 3 Dec 2020 20:46:19 -0800

On 12/3/20 8:02 AM, Brendan Jackman wrote:
Status of the patches
=====================

Thanks for the reviews! Differences from v2->v3 [1]:

* More minor fixes and naming/comment changes

* Dropped atomic subtract: compilers can implement this by preceding
   an atomic add with a NEG instruction (which is what the x86 JIT did
   under the hood anyway).

* Dropped the use of -mcpu=v4 in the Clang BPF command-line; there is
   no longer an architecture version bump. Instead a feature test is
   added to Kbuild - it builds a source file to check if Clang
   supports BPF atomics.

* Fixed the prog_test so it no longer breaks
   test_progs-no_alu32. This requires some ifdef acrobatics to avoid
   complicating the prog_tests model where the same userspace code
   exercises both the normal and no_alu32 BPF test objects, using the
   same skeleton header.
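
To illustrate the dropped atomic subtract mentioned above, here is a
minimal user-space sketch of how a subtract can be built from a negate
followed by an atomic add. The helper name fetch_sub_via_add is
hypothetical and not part of the patchset, and the GCC/Clang __atomic
builtin stands in for the BPF instruction:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patchset: emulates an atomic
 * fetch-and-subtract the way a compiler could, by negating the operand
 * (the NEG step) and then issuing a single atomic add.
 */
static inline uint64_t fetch_sub_via_add(uint64_t *ptr, uint64_t val)
{
	uint64_t neg = (uint64_t)0 - val;	/* NEG: two's-complement negate */

	/* Atomic add of -val; returns the value held before the add. */
	return __atomic_fetch_add(ptr, neg, __ATOMIC_SEQ_CST);
}
```

Only the add has to be atomic; the negate operates on a private
register value, so no separate subtract instruction is needed.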

Differences from v1->v2 [1]:

* Fixed mistakes in the netronome driver

* Added sub, add, or, xor operations

* The above led to some refactors to keep things readable. (Maybe I
   should have just waited until I'd implemented these before starting
   the review...)

* Replaced BPF_[CMP]SET | BPF_FETCH with just BPF_[CMP]XCHG, which
   include the BPF_FETCH flag

* Added a bit of documentation. Suggestions welcome for more places
   to dump this info...

The prog_test that's added depends on Clang/LLVM features added by
Yonghong in https://reviews.llvm.org/D72184

Just to let you know that the above patch has been merged into
llvm-project trunk, so you no longer need to apply it manually.

This only includes a JIT implementation for x86_64 - I don't plan to
implement JIT support myself for other architectures.

Operations
==========

This patchset adds atomic operations to the eBPF instruction set. The
use-case that motivated this work was a trivial and efficient way to
generate globally-unique cookies in BPF progs, but I think it's
obvious that these features are pretty widely applicable.  The
instructions that are added here can be summarised with this list of
kernel operations:

* atomic[64]_[fetch_]add
* atomic[64]_[fetch_]and
* atomic[64]_[fetch_]or
* atomic[64]_xchg
* atomic[64]_cmpxchg
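
As a rough illustration of the cookie use-case, here is a minimal
sketch built on the fetch-add operation from the list above. The names
next_cookie and gen_cookie are hypothetical. It is written as plain C
with the __sync_fetch_and_add builtin; the assumption is that Clang
lowers the same builtin to the new fetching add when targeting BPF:

```c
#include <stdint.h>

/*
 * Hypothetical shared counter. In a BPF program this could be a global
 * variable or a map value.
 */
static uint64_t next_cookie;

/*
 * Returns the counter value from before the increment. The read and the
 * increment happen as one atomic step, so concurrent callers never
 * receive the same cookie.
 */
static inline uint64_t gen_cookie(void)
{
	return __sync_fetch_and_add(&next_cookie, 1);
}
```

Without the fetch variant the program could increment the counter but
could not learn which value it had claimed.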

The following are left out of scope for this effort:

* 16 and 8 bit operations
* Explicit memory barriers

Encoding
========

I originally planned to add new values for bpf_insn.opcode. This was
rather unpleasant: the opcode space has holes in it but no entire
instruction classes[2]. Yonghong Song had a better idea: use the
immediate field of the existing STX XADD instruction to encode the
operation. This works nicely, without breaking existing programs,
because the immediate field is currently reserved-must-be-zero, and
extra-nicely because BPF_ADD happens to be zero.

Note that this of course makes immediate-source atomic operations
impossible. It's hard to imagine a measurable speedup from such
instructions, and if it existed it would certainly not benefit x86,
which has no support for them.

The BPF_OP opcode fields are re-used in the immediate, and an
additional flag BPF_FETCH is used to mark instructions that should
fetch a pre-modification value from memory.

So, BPF_XADD is now called BPF_ATOMIC (the old name is kept to avoid
breaking userspace builds), and where we previously had .imm = 0, we
now have .imm = BPF_ADD (which is 0).
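
A minimal sketch of the encoding described above. The constants are
local copies with the values found in the upstream uapi header, and it
is an assumption that they match this revision of the series. The
struct bpf_insn_sketch type and the atomic_insn helper are hypothetical
names used only for illustration:

```c
#include <stdint.h>

/* Local copies; assumed to match the values used by this series. */
#define BPF_STX		0x03	/* instruction class */
#define BPF_W		0x00	/* 32-bit operand size */
#define BPF_DW		0x18	/* 64-bit operand size */
#define BPF_ATOMIC	0xc0	/* mode, previously named BPF_XADD */

/* Operation selectors carried in the immediate field. */
#define BPF_ADD		0x00
#define BPF_OR		0x40
#define BPF_AND		0x50
#define BPF_FETCH	0x01	/* also return the pre-modification value */
#define BPF_XCHG	(0xe0 | BPF_FETCH)
#define BPF_CMPXCHG	(0xf0 | BPF_FETCH)

/* Hypothetical stand-in mirroring the layout of struct bpf_insn. */
struct bpf_insn_sketch {
	uint8_t code;		/* class | size | mode */
	uint8_t dst_reg:4;	/* register holding the base address */
	uint8_t src_reg:4;	/* register holding the operand */
	int16_t off;		/* offset added to dst_reg */
	int32_t imm;		/* atomic operation, optionally | BPF_FETCH */
};

/* Build an atomic instruction; size is BPF_W or BPF_DW. */
static inline struct bpf_insn_sketch
atomic_insn(uint8_t size, int32_t op, uint8_t dst, uint8_t src, int16_t off)
{
	struct bpf_insn_sketch insn = {
		.code = BPF_STX | size | BPF_ATOMIC,
		.dst_reg = dst,
		.src_reg = src,
		.off = off,
		.imm = op,
	};

	return insn;
}
```

With these values, atomic_insn(BPF_DW, BPF_ADD, ...) produces the same
bytes as the old 64-bit XADD (opcode 0xdb, imm 0), which is why
existing programs keep working.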

Operands
========

Reg-source eBPF instructions only have two operands, while these
atomic operations have up to four. To avoid having to encode
additional operands:

- One of the input registers is re-used as an output register
   (e.g. atomic_fetch_add both reads from and writes to the source
   register).

- Where necessary (i.e. for cmpxchg), R0 is "hard-coded" as one of
   the operands.

This approach also allows the new eBPF instructions to map directly
to single x86 instructions.
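
The operand conventions above can be sketched as a small user-space
model. This is an illustration only, not the interpreter or JIT code:
struct regs, model_fetch_add and model_cmpxchg are hypothetical names,
and the GCC/Clang __atomic builtins stand in for the real instructions.
It assumes cmpxchg takes the expected value in R0 and leaves the old
memory value in R0:

```c
#include <stdint.h>

/* Hypothetical model of the BPF register file, for illustration only. */
struct regs {
	uint64_t r[11];	/* R0..R10 */
};

/*
 * BPF_ADD | BPF_FETCH: the source register supplies the addend and then
 * receives the value that was in memory before the add.
 */
static inline void model_fetch_add(struct regs *regs, int src, uint64_t *mem)
{
	regs->r[src] = __atomic_fetch_add(mem, regs->r[src], __ATOMIC_SEQ_CST);
}

/*
 * BPF_CMPXCHG: R0 holds the expected value and the source register holds
 * the new value. Afterwards R0 holds the value that was in memory,
 * whether or not the exchange took place.
 */
static inline void model_cmpxchg(struct regs *regs, int src, uint64_t *mem)
{
	uint64_t expected = regs->r[0];

	/* On failure the builtin stores the current memory value in expected. */
	__atomic_compare_exchange_n(mem, &expected, regs->r[src], 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	regs->r[0] = expected;
}
```

The cmpxchg convention mirrors x86, where LOCK CMPXCHG implicitly uses
the accumulator register for the expected and returned value.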

[1] Previous patchset:
     https://lore.kernel.org/bpf/20201123173202.1335708-1-jackmanb@xxxxxxxxxx/

[2] Visualisation of eBPF opcode space:
     https://gist.github.com/bjackman/00fdad2d5dfff601c1918bc29b16e778

[...]