> On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote: >> > On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: >> > > >> > > BPF has two documented (non-atomic) memory store instructions: >> > > >> > > BPF_STX: *(size *) (dst_reg + off) = src_reg >> > > BPF_ST : *(size *) (dst_reg + off) = imm32 >> > > >> > > Currently LLVM BPF back-end does not emit BPF_ST instruction and does >> > > not allow one to be specified as inline assembly. >> > > >> > > Recently I've been exploring ways to port some of the verifier test >> > > cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly >> > > and machinery provided in tools/testing/selftests/bpf/test_loader.c >> > > (which should hopefully simplify tests maintenance). >> > > The BPF_ST instruction is popular in these tests: used in 52 of 94 files. >> > > >> > > While it is possible to adjust LLVM to only support BPF_ST for inline >> > > assembly blocks it seems a bit wasteful. This patch-set contains a set >> > > of changes to verifier necessary in case when LLVM is allowed to >> > > freely emit BPF_ST instructions (source code is available here [1]). >> > >> > Would we gate LLVM's emitting of BPF_ST for C code behind some new >> > cpu=v4? What is the benefit for compiler to start automatically emit >> > such instructions? Such thinking about logistics, if there isn't much >> > benefit, as BPF application owner I wouldn't bother enabling this >> > behavior risking regressions on old kernels that don't have these >> > changes. >> >> Hmm, GCC happily generates BPF_ST instructions: >> >> $ echo 'int v; void foo () { v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s - >> $ cat foo.s >> .file "<stdin>" >> .text >> .align 3 >> .global foo >> .type foo, @function >> foo: >> lddw %r0,v >> stw [%r0+0],666 >> exit >> .size foo, .-foo >> .global v >> .type v, @object >> .lcomm v,4,4 >> .ident "GCC: (GNU) 12.0.0 20211206 (experimental)" >> >> Been doing that since October 2019, I think before the cpu versioning >> mechanism was got in place? >> >> We weren't aware this was problematic. Does the verifier reject such >> instructions? > > Interesting, do BPF selftests generated by GCC pass the same way they > do if generated by clang? We are still working on other issues in GCC; we are not yet in par with clang when it comes to build/run the BPF selftests. So I guess we didn't reach that particular breaking point yet ;) > I had to do the following changes to the verifier to make the > selftests pass when BPF_ST instruction is allowed for selection: So, if the current state of affairs is that the kernel rejects BPF_ST instructions, we shouldn't be generating them from GCC. I like Andrii's idea of introducing a -mcpu=v4 and condition generation of BPF_ST from the compiler to it. GCC is defaulting to -mcpu=v3 atm. Yonghong, WDYT? Would that be acceptable for clang as well? > > - patch #1 in this patchset: track values of constants written to > stack using BPF_ST. Currently these are tracked imprecisely, unlike > the writes using BPF_STX, e.g.: > > fp[-8] = 42; currently verifier assumes that fp[-8]=mmmmmmmm > after such instruction, where m stands for "misc", > just a note that something is written at fp[-8]. > > r1 = 42; verifier tracks r1=42 after this instruction. > fp[-8] = r1; verifier tracks fp[-8]=42 after this instruction. > > So the patch makes both cases equivalent. > > - patch #3 in this patchset: adjusts verifier.c:convert_ctx_access() > to operate on BPF_ST alongside BPF_STX. > > Context parameters for some BPF programs types are "fake" data > structures. The verifier matches all BPF_STX and BPF_LDX > instructions that operate on pointers to such contexts and rewrites > these instructions. It might change an offset or add another layer > of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access(). > (This also implies that verifier forbids writes to non-constant > offsets inside such structures). > > So the patch extends this logic to also handle BPF_ST. > >> >> > So I feel like the biggest benefit is to be able to use this >> > instruction in embedded assembly, to make writing and maintaining >> > tests easier. >> > >> > > The changes include: >> > > - update to verifier.c:check_stack_write_*() functions to track >> > > constant values spilled to stack via BPF_ST instruction in a same >> > > way stack spills of known registers by BPF_STX are tracked; >> > > - updates to verifier.c:convert_ctx_access() and it's callbacks to >> > > handle BPF_ST instruction in a way similar to BPF_STX; >> > > - some test adjustments and a few new tests. >> > > >> > > With the above changes applied and LLVM from [1] all test_verifier, >> > > test_maps, test_progs and test_progs-no_alu32 test cases are passing. >> > > >> > > When built using the LLVM version from [1] BPF programs generated for >> > > selftests and Cilium programs (taken from [2]) see certain reduction >> > > in size, e.g. below are total numbers of instructions for files with >> > > over 5K instructions: >> > > >> > > File Insns Insns Insns Diff >> > > w/o with diff pct >> > > BPF_ST BPF_ST >> > > cilium/bpf_host.o 44620 43774 -846 -1.90% >> > > cilium/bpf_lxc.o 36842 36060 -782 -2.12% >> > > cilium/bpf_overlay.o 23557 23186 -371 -1.57% >> > > cilium/bpf_xdp.o 26397 25931 -466 -1.77% >> > > selftests/core_kern.bpf.o 12359 12359 0 0.00% >> > > selftests/linked_list_fail.bpf.o 5501 5302 -199 -3.62% >> > > selftests/profiler1.bpf.o 17828 17709 -119 -0.67% >> > > selftests/pyperf100.bpf.o 49793 49268 -525 -1.05% >> > > selftests/pyperf180.bpf.o 88738 87813 -925 -1.04% >> > > selftests/pyperf50.bpf.o 25388 25113 -275 -1.08% >> > > selftests/pyperf600.bpf.o 78330 78300 -30 -0.04% >> > > selftests/pyperf_global.bpf.o 5244 5188 -56 -1.07% >> > > selftests/pyperf_subprogs.bpf.o 5262 5192 -70 -1.33% >> > > selftests/strobemeta.bpf.o 17154 16065 -1089 -6.35% >> > > selftests/test_verif_scale2.bpf.o 11337 11337 0 0.00% >> > > >> > > (Instructions are counted by counting the number of instruction lines >> > > in disassembly). >> > > >> > > Is community interested in this work? >> > > Are there any omissions in my changes to the verifier? >> > > >> > > Known issue: >> > > >> > > There are two programs (one Cilium, one selftest) that exhibit >> > > anomalous increase in verifier processing time with this patch-set: >> > > >> > > File Program Insns (A) Insns (B) Insns (DIFF) >> > > ------------------- ----------------------------- --------- --------- -------------- >> > > bpf_host.o tail_ipv6_host_policy_ingress 1576 2403 +827 (+52.47%) >> > > map_kptr.bpf.o test_map_kptr 400 475 +75 (+18.75%) >> > > ------------------- ----------------------------- --------- --------- -------------- >> > > >> > > These are under investigation. >> > > >> > > Thanks, >> > > Eduard >> > > >> > > [1] https://reviews.llvm.org/D140804 >> > > [2] git@xxxxxxxxxx:anakryiko/cilium.git >> > > >> > > Eduard Zingerman (5): >> > > bpf: more precise stack write reasoning for BPF_ST instruction >> > > selftests/bpf: check if verifier tracks constants spilled by >> > > BPF_ST_MEM >> > > bpf: allow ctx writes using BPF_ST_MEM instruction >> > > selftests/bpf: test if pointer type is tracked for BPF_ST_MEM >> > > selftests/bpf: don't match exact insn index in expected error message >> > > >> > > kernel/bpf/cgroup.c | 49 +++++--- >> > > kernel/bpf/verifier.c | 102 +++++++++------- >> > > net/core/filter.c | 72 ++++++------ >> > > .../selftests/bpf/prog_tests/log_fixup.c | 2 +- >> > > .../selftests/bpf/prog_tests/spin_lock.c | 8 +- >> > > .../bpf/verifier/bounds_mix_sign_unsign.c | 110 ++++++++++-------- >> > > .../selftests/bpf/verifier/bpf_st_mem.c | 29 +++++ >> > > tools/testing/selftests/bpf/verifier/ctx.c | 11 -- >> > > tools/testing/selftests/bpf/verifier/unpriv.c | 23 ++++ >> > > 9 files changed, 249 insertions(+), 157 deletions(-) >> > > create mode 100644 tools/testing/selftests/bpf/verifier/bpf_st_mem.c >> > > >> > > -- >> > > 2.39.0 >> > >