Thanks once again for the reviews.

On Wed, 3 Mar 2021 at 08:29, Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 3/2/21 10:43 AM, Martin KaFai Lau wrote:
> > On Tue, Mar 02, 2021 at 10:54:00AM +0000, Brendan Jackman wrote:
> >> As pointed out by Ilya and explained in the new comment, there's a
> >> discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
> >> the value from memory into r0, while x86 only does so when r0 and the
> >> value in memory are different. The same issue affects s390.
> >>
> >> At first this might sound like pure semantics, but it makes a real
> >> difference when the comparison is 32-bit, since the load will
> >> zero-extend r0/rax.
> >>
> >> The fix is to explicitly zero-extend rax after doing such a
> >> CMPXCHG. Since this problem affects multiple archs, this is done in
> >> the verifier by patching in a BPF_ZEXT_REG instruction after every
> >> 32-bit cmpxchg. Any archs that don't need such manual zero-extension
> >> can do a look-ahead with insn_is_zext to skip the unnecessary mov.
> >>
> >> Reported-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> >> Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
> >> Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
> >> ---
> >>
> >>
> >> Differences v4->v5[1]:
> >> - Moved the logic entirely into opt_subreg_zext_lo32_rnd_hi32, thanks to Martin
> >>   for suggesting this.
> >>
> >> Differences v3->v4[1]:
> >> - Moved the optimization against pointless zext into the correct place:
> >>   opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.
> >>
> >> Differences v2->v3[1]:
> >> - Moved patching into fixup_bpf_calls (patch incoming to rename this function)
> >> - Added extra commentary on bpf_jit_needs_zext
> >> - Added check to avoid adding a pointless zext(r0) if there's already one there.
> >>
> >> Difference v1->v2[1]: Now solved centrally in the verifier instead of
> >> specifically for the x86 JIT. Thanks to Ilya and Daniel for the suggestions!
> >>
> >> [1] v4: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@xxxxxxxxxxxxxx/T/#t
> >>     v3: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@xxxxxxxxxxxxx/T/#t
> >>     v2: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@xxxxxxxxxxxxx/T/#t
> >>     v1: https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@xxxxxxxxxxxxx/T/#t
> >>
> >>
> >>  kernel/bpf/core.c                             |  4 +++
> >>  kernel/bpf/verifier.c                         | 17 +++++++++++-
> >>  .../selftests/bpf/verifier/atomic_cmpxchg.c   | 25 ++++++++++++++++++
> >>  .../selftests/bpf/verifier/atomic_or.c        | 26 +++++++++++++++++++
> >>  4 files changed, 71 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> >> index 0ae015ad1e05..dcf18612841b 100644
> >> --- a/kernel/bpf/core.c
> >> +++ b/kernel/bpf/core.c
> >> @@ -2342,6 +2342,10 @@ bool __weak bpf_helper_changes_pkt_data(void *func)
> >>  /* Return TRUE if the JIT backend wants verifier to enable sub-register usage
> >>   * analysis code and wants explicit zero extension inserted by verifier.
> >>   * Otherwise, return FALSE.
> >> + *
> >> + * The verifier inserts an explicit zero extension after BPF_CMPXCHGs even if
> >> + * you don't override this. JITs that don't want these extra insns can detect
> >> + * them using insn_is_zext.
> >> + */
> >>  bool __weak bpf_jit_needs_zext(void)
> >>  {
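
(For anyone following the thread without the x86 manual to hand: below is
my rough model of the two semantics, as plain C rather than anything
lifted from a JIT. Function and variable names are just for illustration.)

#include <stdint.h>

/* BPF_CMPXCHG | BPF_W as BPF defines it: the 32-bit old value is
 * written to r0 unconditionally, so r0's upper 32 bits always end up
 * zeroed.
 */
static uint64_t bpf_cmpxchg32(uint32_t *addr, uint64_t r0, uint32_t src)
{
	uint32_t old = *addr;

	if (old == (uint32_t)r0)
		*addr = src;
	return old;			/* unconditional load: zero-extends r0 */
}

/* x86 CMPXCHG with a 32-bit operand: rax is only written back on
 * failure. On success its upper 32 bits keep whatever stale value was
 * there before -- hence the discrepancy.
 */
static uint64_t x86_cmpxchg32(uint32_t *addr, uint64_t rax, uint32_t src)
{
	uint32_t old = *addr;

	if (old == (uint32_t)rax)
		*addr = src;		/* success: rax is not written */
	else
		rax = old;		/* failure: 32-bit write zero-extends rax */
	return rax;
}
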
> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index 4c373589273b..37076e4c6175 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -11237,6 +11237,11 @@ static int opt_remove_nops(struct bpf_verifier_env *env)
> >>  	return 0;
> >>  }
> >>
> >> +static inline bool is_cmpxchg(struct bpf_insn *insn)
> > nit. "const" struct bpf_insn *insn.
> >
> >> +{
> >> +	return (BPF_MODE(insn->code) == BPF_ATOMIC && insn->imm == BPF_CMPXCHG);
> > I think it is better to check BPF_CLASS(insn->code) == BPF_STX also,
> > in case in the future this helper is reused before do_check()
> > has a chance to verify the instructions.
>
> If that is the case, I would suggest moving is_cmpxchg() earlier
> in verifier.c so that later reuse does not require moving it again.
> Also, in the return statement, there is no need for the
> outermost ().

Yep, all good points, thanks. Spinning v6 now -- rough sketch of the
reworked helper below, after the quoted hunk.

> >
> >> +}
> >> +
> >>  static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
> >> 					  const union bpf_attr *attr)
> >>  {
> >> @@ -11296,7 +11301,17 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
> >>  			goto apply_patch_buffer;
> >>  		}
> >>
> >> -		if (!bpf_jit_needs_zext())
> >> +		/* Add in a zero-extend instruction if a) the JIT has requested
> >> +		 * it or b) it's a CMPXCHG.
> >> +		 *
> >> +		 * The latter is because: BPF_CMPXCHG always loads a value into
> >> +		 * R0, therefore always zero-extends. However some archs'
> >> +		 * equivalent instruction only does this load when the
> >> +		 * comparison is successful. This detail of CMPXCHG is
> >> +		 * orthogonal to the general zero-extension behaviour of the
> >> +		 * CPU, so it's treated independently of bpf_jit_needs_zext.
> >> +		 */
> >> +		if (!bpf_jit_needs_zext() && !is_cmpxchg(&insn))
> >>  			continue;
> >>
> >>  		if (WARN_ON_ONCE(load_reg == -1)) {
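
To make the v6 plan concrete, here is roughly what I have in mind for the
reworked helper, folding in all three suggestions (const parameter,
BPF_STX class check, no outermost parens, moved earlier in verifier.c).
Untested sketch written ahead of the actual respin, so details may shift:

/* Lives near the top of verifier.c, so that future callers which run
 * before do_check() can share it without it having to move.
 */
static inline bool is_cmpxchg(const struct bpf_insn *insn)
{
	return BPF_CLASS(insn->code) == BPF_STX &&
	       BPF_MODE(insn->code) == BPF_ATOMIC &&
	       insn->imm == BPF_CMPXCHG;
}
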
> >> diff --git a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> >> index 2efd8bcf57a1..6e52dfc64415 100644
> >> --- a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> >> +++ b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> >> @@ -94,3 +94,28 @@
> >>  	.result = REJECT,
> >>  	.errstr = "invalid read from stack",
> >>  },
> >> +{
> >> +	"BPF_W cmpxchg should zero top 32 bits",
> >> +	.insns = {
> >> +		/* r0 = U64_MAX; */
> >> +		BPF_MOV64_IMM(BPF_REG_0, 0),
> >> +		BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 1),
> >> +		/* u64 val = r0; */
> >> +		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
> >> +		/* r0 = (u32)atomic_cmpxchg((u32 *)&val, r0, 1); */
> >> +		BPF_MOV32_IMM(BPF_REG_1, 1),
> >> +		BPF_ATOMIC_OP(BPF_W, BPF_CMPXCHG, BPF_REG_10, BPF_REG_1, -8),
> >> +		/* r1 = 0x00000000FFFFFFFFull; */
> >> +		BPF_MOV64_IMM(BPF_REG_1, 1),
> >> +		BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 32),
> >> +		BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 1),
> >> +		/* if (r0 != r1) exit(1); */
> >> +		BPF_JMP_REG(BPF_JEQ, BPF_REG_0, BPF_REG_1, 2),
> >> +		BPF_MOV32_IMM(BPF_REG_0, 1),
> >> +		BPF_EXIT_INSN(),
> >> +		/* exit(0); */
> >> +		BPF_MOV32_IMM(BPF_REG_0, 0),
> >> +		BPF_EXIT_INSN(),
> >> +	},
> >> +	.result = ACCEPT,
> >> +},
> >> diff --git a/tools/testing/selftests/bpf/verifier/atomic_or.c b/tools/testing/selftests/bpf/verifier/atomic_or.c
> >> index 70f982e1f9f0..0a08b99e6ddd 100644
> >> --- a/tools/testing/selftests/bpf/verifier/atomic_or.c
> >> +++ b/tools/testing/selftests/bpf/verifier/atomic_or.c
> >> @@ -75,3 +75,29 @@
> >>  	},
> >>  	.result = ACCEPT,
> >>  },
> >> +{
> >> +	"BPF_W atomic_fetch_or should zero top 32 bits",
> >> +	.insns = {
> >> +		/* r1 = U64_MAX; */
> >> +		BPF_MOV64_IMM(BPF_REG_1, 0),
> >> +		BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 1),
> >> +		/* u64 val = r0; */
> > s/r0/r1/
> >
> >> +		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
> >> +		/* r1 = (u32)atomic_sub((u32 *)&val, 1); */
> > r1 = (u32)atomic_fetch_or((u32 *)&val, 2)
> >
> >> +		BPF_MOV32_IMM(BPF_REG_1, 2),
> >> +		BPF_ATOMIC_OP(BPF_W, BPF_OR | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),
> >> +		/* r2 = 0x00000000FFFFFFFF; */
> >> +		BPF_MOV64_IMM(BPF_REG_2, 1),
> >> +		BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
> >> +		BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 1),
> >> +		/* if (r2 != r1) exit(1); */
> >> +		BPF_JMP_REG(BPF_JEQ, BPF_REG_2, BPF_REG_1, 2),
> >> +		/* BPF_MOV32_IMM(BPF_REG_0, 1), */
> >> +		BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
> >> +		BPF_EXIT_INSN(),
> >> +		/* exit(0); */
> >> +		BPF_MOV32_IMM(BPF_REG_0, 0),
> >> +		BPF_EXIT_INSN(),
> >> +	},
> >> +	.result = ACCEPT,
> >> +},
> >>
> >> base-commit: f2cfe32e8a965a86e512dcb2e6251371d4a60c63
> >> --
> >> 2.30.1.766.gb4fecdf3b7-goog
> >>
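
One closing note, mostly for other JIT maintainers reading along: the net
effect of the verifier patching on the 32-bit cmpxchg in the selftest
above should be roughly the following pair of instructions (my sketch of
the intended rewrite, not an actual verifier dump):

	/* the program's original 32-bit cmpxchg... */
	BPF_ATOMIC_OP(BPF_W, BPF_CMPXCHG, BPF_REG_10, BPF_REG_1, -8),
	/* ...plus the zext patched in after it: BPF_ZEXT_REG() builds a
	 * special mov32 r0,r0 which JITs emit as a zero-extending 32-bit
	 * mov, or skip via insn_is_zext() if their cmpxchg already
	 * zero-extends.
	 */
	BPF_ZEXT_REG(BPF_REG_0),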