On Tue, Mar 02, 2021 at 10:54:00AM +0000, Brendan Jackman wrote:
> As pointed out by Ilya and explained in the new comment, there's a
> discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
> the value from memory into r0, while x86 only does so when r0 and the
> value in memory are different. The same issue affects s390.
>
> At first this might sound like pure semantics, but it makes a real
> difference when the comparison is 32-bit, since the load will
> zero-extend r0/rax.
>
> The fix is to explicitly zero-extend rax after doing such a
> CMPXCHG. Since this problem affects multiple archs, this is done in
> the verifier by patching in a BPF_ZEXT_REG instruction after every
> 32-bit cmpxchg. Any archs that don't need such manual zero-extension
> can do a look-ahead with insn_is_zext to skip the unnecessary mov.
>
> Reported-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
> Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
> ---
>
>
> Differences v4->v5[1]:
> - Moved the logic entirely into opt_subreg_zext_lo32_rnd_hi32, thanks to Martin
>   for suggesting this.
>
> Differences v3->v4[1]:
> - Moved the optimization against pointless zext into the correct place:
>   opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.
>
> Differences v2->v3[1]:
> - Moved patching into fixup_bpf_calls (patch incoming to rename this function)
> - Added extra commentary on bpf_jit_needs_zext
> - Added check to avoid adding a pointless zext(r0) if there's already one there.
>
> Difference v1->v2[1]: Now solved centrally in the verifier instead of
> specifically for the x86 JIT. Thanks to Ilya and Daniel for the suggestions!
>
> [1] v4: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@xxxxxxxxxxxxxx/T/#t
>     v3: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@xxxxxxxxxxxxx/T/#t
>     v2: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@xxxxxxxxxxxxx/T/#t
>     v1: https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@xxxxxxxxxxxxx/T/#t
>
>
>  kernel/bpf/core.c                              |  4 +++
>  kernel/bpf/verifier.c                          | 17 +++++++++++-
>  .../selftests/bpf/verifier/atomic_cmpxchg.c    | 25 ++++++++++++++++++
>  .../selftests/bpf/verifier/atomic_or.c         | 26 +++++++++++++++++++
>  4 files changed, 71 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 0ae015ad1e05..dcf18612841b 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2342,6 +2342,10 @@ bool __weak bpf_helper_changes_pkt_data(void *func)
>  /* Return TRUE if the JIT backend wants verifier to enable sub-register usage
>   * analysis code and wants explicit zero extension inserted by verifier.
>   * Otherwise, return FALSE.
> + *
> + * The verifier inserts an explicit zero extension after BPF_CMPXCHGs even if
> + * you don't override this. JITs that don't want these extra insns can detect
> + * them using insn_is_zext.
>   */
>  bool __weak bpf_jit_needs_zext(void)
>  {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 4c373589273b..37076e4c6175 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -11237,6 +11237,11 @@ static int opt_remove_nops(struct bpf_verifier_env *env)
>  	return 0;
>  }
>
> +static inline bool is_cmpxchg(struct bpf_insn *insn)
nit. "const" struct bpf_insn *insn.

> +{
> +	return (BPF_MODE(insn->code) == BPF_ATOMIC && insn->imm == BPF_CMPXCHG);
I think it is better to check BPF_CLASS(insn->code) == BPF_STX also in
case in the future this helper will be reused before do_check() has a
chance to verify the instructions.
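
i.e. something like this (untested, just to spell out both nits, the
const qualifier and the class check, together):

static inline bool is_cmpxchg(const struct bpf_insn *insn)
{
	return BPF_CLASS(insn->code) == BPF_STX &&
	       BPF_MODE(insn->code) == BPF_ATOMIC &&
	       insn->imm == BPF_CMPXCHG;
}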
> +}
> +
>  static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
>  					  const union bpf_attr *attr)
>  {
> @@ -11296,7 +11301,17 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
>  			goto apply_patch_buffer;
>  		}
>
> -		if (!bpf_jit_needs_zext())
> +		/* Add in an zero-extend instruction if a) the JIT has requested
> +		 * it or b) it's a CMPXCHG.
> +		 *
> +		 * The latter is because: BPF_CMPXCHG always loads a value into
> +		 * R0, therefore always zero-extends. However some archs'
> +		 * equivalent instruction only does this load when the
> +		 * comparison is successful. This detail of CMPXCHG is
> +		 * orthogonal to the general zero-extension behaviour of the
> +		 * CPU, so it's treated independently of bpf_jit_needs_zext.
> +		 */
> +		if (!bpf_jit_needs_zext() && !is_cmpxchg(&insn))
>  			continue;
>
>  		if (WARN_ON_ONCE(load_reg == -1)) {
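
Not a request to change anything in this patch, just noting for other
readers how the look-ahead mentioned in the commit message could look in
an arch JIT's instruction loop. insn_is_zext() is the existing helper
from include/linux/filter.h; emit_cmpxchg32() and the "return 1 means the
next insn was consumed too" convention are only for illustration and will
differ between JITs:

	case BPF_STX | BPF_ATOMIC | BPF_W:
		if (insn->imm == BPF_CMPXCHG) {
			/* Arch-specific 32-bit cmpxchg that already
			 * zero-extends the destination register.
			 */
			emit_cmpxchg32(insn, ctx);
			/* Skip the zext the verifier patched in after us. */
			if (insn_is_zext(&insn[1]))
				return 1;
			return 0;
		}
		/* ... other BPF_ATOMIC ops ... */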

> diff --git a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> index 2efd8bcf57a1..6e52dfc64415 100644
> --- a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> +++ b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> @@ -94,3 +94,28 @@
>  	.result = REJECT,
>  	.errstr = "invalid read from stack",
>  },
> +{
> +	"BPF_W cmpxchg should zero top 32 bits",
> +	.insns = {
> +		/* r0 = U64_MAX; */
> +		BPF_MOV64_IMM(BPF_REG_0, 0),
> +		BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 1),
> +		/* u64 val = r0; */
> +		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
> +		/* r0 = (u32)atomic_cmpxchg((u32 *)&val, r0, 1); */
> +		BPF_MOV32_IMM(BPF_REG_1, 1),
> +		BPF_ATOMIC_OP(BPF_W, BPF_CMPXCHG, BPF_REG_10, BPF_REG_1, -8),
> +		/* r1 = 0x00000000FFFFFFFFull; */
> +		BPF_MOV64_IMM(BPF_REG_1, 1),
> +		BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 32),
> +		BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 1),
> +		/* if (r0 != r1) exit(1); */
> +		BPF_JMP_REG(BPF_JEQ, BPF_REG_0, BPF_REG_1, 2),
> +		BPF_MOV32_IMM(BPF_REG_0, 1),
> +		BPF_EXIT_INSN(),
> +		/* exit(0); */
> +		BPF_MOV32_IMM(BPF_REG_0, 0),
> +		BPF_EXIT_INSN(),
> +	},
> +	.result = ACCEPT,
> +},
> diff --git a/tools/testing/selftests/bpf/verifier/atomic_or.c b/tools/testing/selftests/bpf/verifier/atomic_or.c
> index 70f982e1f9f0..0a08b99e6ddd 100644
> --- a/tools/testing/selftests/bpf/verifier/atomic_or.c
> +++ b/tools/testing/selftests/bpf/verifier/atomic_or.c
> @@ -75,3 +75,29 @@
>  	},
>  	.result = ACCEPT,
>  },
> +{
> +	"BPF_W atomic_fetch_or should zero top 32 bits",
> +	.insns = {
> +		/* r1 = U64_MAX; */
> +		BPF_MOV64_IMM(BPF_REG_1, 0),
> +		BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 1),
> +		/* u64 val = r0; */
s/r0/r1/

> +		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
> +		/* r1 = (u32)atomic_sub((u32 *)&val, 1); */
r1 = (u32)atomic_fetch_or((u32 *)&val, 2)

> +		BPF_MOV32_IMM(BPF_REG_1, 2),
> +		BPF_ATOMIC_OP(BPF_W, BPF_OR | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),
> +		/* r2 = 0x00000000FFFFFFFF; */
> +		BPF_MOV64_IMM(BPF_REG_2, 1),
> +		BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 32),
> +		BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 1),
> +		/* if (r2 != r1) exit(1); */
> +		BPF_JMP_REG(BPF_JEQ, BPF_REG_2, BPF_REG_1, 2),
> +		/* BPF_MOV32_IMM(BPF_REG_0, 1), */
> +		BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
> +		BPF_EXIT_INSN(),
> +		/* exit(0); */
> +		BPF_MOV32_IMM(BPF_REG_0, 0),
> +		BPF_EXIT_INSN(),
> +	},
> +	.result = ACCEPT,
> +},
>
> base-commit: f2cfe32e8a965a86e512dcb2e6251371d4a60c63
> --
> 2.30.1.766.gb4fecdf3b7-goog
>