On Mon, 2021-02-15 at 23:35 +0100, Daniel Borkmann wrote:
> On 2/15/21 11:24 PM, Ilya Leoshkevich wrote:
> > On Mon, 2021-02-15 at 23:20 +0100, Daniel Borkmann wrote:
> > > On 2/15/21 6:12 PM, Brendan Jackman wrote:
> > > > As pointed out by Ilya and explained in the new comment, there's
> > > > a discrepancy between x86 and BPF CMPXCHG semantics: BPF always
> > > > loads the value from memory into r0, while x86 only does so when
> > > > r0 and the value in memory are different.
> > > >
> > > > At first this might sound like pure semantics, but it makes a
> > > > real difference when the comparison is 32-bit, since the load
> > > > will zero-extend r0/rax.
> > > >
> > > > The fix is to explicitly zero-extend rax after doing such a
> > > > CMPXCHG.
> > > >
> > > > Note that this doesn't generate totally optimal code: at one of
> > > > emit_atomic's callsites (where BPF_{AND,OR,XOR} | BPF_FETCH are
> > > > implemented), the new mov is superfluous because there's already
> > > > a mov generated afterwards that will zero-extend r0. We could
> > > > avoid this unnecessary mov by just moving the new logic outside
> > > > of emit_atomic. But I think it's simpler to keep emit_atomic as
> > > > a unit of correctness (it generates the correct x86 code for a
> > > > certain set of BPF instructions, no further knowledge is needed
> > > > to use it correctly).
> > > >
> > > > Reported-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> > > > Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
> > > > Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
> > > > ---
> > > >  arch/x86/net/bpf_jit_comp.c                   | 10 +++++++
> > > >  .../selftests/bpf/verifier/atomic_cmpxchg.c  | 25 ++++++++++++++++++
> > > >  .../selftests/bpf/verifier/atomic_or.c       | 26 +++++++++++++++++++
> > > >  3 files changed, 61 insertions(+)
> > > >
> > > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > > index 79e7a0ec1da5..7919d5c54164 100644
> > > > --- a/arch/x86/net/bpf_jit_comp.c
> > > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > > @@ -834,6 +834,16 @@ static int emit_atomic(u8 **pprog, u8 atomic_op,
> > > >
> > > >  	emit_insn_suffix(&prog, dst_reg, src_reg, off);
> > > >
> > > > +	if (atomic_op == BPF_CMPXCHG && bpf_size == BPF_W) {
> > > > +		/*
> > > > +		 * BPF_CMPXCHG unconditionally loads into R0, which means it
> > > > +		 * zero-extends 32-bit values. However x86 CMPXCHG doesn't do a
> > > > +		 * load if the comparison is successful. Therefore zero-extend
> > > > +		 * explicitly.
> > > > +		 */
> > > > +		emit_mov_reg(&prog, false, BPF_REG_0, BPF_REG_0);
> > >
> > > How does the situation look on other archs when they need to
> > > implement this in future? Mainly asking whether it would be better
> > > to instead move this logic into the verifier, so it'll be
> > > consistent across all archs.
> >
> > I have exactly the same check in my s390 wip patch.
> > So having a common solution would be great.
>
> We do rewrites for various cases like div/mod handling; perhaps it
> would be best to emit an explicit BPF_MOV32_REG(insn->dst_reg,
> insn->dst_reg) there, see fixup_bpf_calls().

How about BPF_ZEXT_REG? Then arches that don't need this (I think
aarch64's instruction always zero-extends) can detect this using
insn_is_zext() and skip such insns.
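
In fixup_bpf_calls() that could look roughly like the existing div/mod
rewrites. Untested sketch, with the cnt/delta/prog patching boilerplate
borrowed from the surrounding code:

	if (insn->code == (BPF_STX | BPF_W | BPF_ATOMIC) &&
	    insn->imm == BPF_CMPXCHG) {
		/* BPF_CMPXCHG always loads into R0, so a 32-bit
		 * cmpxchg must leave R0 zero-extended. Make that
		 * explicit so every JIT gets it right.
		 */
		struct bpf_insn zext_patch[] = {
			*insn,
			BPF_ZEXT_REG(BPF_REG_0),
		};

		cnt = ARRAY_SIZE(zext_patch);
		new_prog = bpf_patch_insn_data(env, i + delta,
					       zext_patch, cnt);
		if (!new_prog)
			return -ENOMEM;

		delta    += cnt - 1;
		env->prog = prog = new_prog;
		insn      = new_prog->insnsi + i + delta;
		continue;
	}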
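
And on the JIT side, an arch whose 32-bit cmpxchg already zero-extends
r0 could then swallow the extra insn. Hypothetical shape, assuming a
per-insn handler that can tell its main loop how many extra insns it
consumed:

	case BPF_STX | BPF_ATOMIC | BPF_W:
		/* Emit the arch's 32-bit cmpxchg here. On this arch it
		 * already zero-extends r0, so the verifier-inserted
		 * BPF_ZEXT_REG that follows is redundant.
		 */
		if (i + 1 < fp->len && insn_is_zext(&insn[1]))
			return 1;	/* also consume the zext insn */
		return 0;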