Re: [PATCH bpf-next] bpf: x86: Explicitly zero-extend rax after 32-bit cmpxchg

Ilya Leoshkevich <iii@xxxxxxxxxxxxx> · Mon, 15 Feb 2021 23:24:33 +0100

On Mon, 2021-02-15 at 23:20 +0100, Daniel Borkmann wrote:
> On 2/15/21 6:12 PM, Brendan Jackman wrote:
> > As pointed out by Ilya and explained in the new comment, there's a
> > discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
> > the value from memory into r0, while x86 only does so when r0 and the
> > value in memory are different.
> > 
> > At first this might sound like pure semantics, but it makes a real
> > difference when the comparison is 32-bit, since the load will
> > zero-extend r0/rax.
> > 
> > The fix is to explicitly zero-extend rax after doing such a
> > CMPXCHG.
> > 
> > Note that this doesn't generate totally optimal code: at one of
> > emit_atomic's callsites (where BPF_{AND,OR,XOR} | BPF_FETCH are
> > implemented), the new mov is superfluous because there's already a
> > mov generated afterwards that will zero-extend r0. We could avoid
> > this unnecessary mov by just moving the new logic outside of
> > emit_atomic. But I think it's simpler to keep emit_atomic as a unit
> > of correctness (it generates the correct x86 code for a certain set
> > of BPF instructions, no further knowledge is needed to use it
> > correctly).
> > 
> > Reported-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> > Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
> > Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
> > ---
> >   arch/x86/net/bpf_jit_comp.c                   | 10 +++++++
> >   .../selftests/bpf/verifier/atomic_cmpxchg.c   | 25 ++++++++++++++++++
> >   .../selftests/bpf/verifier/atomic_or.c        | 26 +++++++++++++++++++
> >   3 files changed, 61 insertions(+)
> > 
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 79e7a0ec1da5..7919d5c54164 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -834,6 +834,16 @@ static int emit_atomic(u8 **pprog, u8 atomic_op,
> >   
> >         emit_insn_suffix(&prog, dst_reg, src_reg, off);
> >   
> > +       if (atomic_op == BPF_CMPXCHG && bpf_size == BPF_W) {
> > +               /*
> > +                * BPF_CMPXCHG unconditionally loads into R0, which means it
> > +                * zero-extends 32-bit values. However x86 CMPXCHG doesn't do a
> > +                * load if the comparison is successful. Therefore zero-extend
> > +                * explicitly.
> > +                */
> > +               emit_mov_reg(&prog, false, BPF_REG_0, BPF_REG_0);
> 
> How does the situation look on other archs when they need to
> implement this in future?
> Mainly asking whether it would be better to move this logic into the
> verifier instead, so it'll be consistent across all archs.

I have exactly the same check in my s390 WIP patch.
So having a common solution would be great.
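
To make the mismatch concrete, here is a minimal user-space C sketch of the
two semantics described in the quoted commit message. It only models the
registers as plain integers. The helper names (x86_cmpxchg32, bpf_cmpxchg32,
x86_cmpxchg32_fixed) are made up for illustration and are not kernel or JIT
functions, and the model ignores atomicity entirely.

```c
#include <stdint.h>

/*
 * Hypothetical model of a 32-bit x86 CMPXCHG as seen through RAX.
 * On success RAX is not written, so its upper 32 bits survive.
 * On failure the 32-bit load into EAX zero-extends RAX.
 */
static uint64_t x86_cmpxchg32(uint64_t rax, uint32_t *mem, uint32_t src)
{
	if ((uint32_t)rax == *mem) {
		*mem = src;
		return rax;		/* upper half left untouched */
	}
	return (uint64_t)*mem;		/* upper half cleared by the load */
}

/*
 * Hypothetical model of BPF_CMPXCHG with BPF_W: R0 always receives the
 * old memory value, zero-extended to 64 bits.
 */
static uint64_t bpf_cmpxchg32(uint64_t r0, uint32_t *mem, uint32_t src)
{
	uint32_t old = *mem;

	if ((uint32_t)r0 == old)
		*mem = src;
	return (uint64_t)old;
}

/*
 * The x86 model followed by an explicit zero-extension, which is what
 * the extra 32-bit mov of R0 onto itself in the patch amounts to.
 */
static uint64_t x86_cmpxchg32_fixed(uint64_t rax, uint32_t *mem, uint32_t src)
{
	return (uint32_t)x86_cmpxchg32(rax, mem, src);
}
```

With r0 = 0x100000001 and the memory word holding 1, the comparison succeeds
in both models, but only the unfixed x86 model leaves the stale upper half in
the result. Any arch whose native instruction skips the write-back on success
would need the same kind of post-processing, whether in the JIT or in common
code.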

[...]