Re: What should BPF_CMPXCHG do when the values match?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2021-02-09 at 20:14 -0800, Alexei Starovoitov wrote:
> On Tue, Feb 9, 2021 at 4:43 PM Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> wrote:
> > 
> > Hi,
> > 
> > I'm implementing BPF_CMPXCHG for the s390x JIT and noticed that the
> > doc, the interpreter and the X64 JIT do not agree on what the
> > behavior
> > should be when the values match.
> > 
> > If the operand size is BPF_W, this matters, because, depending on
> > the
> > answer, the upper bits of R0 are either zeroed out out or are left
> > intact.
> > 
> > I made the experiment based on the following modification to the
> > "atomic compare-and-exchange smoketest - 32bit" test on top of
> > commit
> > ee5cc0363ea0:
> > 
> > --- a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> > +++ b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> > @@ -57,8 +57,8 @@
> >                 BPF_MOV32_IMM(BPF_REG_1, 4),
> >                 BPF_MOV32_IMM(BPF_REG_0, 3),
> >                 BPF_ATOMIC_OP(BPF_W, BPF_CMPXCHG, BPF_REG_10,
> > BPF_REG_1, -4),
> > -               /* if (old != 3) exit(4); */
> > -               BPF_JMP32_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
> > +               /* if ((u64)old != 3) exit(4); */
> > +               BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
> >                 BPF_MOV32_IMM(BPF_REG_0, 4),
> >                 BPF_EXIT_INSN(),
> >                 /* if (val != 4) exit(5); */
> > 
> > and got the following results:
> > 
> > 1) Documentation: Upper bits of R0 are zeroed out - but it looks as
> > if
> >    there is a typo and either a period or the word "otherwise" is
> >    missing?
> > 
> >    > If they match it is replaced with ``src_reg``, The value that
> > was
> >    > there before is loaded back to ``R0``.
> > 
> > 2) Interpreter + KVM: Upper bits of R0 are zeroed out (C semantics)
> > 
> > 3) X64 JIT + KVM: Upper bits of R0 are untouched (cmpxchg
> > semantics)
> > 
> >    => 0xffffffffc0146bc7: lock cmpxchg %edi,-0x4(%rbp)
> >       0xffffffffc0146bcc: cmp $0x3,%rax
> >    (gdb) p/x $rax
> >    0x6bd5720c00000003
> >    (gdb) x/d $rbp-4
> >    0xffffc90001263d5c: 3
> > 
> >       0xffffffffc0146bc7: lock cmpxchg %edi,-0x4(%rbp)
> >    => 0xffffffffc0146bcc: cmp $0x3,%rax
> >    (gdb) p/x $rax
> >    0x6bd5720c00000003
> > 
> > 4) X64 JIT + TCG: Upper bits of R0 are zeroed out (qemu bug?)
> > 
> >    => 0xffffffffc01441fc: lock cmpxchg %edi,-0x4(%rbp)
> >       0xffffffffc0144201: cmp $0x3,%rax
> >    (gdb) p/x $rax
> >    0x81776ea600000003
> >    (gdb) x/d $rbp-4
> >    0xffffc90001117d5c: 3
> > 
> >       0xffffffffc01441fc: lock cmpxchg %edi,-0x4(%rbp)
> >    => 0xffffffffc0144201: cmp $0x3,%rax
> >    (gdb) p/x $rax
> >    $3 = 0x3
> > 
> > So which option is the right one? In my opinion, it would be safer
> > to
> > follow what the interpreter does and zero out the upper bits.
> 
> Wow. What a find!
> I thought that all 32-bit x86 ops zero-extend the dest register.

I think that's true, it's just that when the values match, cmpxchg is
specified to do nothing.

> I agree that it's best to zero upper bits for cmpxchg as well.

I will send a doc patch to clarify the wording then.

> I wonder whether compilers know about this exceptional behavior.

I'm not too familiar with the BPF LLVM backend, but at least CMPXCHG32
is defined in a similar way to XFALU32, so it should be fine. Maybe
Yonghong can comment on this further.

> I believe the bpf backend considers full R0 to be used by bpf's
> cmpxchg.

It's a little bit inconsistent at the moment. I don't know why yet,
but on s390 the subreg optimization kicks in and I have to run with the
following patch in order to avoid stack pointer zero extension:

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10588,6 +10588,7 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct
bpf_verifier_env *env,
        for (i = 0; i < len; i++) {
                int adj_idx = i + delta;
                struct bpf_insn insn;
+               u8 load_reg;
 
                insn = insns[adj_idx];
                if (!aux[adj_idx].zext_dst) {
@@ -10630,9 +10631,29 @@ static int
opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
                if (!bpf_jit_needs_zext())
                        continue;
 
+               /* zext_dst means that we want to zero-extend whatever
register
+                * the insn defines, which is dst_reg most of the time,
with
+                * the notable exception of BPF_STX + BPF_ATOMIC +
BPF_FETCH.
+                */
+               if (BPF_CLASS(insn.code) == BPF_STX &&
+                   BPF_MODE(insn.code) == BPF_ATOMIC) {
+                       /* BPF_STX + BPF_ATOMIC insns without BPF_FETCH
do not
+                        * define any registers, therefore zext_dst
cannot be
+                        * set.
+                        */
+                       if (WARN_ON_ONCE(!(insn.imm & BPF_FETCH)))
+                               return -EINVAL;
+                       if (insn.imm == BPF_CMPXCHG)
+                               load_reg = BPF_REG_0;
+                       else
+                               load_reg = insn.src_reg;
+               } else {
+                       load_reg = insn.dst_reg;
+               }
+
                zext_patch[0] = insn;
-               zext_patch[1].dst_reg = insn.dst_reg;
-               zext_patch[1].src_reg = insn.dst_reg;
+               zext_patch[1].dst_reg = load_reg;
+               zext_patch[1].src_reg = load_reg;
                patch = zext_patch;
                patch_len = 2;
 apply_patch_buffer:

However, this doesn't seem to affect x86_64.

> Do you know what xchg does on x86? What about arm64 with cas?

xchg always zeroes out the upper half.
Unlike x86_64's cmpxchg, arm64's cas is specified to always zero out
the upper half, even if the values match. I don't have access to arm8.1
machine to test this, but at least QEMU does behave this way.
s390's cs does not zero out the upper half, we need to use llgfr in
addition (which doesn't sound like a big deal to me).




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux