On Tue, Feb 9, 2021 at 4:43 PM Ilya Leoshkevich <iii@xxxxxxxxxxxxx> wrote: > > Hi, > > I'm implementing BPF_CMPXCHG for the s390x JIT and noticed that the > doc, the interpreter and the X64 JIT do not agree on what the behavior > should be when the values match. > > If the operand size is BPF_W, this matters, because, depending on the > answer, the upper bits of R0 are either zeroed out out or are left > intact. > > I made the experiment based on the following modification to the > "atomic compare-and-exchange smoketest - 32bit" test on top of commit > ee5cc0363ea0: > > --- a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c > +++ b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c > @@ -57,8 +57,8 @@ > BPF_MOV32_IMM(BPF_REG_1, 4), > BPF_MOV32_IMM(BPF_REG_0, 3), > BPF_ATOMIC_OP(BPF_W, BPF_CMPXCHG, BPF_REG_10, > BPF_REG_1, -4), > - /* if (old != 3) exit(4); */ > - BPF_JMP32_IMM(BPF_JEQ, BPF_REG_0, 3, 2), > + /* if ((u64)old != 3) exit(4); */ > + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 3, 2), > BPF_MOV32_IMM(BPF_REG_0, 4), > BPF_EXIT_INSN(), > /* if (val != 4) exit(5); */ > > and got the following results: > > 1) Documentation: Upper bits of R0 are zeroed out - but it looks as if > there is a typo and either a period or the word "otherwise" is > missing? > > > If they match it is replaced with ``src_reg``, The value that was > > there before is loaded back to ``R0``. > > 2) Interpreter + KVM: Upper bits of R0 are zeroed out (C semantics) > > 3) X64 JIT + KVM: Upper bits of R0 are untouched (cmpxchg semantics) > > => 0xffffffffc0146bc7: lock cmpxchg %edi,-0x4(%rbp) > 0xffffffffc0146bcc: cmp $0x3,%rax > (gdb) p/x $rax > 0x6bd5720c00000003 > (gdb) x/d $rbp-4 > 0xffffc90001263d5c: 3 > > 0xffffffffc0146bc7: lock cmpxchg %edi,-0x4(%rbp) > => 0xffffffffc0146bcc: cmp $0x3,%rax > (gdb) p/x $rax > 0x6bd5720c00000003 > > 4) X64 JIT + TCG: Upper bits of R0 are zeroed out (qemu bug?) > > => 0xffffffffc01441fc: lock cmpxchg %edi,-0x4(%rbp) > 0xffffffffc0144201: cmp $0x3,%rax > (gdb) p/x $rax > 0x81776ea600000003 > (gdb) x/d $rbp-4 > 0xffffc90001117d5c: 3 > > 0xffffffffc01441fc: lock cmpxchg %edi,-0x4(%rbp) > => 0xffffffffc0144201: cmp $0x3,%rax > (gdb) p/x $rax > $3 = 0x3 > > So which option is the right one? In my opinion, it would be safer to > follow what the interpreter does and zero out the upper bits. Wow. What a find! I thought that all 32-bit x86 ops zero-extend the dest register. I agree that it's best to zero upper bits for cmpxchg as well. I wonder whether compilers know about this exceptional behavior. I believe the bpf backend considers full R0 to be used by bpf's cmpxchg. Do you know what xchg does on x86? What about arm64 with cas?