On Wed, 2021-02-24 at 14:34 -0800, Martin KaFai Lau wrote:
> On Wed, Feb 24, 2021 at 03:16:18PM +0100, Ilya Leoshkevich wrote:
> > On Tue, 2021-02-23 at 15:08 +0000, Brendan Jackman wrote:
> > > As pointed out by Ilya and explained in the new comment, there's a
> > > discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
> > > the value from memory into r0, while x86 only does so when r0 and
> > > the value in memory are different. The same issue affects s390.
> > >
> > > At first this might sound like pure semantics, but it makes a real
> > > difference when the comparison is 32-bit, since the load will
> > > zero-extend r0/rax.
> > >
> > > The fix is to explicitly zero-extend rax after doing such a
> > > CMPXCHG. Since this problem affects multiple archs, this is done in
> > > the verifier by patching in a BPF_ZEXT_REG instruction after every
> > > 32-bit cmpxchg. Any archs that don't need such manual zero-extension
> > > can do a look-ahead with insn_is_zext to skip the unnecessary mov.
> > >
> > > There was actually already logic to patch in zero-extension insns
> > > after 32-bit cmpxchgs, in opt_subreg_zext_lo32_rnd_hi32. To avoid
> > > bloating the prog with unnecessary movs, we now explicitly check
> > > and skip that logic for this case.
> > >
> > > Reported-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> > > Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
> > > Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
> > > ---
> > >
> > > Differences v3->v4[1]:
> > >  - Moved the optimization against pointless zext into the correct
> > >    place: opt_subreg_zext_lo32_rnd_hi32 is called _after_
> > >    fixup_bpf_calls.
> > >
> > > Differences v2->v3[1]:
> > >  - Moved patching into fixup_bpf_calls (patch incoming to rename
> > >    this function)
> > >  - Added extra commentary on bpf_jit_needs_zext
> > >  - Added check to avoid adding a pointless zext(r0) if there's
> > >    already one there.
> > >
> > > Difference v1->v2[1]: Now solved centrally in the verifier instead
> > > of specifically for the x86 JIT. Thanks to Ilya and Daniel for the
> > > suggestions!
> > >
> > > [1] v3:
> > > https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@xxxxxxxxxxxxx/T/#t
> > >     v2:
> > > https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@xxxxxxxxxxxxx/T/#t
> > >     v1:
> > > https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@xxxxxxxxxxxxx/T/#t
> > >
> > >  kernel/bpf/core.c                             |  4 +++
> > >  kernel/bpf/verifier.c                         | 33 +++++++++++++++++--
> > >  .../selftests/bpf/verifier/atomic_cmpxchg.c   | 25 ++++++++++++++
> > >  .../selftests/bpf/verifier/atomic_or.c        | 26 +++++++++++++++
> > >  4 files changed, 86 insertions(+), 2 deletions(-)
> >
> > I think I managed to figure out what is wrong with
> > adjust_insn_aux_data(): insn_has_def32() does not know about
> > BPF_FETCH.
> > I'll post a fix shortly; in the meantime, based on my debugging
> > experience and on looking at the code for a while, I have a few
> > comments regarding the patch.
> Ah. good catch.
>
> If adjust_insn_aux_data()/insn_has_def32() is fixed to set zext_dst
> properly for BPF_FETCH, then that alone should be enough for s390?

Yes, my fix [1] + this patch (with conflicts resolved) seem to work
really nicely on s390 for me: no duplicate zexts and one less check
that the JIT needs to do.

[1] https://lore.kernel.org/bpf/20210224141837.104654-1-iii@xxxxxxxxxxxxx/
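
For anyone following along, the discrepancy described in the quoted commit message can be modelled in plain userspace C. This is only a rough sketch, not kernel or JIT code: the function names (x86_cmpxchg32, bpf_cmpxchg32, zext32) are made up for illustration, registers are modelled as uint64_t, and the flag result of the real instruction is modelled as a bool return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model (not kernel code): a 64-bit register and a
 * 32-bit memory word, with no atomicity modelled at all.
 */

/* x86-style 32-bit CMPXCHG: the accumulator is written only on mismatch. */
static inline bool x86_cmpxchg32(uint64_t *rax, uint32_t *mem, uint32_t src)
{
	if ((uint32_t)*rax == *mem) {
		*mem = src;	/* match: rax untouched, upper 32 bits stay stale */
		return true;
	}
	*rax = *mem;		/* mismatch: 32-bit load zero-extends rax */
	return false;
}

/* BPF-style 32-bit CMPXCHG: r0 always gets the zero-extended old value. */
static inline bool bpf_cmpxchg32(uint64_t *r0, uint32_t *mem, uint32_t src)
{
	uint32_t old = *mem;
	bool match = ((uint32_t)*r0 == old);

	if (match)
		*mem = src;
	*r0 = old;		/* unconditional load, upper 32 bits cleared */
	return match;
}

/* The explicit zero-extension that the patch adds after the cmpxchg. */
static inline uint64_t zext32(uint64_t reg)
{
	return (uint32_t)reg;
}

/* Shows the divergence on the success path and how zext32() removes it. */
static inline void demo(void)
{
	uint64_t rax = 0xdeadbeef00000001ULL, r0 = 0xdeadbeef00000001ULL;
	uint32_t m_x86 = 1, m_bpf = 1;

	x86_cmpxchg32(&rax, &m_x86, 7);
	bpf_cmpxchg32(&r0, &m_bpf, 7);

	assert(rax != r0);		/* stale upper bits on the x86 model */
	assert(zext32(rax) == r0);	/* explicit zext restores agreement */
}
```

Calling demo() from a main() should pass both assertions: the two models only disagree when the compare succeeds and the upper half of the register was non-zero beforehand, which is the case the explicit zext covers.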