On Mon, Feb 15, 2021 at 6:12 PM Brendan Jackman <jackmanb@xxxxxxxxxx> wrote: > > As pointed out by Ilya and explained in the new comment, there's a > discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads > the value from memory into r0, while x86 only does so when r0 and the > value in memory are different. > > At first this might sound like pure semantics, but it makes a real > difference when the comparison is 32-bit, since the load will > zero-extend r0/rax. > > The fix is to explicitly zero-extend rax after doing such a CMPXCHG. > > Note that this doesn't generate totally optimal code: at one of > emit_atomic's callsites (where BPF_{AND,OR,XOR} | BPF_FETCH are I think this should be okay and was also suggested by Alexei in: https://lore.kernel.org/bpf/CAADnVQ+gnQED7WYAw7Vmm5=omngCKYXnmgU_NqPUfESBerH8gQ@xxxxxxxxxxxxxx/ > implemented), the new mov is superfluous because there's already a > mov generated afterwards that will zero-extend r0. We could avoid > this unnecessary mov by just moving the new logic outside of > emit_atomic. But I think it's simpler to keep emit_atomic as a unit > of correctness (it generates the correct x86 code for a certain set > of BPF instructions, no further knowledge is needed to use it > correctly). > > Reported-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx> > Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg") > Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx> Thanks for fixing this! Acked-by: KP Singh <kpsingh@xxxxxxxxxx>