On Fri, Dec 22, 2023 at 2:59 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Wed, Dec 20, 2023 at 8:28 PM Kumar Kartikeya Dwivedi > <memxor@xxxxxxxxx> wrote: > > > > When I was writing the _eq/lt/... variants I noticed that unless I use > > LHS to be a full 8-byte register the compiler can still play > > shenanigans even with inline assembly, like choosing a different input > > register for LHS than the one already allocated for a variable before > > the assembly is emitted, doing left/right shifts to mask upper bits > > before the inline assembly logic, and thus since the scalar id > > association is broken on that, the cmp does not propagate to what are > > logical copies. > > I saw that llvm can add a redundant r1 = w2 and use r1 as LHS in some cases, > but I haven't seen extra shifts. It turned out there are indeed a bunch of redundant shifts when u32 or s32 is passed into "r" asm constraint. Strangely the shifts are there when compiled with -mcpu=v3 or v4 and no shifts with -mcpu=v1 and v2. Also weird that u8 and u16 are passed into "r" without redundant shifts. Hence I found a "workaround": cast u32 into u16 while passing. The truncation of u32 doesn't happen and shifts to zero upper 32-bit are gone as well. https://godbolt.org/z/Kqszr6q3v Another issue is llvm removes asm() completely when output "+r" constraint is used. It has to be asm volatile to convince llvm to keep that asm block. That's bad1() case. asm() stays in place when input "r" constraint is used. That's bad2(). asm() removal happens with x86 backend too. So maybe it's a feature? I was worried whether: asm("mov %[reg],%[reg]"::[reg]"r"((short)var)); is buggy too. Meaning whether the compiler should truncate before allocating an input register. It turns out it's not. At least there is no truncation on x86 and on arm64. https://godbolt.org/z/ovMdPxnj5 So I'm planning to unconditionally use "(short)var" workaround in bpf_cmp. Anyway, none of that is urgent. Yonghong or Eduard, Please take a look at redundant shifts issue with mcpu=v3 after the holidays.