Re: [PATCH v2 bpf-next 2/5] bpf: Introduce "volatile compare" macro

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 22, 2023 at 2:59 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Dec 20, 2023 at 8:28 PM Kumar Kartikeya Dwivedi
> <memxor@xxxxxxxxx> wrote:
> >
> > When I was writing the _eq/lt/... variants I noticed that unless I use
> > LHS to be a full 8-byte register the compiler can still play
> > shenanigans even with inline assembly, like choosing a different input
> > register for LHS than the one already allocated for a variable before
> > the assembly is emitted, doing left/right shifts to mask upper bits
> > before the inline assembly logic, and thus since the scalar id
> > association is broken on that, the cmp does not propagate to what are
> > logical copies.
>
> I saw that llvm can add a redundant r1 = w2 and use r1 as LHS in some cases,
> but I haven't seen extra shifts.

It turned out there are indeed a bunch of redundant shifts
when u32 or s32 is passed into "r" asm constraint.

Strangely the shifts are there when compiled with -mcpu=v3 or v4
and no shifts with -mcpu=v1 and v2.

Also weird that u8 and u16 are passed into "r" without redundant shifts.
Hence I found a "workaround": cast u32 into u16 while passing.
The truncation of u32 doesn't happen and shifts to zero upper 32-bit
are gone as well.

https://godbolt.org/z/Kqszr6q3v

Another issue is llvm removes asm() completely when output "+r"
constraint is used. It has to be asm volatile to convince llvm
to keep that asm block. That's bad1() case.
asm() stays in place when input "r" constraint is used.
That's bad2().
asm() removal happens with x86 backend too. So maybe it's a feature?

I was worried whether:
asm("mov %[reg],%[reg]"::[reg]"r"((short)var));
is buggy too. Meaning whether the compiler should truncate before
allocating an input register.
It turns out it's not.
At least there is no truncation on x86 and on arm64.

https://godbolt.org/z/ovMdPxnj5

So I'm planning to unconditionally use "(short)var" workaround in bpf_cmp.

Anyway, none of that is urgent.
Yonghong or Eduard,
Please take a look at redundant shifts issue with mcpu=v3 after the holidays.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux