Re: [RFC PATCH v3 0/3] Introduce BPF map tracing capability

On 11/3/21 10:49 AM, Alexei Starovoitov wrote:
> On Wed, Nov 3, 2021 at 10:45 AM Joe Burton <jevburton.kernel@xxxxxxxxx> wrote:
>>
>> Sort of - I hit issues when defining the function in the same
>> compilation unit as the call site. For example:
>>
>>    static noinline int bpf_array_map_trace_update(struct bpf_map *map,
>>                  void *key, void *value, u64 map_flags)
>
> Not quite :)
> You've had this issue because of 'static noinline'.
> Just 'noinline' would not have such issues even in the same file.

This does not seem to be true. With the latest trunk clang:

[$ ~/tmp2] cat t.c
int __attribute__((noinline)) foo() { return 1; }
int bar() { return foo() + foo(); }
[$ ~/tmp2] clang -O2 -c t.c
[$ ~/tmp2] llvm-objdump -d t.o

t.o:    file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
       0: b8 01 00 00 00                movl    $1, %eax
       5: c3                            retq
       6: 66 2e 0f 1f 84 00 00 00 00 00 nopw    %cs:(%rax,%rax)

0000000000000010 <bar>:
      10: b8 02 00 00 00                movl    $2, %eax
      15: c3                            retq
[$ ~/tmp2]

The compiler still applied the optimization: bar() returns the constant 2
without calling foo(), even though the original noinline function is still
present in the binary.

A single foo() call in bar() has the same effect.

asm("") indeed helped preserve the call.

[$ ~/tmp2] cat t.c
int __attribute__((noinline)) foo() { asm(""); return 1; }
int bar() { return foo() + foo(); }
[$ ~/tmp2] clang -O2 -c t.c
[$ ~/tmp2] llvm-objdump -d t.o

t.o:    file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
       0: b8 01 00 00 00                movl    $1, %eax
       5: c3                            retq
       6: 66 2e 0f 1f 84 00 00 00 00 00 nopw    %cs:(%rax,%rax)

0000000000000010 <bar>:
      10: 50                            pushq   %rax
      11: e8 00 00 00 00                callq   0x16 <bar+0x6>
      16: e8 00 00 00 00                callq   0x1b <bar+0xb>
      1b: b8 02 00 00 00                movl    $2, %eax
      20: 59                            popq    %rcx
      21: c3                            retq
[$ ~/tmp2]

Note that with asm(""), foo() is called twice, but the optimizer still
knows that foo() returns 1, so it does the addition at compile time,
assigns 2 to %eax, and returns.
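
For illustration, here is a minimal sketch of one way to keep the return
value from being folded as well. It assumes GCC/clang extended inline asm
is acceptable and is not something proposed in this thread. The value is
passed through an empty asm statement with a "+r" constraint, so the
optimizer has to treat it as unknown. The names foo and bar follow the
test file above.

```c
/* Sketch only: hide foo()'s return value from the optimizer.
 * Relies on GCC/clang extended inline asm (an assumption, not part of
 * the original test case). */
int __attribute__((noinline)) foo(void)
{
	int ret = 1;

	/* Empty asm that claims to read and modify ret, so the compiler
	 * can no longer assume that ret is still 1 afterwards. */
	__asm__ volatile("" : "+r"(ret));
	return ret;
}

int bar(void)
{
	/* The callers can no longer replace the sum with a constant. */
	return foo() + foo();
}
```

With this variant the disassembly of bar() should show the two calls and
an add of their results instead of a constant load. It is worth checking
with llvm-objdump for the compiler version in use.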

Without asm(""), a single foo() call in bar() is folded in the same way:

[$ ~/tmp2] cat t.c
int __attribute__((noinline)) foo() { return 1; }
int bar() { return foo(); }
[$ ~/tmp2] clang -O2 -c t.c
[$ ~/tmp2] llvm-objdump -d t.o

t.o:    file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
       0: b8 01 00 00 00                movl    $1, %eax
       5: c3                            retq
       6: 66 2e 0f 1f 84 00 00 00 00 00 nopw    %cs:(%rax,%rax)

0000000000000010 <bar>:
      10: b8 01 00 00 00                movl    $1, %eax
      15: c3                            retq
[$ ~/tmp2]

I checked with a few LLVM compiler engineers at Facebook.
They said that nothing prevents the compiler from looking inside
a noinline function and optimizing its callers based on what it
finds there.
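
As a hedged illustration of how a caller can be kept from relying on the
callee's body, one commonly used approach is to mark the function weak.
This is a sketch that assumes GCC/clang attributes on an ELF target and
is not taken from this thread. A weak definition may be replaced by
another definition at link time, so the compiler cannot assume that the
body it sees is the one that will run.

```c
/* Sketch only: a weak definition is interposable, so callers must not
 * depend on this particular body (assumes GCC/clang on an ELF target). */
int __attribute__((noinline, weak)) foo(void)
{
	return 1;
}

int bar(void)
{
	/* foo() may be overridden at link time, so the compiler is
	 * expected to emit real calls and add the results at run time. */
	return foo() + foo();
}
```

As with the test cases above, the generated code is best confirmed with
clang -O2 -c and llvm-objdump -d.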


Reminder: please don't top post and trim your replies.



