Re: [RFC PATCH bpf-next 2/2] selftests/bpf: add benchmark bpf_strcmp

Hi,

On 11/7/2021 2:43 AM, Alexei Starovoitov wrote:
> On Sat, Nov 06, 2021 at 09:28:22PM +0800, Hou Tao wrote:
>> The benchmark runs a loop 5000 times. In the loop it reads the file name
>> from kprobe argument into stack by using bpf_probe_read_kernel_str(),
>> and compares the file name with a target character or string.
>>
>> Three cases are compared: only compare one character, compare the whole
>> string by a home-made strncmp() and compare the whole string by
>> bpf_strcmp().
>>
>> The following is the result:
>>
>> x86-64 host:
>>
>> one character: 2613499 ns
>> whole str by strncmp: 2920348 ns
>> whole str by helper: 2779332 ns
>>
>> arm64 host:
>>
>> one character: 3898867 ns
>> whole str by strncmp: 4396787 ns
>> whole str by helper: 3968113 ns
>>
>> Compared with home-made strncmp, the performance of bpf_strncmp helper
>> improves 80% under x86-64 and 600% under arm64. The big performance win
>> on arm64 may comes from its arch-optimized strncmp().
> 80% and 600% improvement?!
> I don't understand how this math works.
> Why one char is barely different in total nsec than the whole string?
> The string shouldn't miscompare on the first char as far as I understand the test.
Because the "one character" result also includes the overhead of process
filtering and of reading the string.
My bad, I should have explained the test results in more detail.

Three tests are exercised:

(1) one character
Filter out unexpected callers by bpf_get_current_pid_tgid()
Use bpf_probe_read_kernel_str() to read the file name into a 64-byte buffer
on the stack
Compare only the first character of the file name

(2) whole str by strncmp
Filter out unexpected callers by bpf_get_current_pid_tgid()
Use bpf_probe_read_kernel_str() to read the file name into a 64-byte buffer
on the stack
Compare by using the home-made strncmp(): the two compared strings are the
same, so the whole string is compared

(3) whole str by helper
Filter out unexpected callers by bpf_get_current_pid_tgid()
Use bpf_probe_read_kernel_str() to read the file name into a 64-byte buffer
on the stack
Compare by using bpf_strncmp(): the two compared strings are the same, so
the whole string is compared
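
For reference, a minimal sketch of what a home-made bounded compare like the
one in test (2) could look like. This is plain userspace C, not the code from
the patch: the name local_strncmp() and its argument order are only
assumptions for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the benchmark's home-made strncmp().
 * Compares at most n bytes of s1 and s2, one byte at a time.
 * Returns 0 if equal, <0 if s1 sorts first, >0 if s2 sorts first.
 */
int local_strncmp(const char *s1, size_t n, const char *s2)
{
	for (size_t i = 0; i < n; i++) {
		unsigned char c1 = (unsigned char)s1[i];
		unsigned char c2 = (unsigned char)s2[i];

		if (c1 != c2)
			return c1 < c2 ? -1 : 1;
		/* Both bytes are equal here, so one NUL ends both strings. */
		if (c1 == '\0')
			break;
	}
	return 0;
}
```

When the two strings are identical, as in tests (2) and (3), the loop has to
walk every byte up to the terminating NUL, which is the work the benchmark is
meant to measure.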

Now "(1) one character" is used as the baseline: it measures the overhead of
process filtering and the string read. So under x86-64, the overhead of the
home-made strncmp() is:

  total time of whole str by strncmp test - total time of one character test =
  2920348 - 2613499 = 306849 ns

The overhead of bpf_strncmp() is:

  total time of whole str by helper test - total time of one character test =
  2779332 - 2613499 = 165833 ns

So the performance win is about (306849 / 165833) * 100 - 100 = ~85%

Under arm64 the same subtraction gives 4396787 - 3898867 = 497920 ns for the
home-made strncmp() and 3968113 - 3898867 = 69246 ns for bpf_strncmp(), so
the win is about (497920 / 69246) * 100 - 100 = ~619%, i.e. roughly 600%
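
The calculation above can be written down as a small C sketch. The function
names compare_overhead_ns() and win_percent() are made up for this mail, and
the inputs are just the numbers from the posted results.

```c
#include <stdint.h>

/*
 * Time spent in the comparison itself: the total time of a test minus
 * the "one character" baseline (process filtering + string read).
 */
uint64_t compare_overhead_ns(uint64_t total_ns, uint64_t baseline_ns)
{
	return total_ns - baseline_ns;
}

/*
 * Win of the helper over the home-made strncmp(), in percent:
 * (strncmp overhead / helper overhead) * 100 - 100
 */
double win_percent(uint64_t strncmp_ns, uint64_t helper_ns)
{
	return (double)strncmp_ns / (double)helper_ns * 100.0 - 100.0;
}
```

For example, win_percent(compare_overhead_ns(2920348, 2613499),
compare_overhead_ns(2779332, 2613499)) evaluates the x86-64 case.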


