Re: [PATCH bpf-next] selftests/bpf: Fix string read in strncmp benchmark

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/13/25 00:45, Andrii Nakryiko wrote:
> On Wed, Mar 12, 2025 at 1:39 AM Viktor Malik <vmalik@xxxxxxxxxx> wrote:
>>
>> The strncmp benchmark uses the bpf_strncmp helper and a hand-written
>> loop to compare two strings. The values of the strings are filled from
>> userspace. One of the strings is non-const (in .bss) while the other is
>> const (in .rodata) since that is the requirement of bpf_strncmp.
>>
>> The problem is that in the hand-written loop, Clang optimizes the reads
>> from the const string to always return 0 which breaks the benchmark.
>>
>> Mark the const string as volatile to avoid that.
>>
>> The effect can be seen on the strncmp-no-helper variant.
>>
>> Before this change:
>>
>>     # ./bench strncmp-no-helper
>>     Setting up benchmark 'strncmp-no-helper'...
>>     Benchmark 'strncmp-no-helper' started.
>>     Iter   0 (8440.107us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
>>     Iter   1 (73909.374us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
>>     Iter   2 (-8140.994us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
>>     Iter   3 (3094.474us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
>>     Iter   4 (-2828.468us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
>>     Iter   5 (2635.595us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
>>     Iter   6 (-306.478us): hits    0.000M/s (  0.000M/prod), drops    0.000M/s, total operations    0.000M/s
>>     Summary: hits    0.000 ± 0.000M/s (  0.000M/prod), drops    0.000 ± 0.000M/s, total operations    0.000 ± 0.000M/s
>>
>> After this change:
>>
>>     # ./bench strncmp-no-helper
>>     Setting up benchmark 'strncmp-no-helper'...
>>     Benchmark 'strncmp-no-helper' started.
>>     Iter   0 (21180.011us): hits    5.320M/s (  5.320M/prod), drops    0.000M/s, total operations    5.320M/s
>>     Iter   1 (-692.499us): hits    5.246M/s (  5.246M/prod), drops    0.000M/s, total operations    5.246M/s
>>     Iter   2 (-704.751us): hits    5.332M/s (  5.332M/prod), drops    0.000M/s, total operations    5.332M/s
>>     Iter   3 (62057.929us): hits    5.299M/s (  5.299M/prod), drops    0.000M/s, total operations    5.299M/s
>>     Iter   4 (-7981.421us): hits    5.303M/s (  5.303M/prod), drops    0.000M/s, total operations    5.303M/s
>>     Iter   5 (3500.341us): hits    5.306M/s (  5.306M/prod), drops    0.000M/s, total operations    5.306M/s
>>     Iter   6 (-3851.046us): hits    5.264M/s (  5.264M/prod), drops    0.000M/s, total operations    5.264M/s
>>     Summary: hits    5.338 ± 0.147M/s (  5.338M/prod), drops    0.000 ± 0.000M/s, total operations    5.338 ± 0.147M/s
>>
>> Signed-off-by: Viktor Malik <vmalik@xxxxxxxxxx>
>> ---
>>  tools/testing/selftests/bpf/progs/strncmp_bench.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/testing/selftests/bpf/progs/strncmp_bench.c b/tools/testing/selftests/bpf/progs/strncmp_bench.c
>> index 18373a7df76e..92a828a1ebea 100644
>> --- a/tools/testing/selftests/bpf/progs/strncmp_bench.c
>> +++ b/tools/testing/selftests/bpf/progs/strncmp_bench.c
>> @@ -9,7 +9,7 @@
>>
>>  /* Will be updated by benchmark before program loading */
>>  const volatile unsigned int cmp_str_len = 1;
>> -const char target[STRNCMP_STR_SZ];
>> +const volatile char target[STRNCMP_STR_SZ];
>>
>>  long hits = 0;
>>  char str[STRNCMP_STR_SZ];
>> @@ -17,7 +17,7 @@ char str[STRNCMP_STR_SZ];
>>  char _license[] SEC("license") = "GPL";
>>
>>  static __always_inline int local_strncmp(const char *s1, unsigned int sz,
>> -                                        const char *s2)
>> +                                        const volatile char *s2)
> 
> this will be a bit unfair to local_strncmp(), as now you'll be forcing
> the compiler to re-read s1[i] twice, right? What if we do:

Ah, right, I didn't realize s1[i] was used twice there.

> 
> 
> diff --git a/tools/testing/selftests/bpf/progs/strncmp_bench.c
> b/tools/testing/selftests/bpf/progs/strncmp_bench.c
> index 18373a7df76e..f47bf88f8d2a 100644
> --- a/tools/testing/selftests/bpf/progs/strncmp_bench.c
> +++ b/tools/testing/selftests/bpf/progs/strncmp_bench.c
> @@ -35,7 +35,10 @@ static __always_inline int local_strncmp(const char
> *s1, unsigned int sz,
>  SEC("tp/syscalls/sys_enter_getpgid")
>  int strncmp_no_helper(void *ctx)
>  {
> -       if (local_strncmp(str, cmp_str_len + 1, target) < 0)
> +       const char *target_str = target;
> +
> +       barrier_var(target_str);
> +       if (local_strncmp(str, cmp_str_len + 1, target_str) < 0)
>                 __sync_add_and_fetch(&hits, 1);
>         return 0;
>  }
> 
> 
> that will prevent compiler optimization as well and won't force us to
> do all those casts?

Yeah, that works, I'll send v2.

Thanks!

> 
> pw-bot: cr
> 
> 
>>  {
>>         int ret = 0;
>>         unsigned int i;
>> @@ -43,7 +43,7 @@ int strncmp_no_helper(void *ctx)
>>  SEC("tp/syscalls/sys_enter_getpgid")
>>  int strncmp_helper(void *ctx)
>>  {
>> -       if (bpf_strncmp(str, cmp_str_len + 1, target) < 0)
>> +       if (bpf_strncmp(str, cmp_str_len + 1, (const char *)target) < 0)
>>                 __sync_add_and_fetch(&hits, 1);
>>         return 0;
>>  }
>> --
>> 2.48.1
>>
> 





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux