Re: [PATCH bpf] bpf: Annotate bpf_long_memcpy with data_race

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 29 Aug 2023 at 22:53, Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>
> syzbot reported a data race splat between two processes trying to
> update the same BPF map value via syscall on different CPUs:
>
>   BUG: KCSAN: data-race in bpf_percpu_array_update / bpf_percpu_array_update
>
>   write to 0xffffe8fffe7425d8 of 8 bytes by task 8257 on cpu 1:
>    bpf_long_memcpy include/linux/bpf.h:428 [inline]
>    bpf_obj_memcpy include/linux/bpf.h:441 [inline]
>    copy_map_value_long include/linux/bpf.h:464 [inline]
>    bpf_percpu_array_update+0x3bb/0x500 kernel/bpf/arraymap.c:380
>    bpf_map_update_value+0x190/0x370 kernel/bpf/syscall.c:175
>    generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1749
>    bpf_map_do_batch+0x2df/0x3d0 kernel/bpf/syscall.c:4648
>    __sys_bpf+0x28a/0x780
>    __do_sys_bpf kernel/bpf/syscall.c:5241 [inline]
>    __se_sys_bpf kernel/bpf/syscall.c:5239 [inline]
>    __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5239
>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
>   write to 0xffffe8fffe7425d8 of 8 bytes by task 8268 on cpu 0:
>    bpf_long_memcpy include/linux/bpf.h:428 [inline]
>    bpf_obj_memcpy include/linux/bpf.h:441 [inline]
>    copy_map_value_long include/linux/bpf.h:464 [inline]
>    bpf_percpu_array_update+0x3bb/0x500 kernel/bpf/arraymap.c:380
>    bpf_map_update_value+0x190/0x370 kernel/bpf/syscall.c:175
>    generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1749
>    bpf_map_do_batch+0x2df/0x3d0 kernel/bpf/syscall.c:4648
>    __sys_bpf+0x28a/0x780
>    __do_sys_bpf kernel/bpf/syscall.c:5241 [inline]
>    __se_sys_bpf kernel/bpf/syscall.c:5239 [inline]
>    __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5239
>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
>   value changed: 0x0000000000000000 -> 0xfffffff000002788
>
> The bpf_long_memcpy is used with 8-byte aligned pointers, power-of-8 size
> and forced to use long read/writes to try to atomically copy long counters.
> It is best-effort only and no barriers are here since it _will_ race with
> concurrent updates from BPF programs. The bpf_long_memcpy() is called from
> bpf(2) syscall. Marco suggested that the best way to make this known to
> KCSAN would be to use data_race() annotation.
>
> Reported-by: syzbot+97522333291430dd277f@xxxxxxxxxxxxxxxxxxxxxxxxx
> Suggested-by: Marco Elver <elver@xxxxxxxxxx>
> Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> Cc: Yonghong Song <yonghong.song@xxxxxxxxx>
> Link: https://lore.kernel.org/bpf/000000000000d87a7f06040c970c@xxxxxxxxxx

Given the "best-effort" nature of this, I do think data_race() is the
right approach:

Acked-by: Marco Elver <elver@xxxxxxxxxx>

But, tangentially related, reading the comment it looks like the
intent is that this should always be plain long loads. Loops like this
tend to make the compiler recognize it's a memcpy-like operation and
replace them with builtin memcpy, which in turn may turn into calls to
real memcpy(). Are such compiler optimizations ok?
If it's not ok, and you'd like to prevent the compiler from turning
into memcpy() calls, then there are several options:

  1. Do the READ_ONCE()/WRITE_ONCE() as you already suggested.
  2. barrier() within the loop.

If defending against the compiler turning it into memcpy() is a
side-goal, option #1 may be better after all.

> ---
>  include/linux/bpf.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f58895830ada..eb1bb76e87f8 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -425,7 +425,7 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
>
>         size /= sizeof(long);
>         while (size--)
> -               *ldst++ = *lsrc++;
> +               data_race(*ldst++ = *lsrc++);
>  }
>
>  /* copy everything but bpf_spin_lock, bpf_timer, and kptrs. There could be one of each. */
> --
> 2.21.0
>




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux