On 8/29/23 11:07 PM, Marco Elver wrote:
On Tue, 29 Aug 2023 at 22:53, Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
syzbot reported a data race splat between two processes trying to
update the same BPF map value via syscall on different CPUs:
BUG: KCSAN: data-race in bpf_percpu_array_update / bpf_percpu_array_update
write to 0xffffe8fffe7425d8 of 8 bytes by task 8257 on cpu 1:
bpf_long_memcpy include/linux/bpf.h:428 [inline]
bpf_obj_memcpy include/linux/bpf.h:441 [inline]
copy_map_value_long include/linux/bpf.h:464 [inline]
bpf_percpu_array_update+0x3bb/0x500 kernel/bpf/arraymap.c:380
bpf_map_update_value+0x190/0x370 kernel/bpf/syscall.c:175
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1749
bpf_map_do_batch+0x2df/0x3d0 kernel/bpf/syscall.c:4648
__sys_bpf+0x28a/0x780
__do_sys_bpf kernel/bpf/syscall.c:5241 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5239 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5239
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
write to 0xffffe8fffe7425d8 of 8 bytes by task 8268 on cpu 0:
bpf_long_memcpy include/linux/bpf.h:428 [inline]
bpf_obj_memcpy include/linux/bpf.h:441 [inline]
copy_map_value_long include/linux/bpf.h:464 [inline]
bpf_percpu_array_update+0x3bb/0x500 kernel/bpf/arraymap.c:380
bpf_map_update_value+0x190/0x370 kernel/bpf/syscall.c:175
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1749
bpf_map_do_batch+0x2df/0x3d0 kernel/bpf/syscall.c:4648
__sys_bpf+0x28a/0x780
__do_sys_bpf kernel/bpf/syscall.c:5241 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5239 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5239
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x0000000000000000 -> 0xfffffff000002788
bpf_long_memcpy() is used with 8-byte aligned pointers and a power-of-8 size,
and it is forced to use long-sized reads/writes in an attempt to copy long
counters atomically. It is best-effort only, and no barriers are used here,
since it _will_ race with concurrent updates from BPF programs. bpf_long_memcpy()
is called from the bpf(2) syscall. Marco suggested that the best way to make
this known to KCSAN is the data_race() annotation.
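For context, a minimal sketch of the copy loop in question (following the
names in include/linux/bpf.h, not necessarily the exact upstream hunk), with
the per-long copy wrapped in data_race() as suggested:

	/* Sketch: copy in long-sized chunks and mark the known-benign race
	 * for KCSAN with data_race(). Best-effort only, no barriers.
	 */
	static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
	{
		const long *lsrc = src;
		long *ldst = dst;

		size /= sizeof(long);
		while (size--)
			data_race(*ldst++ = *lsrc++);
	}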
Reported-by: syzbot+97522333291430dd277f@xxxxxxxxxxxxxxxxxxxxxxxxx
Suggested-by: Marco Elver <elver@xxxxxxxxxx>
Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
Cc: Yonghong Song <yonghong.song@xxxxxxxxx>
Link: https://lore.kernel.org/bpf/000000000000d87a7f06040c970c@xxxxxxxxxx
Given the "best-effort" nature of this, I do think data_race() is the
right approach:
Acked-by: Marco Elver <elver@xxxxxxxxxx>
Thanks!
But, tangentially related: reading the comment, it looks like the intent is
that these should always be plain long-sized loads and stores. Loops like this
tend to make the compiler recognize a memcpy-like operation and replace it
with __builtin_memcpy, which in turn may become a call to the real memcpy().
Are such compiler optimizations ok?
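To illustrate the concern with a standalone example (not kernel code, and the
copy_longs() name is made up here): a loop of this shape is a classic idiom
that optimizers, e.g. LLVM's LoopIdiomRecognize or GCC's loop pattern
distribution, can turn into a memcpy() call:

	/* Illustrative only: same loop shape as bpf_long_memcpy(). An
	 * optimizing compiler may pattern-match this as a block copy and
	 * emit a call to memcpy(), so "always long-sized accesses" is not
	 * guaranteed by the C source alone.
	 */
	static void copy_longs(long *dst, const long *src, unsigned int size)
	{
		size /= sizeof(long);
		while (size--)
			*dst++ = *src++;
	}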
If it's not ok, and you'd like to prevent the compiler from turning the loop
into memcpy() calls, then there are several options:
1. Use READ_ONCE()/WRITE_ONCE(), as you already suggested.
2. Add a barrier() within the loop.
If defending against the compiler turning it into memcpy() is a side goal
here, option #1 may be better after all (rough sketches of both follow below).
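Rough sketches of the two options, assuming the same loop as above (the
_once/_barrier names are hypothetical, untested):

	/* Option 1 (sketch): READ_ONCE()/WRITE_ONCE() per long. This keeps
	 * KCSAN informed and also prevents the compiler from fusing the loop
	 * into a memcpy() call, at the cost of volatile-qualified accesses.
	 */
	static inline void bpf_long_memcpy_once(void *dst, const void *src, u32 size)
	{
		const long *lsrc = src;
		long *ldst = dst;

		size /= sizeof(long);
		while (size--) {
			WRITE_ONCE(*ldst, READ_ONCE(*lsrc));
			ldst++;
			lsrc++;
		}
	}

	/* Option 2 (sketch): plain copy plus a compiler barrier in the loop
	 * body, which also defeats the memcpy() idiom recognition while
	 * keeping the accesses non-volatile.
	 */
	static inline void bpf_long_memcpy_barrier(void *dst, const void *src, u32 size)
	{
		const long *lsrc = src;
		long *ldst = dst;

		size /= sizeof(long);
		while (size--) {
			*ldst++ = *lsrc++;
			barrier();
		}
	}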
I've taken the data_race() for now, given it doesn't change the code itself
and is trivial. I'm leaning towards barrier() as well, since READ_ONCE() /
WRITE_ONCE() feels somewhat out of place in this setting, but I need to play
around with it some more first and look at the generated code.
Thanks,
Daniel