Re: [PATCH bpf] bpf: fix potentially incorrect results with bpf_get_local_storage()

On Mon, Aug 9, 2021 at 6:04 PM Yonghong Song <yhs@xxxxxx> wrote:
>
> Commit b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in
> bpf_get_local_storage() helper") fixed a bug for bpf_get_local_storage()
> helper so that different tasks won't mess with each other's
> percpu local storage.
>
> The percpu data contains 8 slots, so it can hold up to 8 contexts
> (from the same or different tasks) for 8 concurrent program runs.
> This is sufficient in general, but our internal testing showed the
> following warning multiple times:
>
>   warning: WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193
>      __cgroup_bpf_run_filter_sock_ops+0x13e/0x180
>   RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180
>   <IRQ>
>    tcp_call_bpf.constprop.99+0x93/0xc0
>    tcp_conn_request+0x41e/0xa50
>    ? tcp_rcv_state_process+0x203/0xe00
>    tcp_rcv_state_process+0x203/0xe00
>    ? sk_filter_trim_cap+0xbc/0x210
>    ? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160
>    tcp_v6_do_rcv+0x181/0x3e0
>    tcp_v6_rcv+0xc65/0xcb0
>    ip6_protocol_deliver_rcu+0xbd/0x450
>    ip6_input_finish+0x11/0x20
>    ip6_input+0xb5/0xc0
>    ip6_sublist_rcv_finish+0x37/0x50
>    ip6_sublist_rcv+0x1dc/0x270
>    ipv6_list_rcv+0x113/0x140
>    __netif_receive_skb_list_core+0x1a0/0x210
>    netif_receive_skb_list_internal+0x186/0x2a0
>    gro_normal_list.part.170+0x19/0x40
>    napi_complete_done+0x65/0x150
>    mlx5e_napi_poll+0x1ae/0x680
>    __napi_poll+0x25/0x120
>    net_rx_action+0x11e/0x280
>    __do_softirq+0xbb/0x271
>    irq_exit_rcu+0x97/0xa0
>    common_interrupt+0x7f/0xa0
>    </IRQ>
>    asm_common_interrupt+0x1e/0x40
>   RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac
>    ? __cgroup_bpf_run_filter_skb+0x378/0x4e0
>    ? do_softirq+0x34/0x70
>    ? ip6_finish_output2+0x266/0x590
>    ? ip6_finish_output+0x66/0xa0
>    ? ip6_output+0x6c/0x130
>    ? ip6_xmit+0x279/0x550
>    ? ip6_dst_check+0x61/0xd0
>   ...
>
> Using drgn to dump the percpu buffer contents showed that

It's worth putting in a reference to drgn for people not familiar with what it is:

  [0] https://github.com/osandov/drgn

> on this cpu, slot 0 is still available but
> slots 1-7 are occupied, and the tasks in slots 1-7 mostly no longer
> exist. This pointed to a problem in bpf_cgroup_storage_unset().
>
> Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset().
> Currently, it tries to unset the "current" task's slot by searching from
> the first slot onward. So the following sequence is possible:
>   1. a task is running in process context and claims slot 0
>   2. the bpf program finishes; its cleanup sees that slot 0 holds
>      the task and is about to reset it to NULL (but has not yet).
>   3. an interrupt arrives, another bpf program runs and claims slot 1
>      for the *same* task.
>   4. the unset() in interrupt context releases slot 0, since that is the
>      first slot matching the task.
>   5. the interrupt finishes, and the task in process context resets
>      slot 0 (which the interrupt already cleared).
>
> In the end, slot 1 is never reset. As this sequence repeats, the same
> process can leak slots 2-7 as well, until eventually the bpf program in
> step 3 cannot claim an empty slot and the warning above is issued.
>
> To fix the issue, the unset() function should traverse from the last slot
> to the first, so that it always releases the most recently claimed slot.
>
> The same reverse traversal should also be done in the bpf_get_local_storage()
> helper itself; otherwise, incorrect local storage may be returned to the
> bpf program.
>
> Cc: Roman Gushchin <guro@xxxxxx>
> Fixes: b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
> Signed-off-by: Yonghong Song <yhs@xxxxxx>
> ---

LGTM.

Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>

>  include/linux/bpf-cgroup.h | 4 ++--
>  kernel/bpf/helpers.c       | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> This patch targets the bpf tree. In bpf-next,
> Andrii's c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current")
> should have fixed the issue as well. I am also okay with backporting Andrii's
> patch to the bpf tree if that is viable.
>

I don't have preferences, but my patch might be a bit harder to
backport, so doing this as a fix is fine with me.

> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 8b77d08d4b47..6c9b10d82c80 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -201,8 +201,8 @@ static inline void bpf_cgroup_storage_unset(void)
>  {
>         int i;
>
> -       for (i = 0; i < BPF_CGROUP_STORAGE_NEST_MAX; i++) {
> -               if (unlikely(this_cpu_read(bpf_cgroup_storage_info[i].task) != current))
> +       for (i = BPF_CGROUP_STORAGE_NEST_MAX - 1; i >= 0; i--) {
> +               if (likely(this_cpu_read(bpf_cgroup_storage_info[i].task) != current))
>                         continue;
>
>                 this_cpu_write(bpf_cgroup_storage_info[i].task, NULL);
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 62cf00383910..9b3f16eee21f 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -397,8 +397,8 @@ BPF_CALL_2(bpf_get_local_storage, struct bpf_map *, map, u64, flags)
>         void *ptr;
>         int i;
>
> -       for (i = 0; i < BPF_CGROUP_STORAGE_NEST_MAX; i++) {
> -               if (unlikely(this_cpu_read(bpf_cgroup_storage_info[i].task) != current))
> +       for (i = BPF_CGROUP_STORAGE_NEST_MAX - 1; i >= 0; i--) {
> +               if (likely(this_cpu_read(bpf_cgroup_storage_info[i].task) != current))
>                         continue;
>
>                 storage = this_cpu_read(bpf_cgroup_storage_info[i].storage[stype]);
> --
> 2.30.2
>


