Re: [PATCH bpf-next] bpf: stackmap: add crosstask check to `__bpf_get_stack`

On Tue, Nov 7, 2023 at 1:15 PM Jordan Rome <jordalgo@xxxxxxxx> wrote:
>
>
>
> On 11/7/23 4:03 PM, Andrii Nakryiko wrote:
> > >
> > On Tue, Nov 7, 2023 at 1:01 PM Jordan Rome <jordalgo@xxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 11/6/23 5:45 PM, Andrii Nakryiko wrote:
> >>>>
> >>> On Mon, Nov 6, 2023 at 2:15 PM Jordan Rome <jordalgo@xxxxxxxx> wrote:
> >>>>
> >>>> Currently `get_perf_callchain` only supports user stack walking for
> >>>> the current task. Passing the correct *crosstask* param will make
> >>>> `__bpf_get_stack` return -EFAULT when the *task* passed to it isn't
> >>>> the current one, instead of a single incorrect frame/address.
> >>>>
> >>>> This issue was found using `bpf_get_task_stack` inside a BPF
> >>>> iterator ("iter/task"), which iterates over all tasks.
> >>>> `bpf_get_task_stack` works fine for fetching kernel stacks,
> >>>> but because `get_perf_callchain` relies on the caller to know
> >>>> whether the requested *task* is the current one (via *crosstask*),
> >>>> it wasn't returning an error.
> >>>>
> >>>> It might be possible to get user stacks for all tasks using
> >>>> something like `access_process_vm`, but that would require the bpf
> >>>> program calling `bpf_get_task_stack` to be sleepable and would
> >>>> therefore be a breaking change.
> >>>>
> >>>> Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
> >>>> Signed-off-by: Jordan Rome <jordalgo@xxxxxxxx>
> >>>> ---
> >>>>    include/uapi/linux/bpf.h                                | 3 +++
> >>>>    kernel/bpf/stackmap.c                                   | 3 ++-
> >>>>    tools/include/uapi/linux/bpf.h                          | 3 +++
> >>>>    tools/testing/selftests/bpf/prog_tests/bpf_iter.c       | 3 +++
> >>>>    tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c | 5 +++++
> >>>>    5 files changed, 16 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>>> index 0f6cdf52b1da..da2871145274 100644
> >>>> --- a/include/uapi/linux/bpf.h
> >>>> +++ b/include/uapi/linux/bpf.h
> >>>> @@ -4517,6 +4517,8 @@ union bpf_attr {
> >>>>     * long bpf_get_task_stack(struct task_struct *task, void *buf, u32 size, u64 flags)
> >>>>     *     Description
> >>>>     *             Return a user or a kernel stack in bpf program provided buffer.
> >>>> + *             Note: the user stack will only be populated if the *task* is
> >>>> + *             the current task; all other tasks will return -EFAULT.
> >>>
> >>> I thought that you were not getting an error even for a non-current
> >>> task with BPF_F_USER_STACK? Shouldn't we make sure to return an
> >>> error (-ENOTSUP?) for such cases? Taking a quick look at
> >>> get_perf_callchain(), it doesn't seem to return NULL in such cases.
> >>
> >> You're right. `get_perf_callchain` does not return -EFAULT. I misread.
> >> This change will make `__bpf_get_stack` return 0 instead of 1 frame.
> >> We could return `-ENOTSUP`, but then we're adding additional crosstask
> >> checking in `__bpf_get_stack` instead of just passing the correct
> >> `crosstask` param value to `get_perf_callchain` and letting it
> >> check. And if, in the future, `get_perf_callchain` does gain support
> >> for crosstask user stack walking, `__bpf_get_stack` would still be
> >> returning -ENOTSUP.
> >
> > Yes, but the current behavior is worse. So we either return -ENOTSUP
> > from the BPF helper for conditions that are not supported right now,
> > or we change get_perf_callchain() to return NULL and then return just
> > a generic error (-EINVAL?), which is not bad, but not as meaningful as
> > -ENOTSUP.
> >
> > So I'd say let's add -ENOTSUP, but also return NULL from
> > get_perf_callchain? For the latter change, though, please CC the
> > relevant perf list/folks, so that they are aware (and maybe they can
> > suggest the best way to add support for this).
> >
>
> Ok, I'll have `__bpf_get_stack` return -ENOTSUP and submit a separate
> patch for `get_perf_callchain` to make this cleaner. Sound good?

yep, thanks

>
> >>
> >>>
> >>>>     *             To achieve this, the helper needs *task*, which is a valid
> >>>>     *             pointer to **struct task_struct**. To store the stacktrace, the
> >>>>     *             bpf program provides *buf* with a nonnegative *size*.
> >>>> @@ -4528,6 +4530,7 @@ union bpf_attr {
> >>>>     *
> >>>>     *             **BPF_F_USER_STACK**
> >>>>     *                     Collect a user space stack instead of a kernel stack.
> >>>> + *                     The *task* must be the current task.
> >>>>     *             **BPF_F_USER_BUILD_ID**
> >>>>     *                     Collect buildid+offset instead of ips for user stack,
> >>>>     *                     only valid if **BPF_F_USER_STACK** is also specified.
> >>>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> >>>> index d6b277482085..96641766e90c 100644
> >>>> --- a/kernel/bpf/stackmap.c
> >>>> +++ b/kernel/bpf/stackmap.c
> >>>> @@ -388,6 +388,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> >>>>    {
> >>>>           u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
> >>>>           bool user_build_id = flags & BPF_F_USER_BUILD_ID;
> >>>> +       bool crosstask = task && task != current;
> >>>>           u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> >>>>           bool user = flags & BPF_F_USER_STACK;
> >>>>           struct perf_callchain_entry *trace;
> >>>> @@ -421,7 +422,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> >>>>                   trace = get_callchain_entry_for_task(task, max_depth);
> >>>>           else
> >>>>                   trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
> >>>> -                                          false, false);
> >>>> +                                          crosstask, false);
> >>>>           if (unlikely(!trace))
> >>>>                   goto err_fault;
> >>>>
> >>>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >>>> index 0f6cdf52b1da..da2871145274 100644
> >>>> --- a/tools/include/uapi/linux/bpf.h
> >>>> +++ b/tools/include/uapi/linux/bpf.h
> >>>> @@ -4517,6 +4517,8 @@ union bpf_attr {
> >>>>     * long bpf_get_task_stack(struct task_struct *task, void *buf, u32 size, u64 flags)
> >>>>     *     Description
> >>>>     *             Return a user or a kernel stack in bpf program provided buffer.
> >>>> + *             Note: the user stack will only be populated if the *task* is
> >>>> + *             the current task; all other tasks will return -EFAULT.
> >>>>     *             To achieve this, the helper needs *task*, which is a valid
> >>>>     *             pointer to **struct task_struct**. To store the stacktrace, the
> >>>>     *             bpf program provides *buf* with a nonnegative *size*.
> >>>> @@ -4528,6 +4530,7 @@ union bpf_attr {
> >>>>     *
> >>>>     *             **BPF_F_USER_STACK**
> >>>>     *                     Collect a user space stack instead of a kernel stack.
> >>>> + *                     The *task* must be the current task.
> >>>>     *             **BPF_F_USER_BUILD_ID**
> >>>>     *                     Collect buildid+offset instead of ips for user stack,
> >>>>     *                     only valid if **BPF_F_USER_STACK** is also specified.
> >>>> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> >>>> index 4e02093c2cbe..757635145510 100644
> >>>> --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> >>>> +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> >>>> @@ -332,6 +332,9 @@ static void test_task_stack(void)
> >>>>           do_dummy_read(skel->progs.dump_task_stack);
> >>>>           do_dummy_read(skel->progs.get_task_user_stacks);
> >>>>
> >>>> +       ASSERT_EQ(skel->bss->num_user_stacks, 1,
> >>>> +                 "num_user_stacks");
> >>>> +
> >>>
> >>> please split selftests into a separate patch
> >>>
> >>
> >> Will do.
> >>
> >>>>           bpf_iter_task_stack__destroy(skel);
> >>>>    }
> >>>>
> >>>> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
> >>>> index f2b8167b72a8..442f4ca39fd7 100644
> >>>> --- a/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
> >>>> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
> >>>> @@ -35,6 +35,8 @@ int dump_task_stack(struct bpf_iter__task *ctx)
> >>>>           return 0;
> >>>>    }
> >>>>
> >>>> +int num_user_stacks = 0;
> >>>> +
> >>>>    SEC("iter/task")
> >>>>    int get_task_user_stacks(struct bpf_iter__task *ctx)
> >>>>    {
> >>>> @@ -51,6 +53,9 @@ int get_task_user_stacks(struct bpf_iter__task *ctx)
> >>>>           if (res <= 0)
> >>>>                   return 0;
> >>>>
> >>>> +       /* Only one task, the current one, should succeed */
> >>>> +       ++num_user_stacks;
> >>>> +
> >>>>           buf_sz += res;
> >>>>
> >>>>           /* If the verifier doesn't refine bpf_get_task_stack res, and instead
> >>>> --
> >>>> 2.39.3
> >>>>




