On Tue, Nov 7, 2023 at 1:01 PM Jordan Rome <jordalgo@xxxxxxxx> wrote:
>
> On 11/6/23 5:45 PM, Andrii Nakryiko wrote:
> > On Mon, Nov 6, 2023 at 2:15 PM Jordan Rome <jordalgo@xxxxxxxx> wrote:
> >>
> >> Currently `get_perf_callchain` only supports user stack walking for
> >> the current task. Passing the correct *crosstask* param will return
> >> -EFAULT if the task passed to `__bpf_get_stack` isn't the current
> >> one, instead of a single incorrect frame/address.
> >>
> >> This issue was found using `bpf_get_task_stack` inside a BPF
> >> iterator ("iter/task"), which iterates over all tasks.
> >> `bpf_get_task_stack` works fine for fetching kernel stacks,
> >> but because `get_perf_callchain` relies on the caller to know
> >> if the requested *task* is the current one (via *crosstask*)
> >> it wasn't returning an error.
> >>
> >> It might be possible to get user stacks for all tasks utilizing
> >> something like `access_process_vm`, but that requires the bpf
> >> program calling `bpf_get_task_stack` to be sleepable and would
> >> therefore be a breaking change.
> >>
> >> Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
> >> Signed-off-by: Jordan Rome <jordalgo@xxxxxxxx>
> >> ---
> >>  include/uapi/linux/bpf.h                                 | 3 +++
> >>  kernel/bpf/stackmap.c                                    | 3 ++-
> >>  tools/include/uapi/linux/bpf.h                           | 3 +++
> >>  tools/testing/selftests/bpf/prog_tests/bpf_iter.c        | 3 +++
> >>  tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c  | 5 +++++
> >>  5 files changed, 16 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index 0f6cdf52b1da..da2871145274 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -4517,6 +4517,8 @@ union bpf_attr {
> >>   * long bpf_get_task_stack(struct task_struct *task, void *buf, u32 size, u64 flags)
> >>   *	Description
> >>   *		Return a user or a kernel stack in bpf program provided buffer.
> >> + *		Note: the user stack will only be populated if the *task* is
> >> + *		the current task; all other tasks will return -EFAULT.
> >
> > I thought that you were not getting an error even for a non-current
> > task with BPF_F_USER_STACK? Shouldn't we make sure to return an error
> > (-ENOTSUP?) for such cases? Taking a quick look at
> > get_perf_callchain(), it doesn't seem to return NULL in such cases.
>
> You're right. `get_perf_callchain` does not return -EFAULT. I misread.
> This change will make `__bpf_get_stack` return 0 instead of 1 frame.
> We could return `-ENOTSUP`, but then we're adding additional crosstask
> checking in `__bpf_get_stack` instead of just passing the correct
> `crosstask` param value to `get_perf_callchain` and letting it
> check. If, in the future, `get_perf_callchain` does support
> crosstask user stack walking, then `__bpf_get_stack` would still be
> returning -ENOTSUP.

Yes, but the current behavior is worse. So we either return -ENOTSUP
from the BPF helper for conditions that are not supported right now,
or we change get_perf_callchain() to return NULL and then return just
a generic error (-EINVAL?), which is not bad, but not as meaningful as
-ENOTSUP.

So I'd say let's add -ENOTSUP, but also return NULL from
get_perf_callchain()? For the latter change, though, please CC the
relevant perf list/folks, so that they are aware (and maybe they can
suggest the best way to add support for this).
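Roughly something like this on the __bpf_get_stack() side, on top of
your crosstask change (completely untested sketch; in-kernel the errno
spelling would be -EOPNOTSUPP, and err/clear below stand in for the
function's existing error/cleanup path):

	/* user stacks of non-current tasks can't be walked (yet), so
	 * reject the combination explicitly instead of handing back an
	 * empty or single-frame trace
	 */
	if (unlikely(crosstask && user)) {
		err = -EOPNOTSUPP;
		goto clear;
	}

On the perf side, the crosstask bail-out in get_perf_callchain() could
then presumably return NULL instead of an empty entry, but I'd let the
perf folks decide how they want that structured.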
>
> >>   *		To achieve this, the helper needs *task*, which is a valid
> >>   *		pointer to **struct task_struct**. To store the stacktrace, the
> >>   *		bpf program provides *buf* with a nonnegative *size*.
> >> @@ -4528,6 +4530,7 @@ union bpf_attr {
> >>   *
> >>   *		**BPF_F_USER_STACK**
> >>   *			Collect a user space stack instead of a kernel stack.
> >> + *			The *task* must be the current task.
> >>   *		**BPF_F_USER_BUILD_ID**
> >>   *			Collect buildid+offset instead of ips for user stack,
> >>   *			only valid if **BPF_F_USER_STACK** is also specified.
> >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> >> index d6b277482085..96641766e90c 100644
> >> --- a/kernel/bpf/stackmap.c
> >> +++ b/kernel/bpf/stackmap.c
> >> @@ -388,6 +388,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> >>  {
> >>  	u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
> >>  	bool user_build_id = flags & BPF_F_USER_BUILD_ID;
> >> +	bool crosstask = task && task != current;
> >>  	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> >>  	bool user = flags & BPF_F_USER_STACK;
> >>  	struct perf_callchain_entry *trace;
> >> @@ -421,7 +422,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> >>  		trace = get_callchain_entry_for_task(task, max_depth);
> >>  	else
> >>  		trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
> >> -					   false, false);
> >> +					   crosstask, false);
> >>  	if (unlikely(!trace))
> >>  		goto err_fault;
> >>
> >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >> index 0f6cdf52b1da..da2871145274 100644
> >> --- a/tools/include/uapi/linux/bpf.h
> >> +++ b/tools/include/uapi/linux/bpf.h
> >> @@ -4517,6 +4517,8 @@ union bpf_attr {
> >>   * long bpf_get_task_stack(struct task_struct *task, void *buf, u32 size, u64 flags)
> >>   *	Description
> >>   *		Return a user or a kernel stack in bpf program provided buffer.
> >> + *		Note: the user stack will only be populated if the *task* is
> >> + *		the current task; all other tasks will return -EFAULT.
> >>   *		To achieve this, the helper needs *task*, which is a valid
> >>   *		pointer to **struct task_struct**. To store the stacktrace, the
> >>   *		bpf program provides *buf* with a nonnegative *size*.
> >> @@ -4528,6 +4530,7 @@ union bpf_attr {
> >>   *
> >>   *		**BPF_F_USER_STACK**
> >>   *			Collect a user space stack instead of a kernel stack.
> >> + *			The *task* must be the current task.
> >>   *		**BPF_F_USER_BUILD_ID**
> >>   *			Collect buildid+offset instead of ips for user stack,
> >>   *			only valid if **BPF_F_USER_STACK** is also specified.
> >> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> >> index 4e02093c2cbe..757635145510 100644
> >> --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> >> +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> >> @@ -332,6 +332,9 @@ static void test_task_stack(void)
> >>  	do_dummy_read(skel->progs.dump_task_stack);
> >>  	do_dummy_read(skel->progs.get_task_user_stacks);
> >>
> >> +	ASSERT_EQ(skel->bss->num_user_stacks, 1,
> >> +		  "num_user_stacks");
> >> +
> >
> > please split selftests into a separate patch
> >
>
> Will do.
>
> >>  	bpf_iter_task_stack__destroy(skel);
> >>  }
> >>
> >> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
> >> index f2b8167b72a8..442f4ca39fd7 100644
> >> --- a/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
> >> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
> >> @@ -35,6 +35,8 @@ int dump_task_stack(struct bpf_iter__task *ctx)
> >>  	return 0;
> >>  }
> >>
> >> +int num_user_stacks = 0;
> >> +
> >>  SEC("iter/task")
> >>  int get_task_user_stacks(struct bpf_iter__task *ctx)
> >>  {
> >> @@ -51,6 +53,9 @@ int get_task_user_stacks(struct bpf_iter__task *ctx)
> >>  	if (res <= 0)
> >>  		return 0;
> >>
> >> +	/* Only one task, the current one, should succeed */
> >> +	++num_user_stacks;
> >> +
> >>  	buf_sz += res;
> >>
> >>  	/* If the verifier doesn't refine bpf_get_task_stack res, and instead
> >> --
> >> 2.39.3
> >>