On Tue, Nov 7, 2023 at 1:15 PM Jordan Rome <jordalgo@xxxxxxxx> wrote:
>
> On 11/7/23 4:03 PM, Andrii Nakryiko wrote:
> > On Tue, Nov 7, 2023 at 1:01 PM Jordan Rome <jordalgo@xxxxxxxx> wrote:
> >>
> >> On 11/6/23 5:45 PM, Andrii Nakryiko wrote:
> >>> On Mon, Nov 6, 2023 at 2:15 PM Jordan Rome <jordalgo@xxxxxxxx> wrote:
> >>>>
> >>>> Currently `get_perf_callchain` only supports user stack walking for
> >>>> the current task. Passing the correct *crosstask* param will return
> >>>> -EFAULT if the task passed to `__bpf_get_stack` isn't the current
> >>>> one, instead of returning a single incorrect frame/address.
> >>>>
> >>>> This issue was found using `bpf_get_task_stack` inside a BPF
> >>>> iterator ("iter/task"), which iterates over all tasks.
> >>>> `bpf_get_task_stack` works fine for fetching kernel stacks,
> >>>> but because `get_perf_callchain` relies on the caller to know
> >>>> whether the requested *task* is the current one (via *crosstask*),
> >>>> it wasn't returning an error.
> >>>>
> >>>> It might be possible to get user stacks for all tasks utilizing
> >>>> something like `access_process_vm`, but that requires the bpf
> >>>> program calling `bpf_get_task_stack` to be sleepable and would
> >>>> therefore be a breaking change.
> >>>>
> >>>> Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
> >>>> Signed-off-by: Jordan Rome <jordalgo@xxxxxxxx>
> >>>> ---
> >>>>  include/uapi/linux/bpf.h                                | 3 +++
> >>>>  kernel/bpf/stackmap.c                                   | 3 ++-
> >>>>  tools/include/uapi/linux/bpf.h                          | 3 +++
> >>>>  tools/testing/selftests/bpf/prog_tests/bpf_iter.c       | 3 +++
> >>>>  tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c | 5 +++++
> >>>>  5 files changed, 16 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>>> index 0f6cdf52b1da..da2871145274 100644
> >>>> --- a/include/uapi/linux/bpf.h
> >>>> +++ b/include/uapi/linux/bpf.h
> >>>> @@ -4517,6 +4517,8 @@ union bpf_attr {
> >>>>   * long bpf_get_task_stack(struct task_struct *task, void *buf, u32 size, u64 flags)
> >>>>   *	Description
> >>>>   *		Return a user or a kernel stack in bpf program provided buffer.
> >>>> + *		Note: the user stack will only be populated if the *task* is
> >>>> + *		the current task; all other tasks will return -EFAULT.
> >>>
> >>> I thought that you were not getting an error even for a non-current
> >>> task with BPF_F_USER_STACK? Shouldn't we make sure to return an error
> >>> (-ENOTSUP?) for such cases? Taking a quick look at
> >>> get_perf_callchain(), it doesn't seem to return NULL in such cases.
> >>
> >> You're right. `get_perf_callchain` does not return -EFAULT. I misread.
> >> This change will make `__bpf_get_stack` return 0 instead of 1 frame.
> >> We could return `-ENOTSUP`, but then we're adding additional crosstask
> >> checking in `__bpf_get_stack` instead of just passing the correct
> >> `crosstask` param value to `get_perf_callchain` and letting it
> >> check. If then, in the future, `get_perf_callchain` does support
> >> crosstask user stack walking, `__bpf_get_stack` would still be
> >> returning -ENOTSUP.
> >
> > Yes, but the current behavior is worse. So either we return -ENOTSUP
> > from the BPF helper for conditions that are not supported right now.
> > Or we
> > change get_perf_callchain() to return NULL, and then return just a
> > generic error (-EINVAL?), which is not bad, but not as meaningful as
> > -ENOTSUP.
> >
> > So I'd say let's add -ENOTSUP, but also return NULL from
> > get_perf_callchain? For the latter change, though, please CC the
> > relevant perf list/folks, so that they are aware (and maybe they can
> > suggest the best way to add support for this).
> >
>
> Ok, I'll have `__bpf_get_stack` return -ENOTSUP and submit a separate
> patch for `get_perf_callchain` to make this cleaner. Sound good?

yep, thanks

>
> >>
> >>>
> >>>>   *		To achieve this, the helper needs *task*, which is a valid
> >>>>   *		pointer to **struct task_struct**. To store the stacktrace, the
> >>>>   *		bpf program provides *buf* with a nonnegative *size*.
> >>>> @@ -4528,6 +4530,7 @@ union bpf_attr {
> >>>>   *
> >>>>   *		**BPF_F_USER_STACK**
> >>>>   *			Collect a user space stack instead of a kernel stack.
> >>>> +  *			The *task* must be the current task.
> >>>>   *		**BPF_F_USER_BUILD_ID**
> >>>>   *			Collect buildid+offset instead of ips for user stack,
> >>>>   *			only valid if **BPF_F_USER_STACK** is also specified.
> >>>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> >>>> index d6b277482085..96641766e90c 100644
> >>>> --- a/kernel/bpf/stackmap.c
> >>>> +++ b/kernel/bpf/stackmap.c
> >>>> @@ -388,6 +388,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> >>>>  {
> >>>>  	u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
> >>>>  	bool user_build_id = flags & BPF_F_USER_BUILD_ID;
> >>>> +	bool crosstask = task && task != current;
> >>>>  	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> >>>>  	bool user = flags & BPF_F_USER_STACK;
> >>>>  	struct perf_callchain_entry *trace;
> >>>> @@ -421,7 +422,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> >>>>  		trace = get_callchain_entry_for_task(task, max_depth);
> >>>>  	else
> >>>>  		trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
> >>>> -					   false, false);
> >>>> +					   crosstask, false);
> >>>>  	if (unlikely(!trace))
> >>>>  		goto err_fault;
> >>>>
> >>>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >>>> index 0f6cdf52b1da..da2871145274 100644
> >>>> --- a/tools/include/uapi/linux/bpf.h
> >>>> +++ b/tools/include/uapi/linux/bpf.h
> >>>> @@ -4517,6 +4517,8 @@ union bpf_attr {
> >>>>   * long bpf_get_task_stack(struct task_struct *task, void *buf, u32 size, u64 flags)
> >>>>   *	Description
> >>>>   *		Return a user or a kernel stack in bpf program provided buffer.
> >>>> + *		Note: the user stack will only be populated if the *task* is
> >>>> + *		the current task; all other tasks will return -EFAULT.
> >>>>   *		To achieve this, the helper needs *task*, which is a valid
> >>>>   *		pointer to **struct task_struct**. To store the stacktrace, the
> >>>>   *		bpf program provides *buf* with a nonnegative *size*.
> >>>> @@ -4528,6 +4530,7 @@ union bpf_attr {
> >>>>   *
> >>>>   *		**BPF_F_USER_STACK**
> >>>>   *			Collect a user space stack instead of a kernel stack.
> >>>> +  *			The *task* must be the current task.
> >>>>   *		**BPF_F_USER_BUILD_ID**
> >>>>   *			Collect buildid+offset instead of ips for user stack,
> >>>>   *			only valid if **BPF_F_USER_STACK** is also specified.
> >>>> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> >>>> index 4e02093c2cbe..757635145510 100644
> >>>> --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> >>>> +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> >>>> @@ -332,6 +332,9 @@ static void test_task_stack(void)
> >>>>  	do_dummy_read(skel->progs.dump_task_stack);
> >>>>  	do_dummy_read(skel->progs.get_task_user_stacks);
> >>>>
> >>>> +	ASSERT_EQ(skel->bss->num_user_stacks, 1,
> >>>> +		  "num_user_stacks");
> >>>> +
> >>>
> >>> please split selftests into a separate patch
> >>>
> >>
> >> Will do.
> >>
> >>>>  	bpf_iter_task_stack__destroy(skel);
> >>>>  }
> >>>>
> >>>> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
> >>>> index f2b8167b72a8..442f4ca39fd7 100644
> >>>> --- a/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
> >>>> +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
> >>>> @@ -35,6 +35,8 @@ int dump_task_stack(struct bpf_iter__task *ctx)
> >>>>  	return 0;
> >>>>  }
> >>>>
> >>>> +int num_user_stacks = 0;
> >>>> +
> >>>>  SEC("iter/task")
> >>>>  int get_task_user_stacks(struct bpf_iter__task *ctx)
> >>>>  {
> >>>> @@ -51,6 +53,9 @@ int get_task_user_stacks(struct bpf_iter__task *ctx)
> >>>>  	if (res <= 0)
> >>>>  		return 0;
> >>>>
> >>>> +	/* Only one task, the current one, should succeed */
> >>>> +	++num_user_stacks;
> >>>> +
> >>>>  	buf_sz += res;
> >>>>
> >>>>  	/* If the verifier doesn't refine bpf_get_task_stack res, and instead
> >>>> --
> >>>> 2.39.3
> >>>>