On Fri, Aug 26, 2022 at 7:49 AM Kui-Feng Lee <kuifeng@xxxxxx> wrote: > > On Thu, 2022-08-25 at 14:13 -0700, Andrii Nakryiko wrote: > > On Wed, Aug 24, 2022 at 5:16 PM Kui-Feng Lee <kuifeng@xxxxxx> wrote: > > > > > > On Wed, 2022-08-24 at 15:20 -0700, Andrii Nakryiko wrote: > > > > On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@xxxxxx> > > > > wrote: > > > > > > > > > > Allow creating an iterator that loops through resources of one > > > > > task/thread. > > > > > > > > > > People could only create iterators to loop through all > > > > > resources of > > > > > files, vma, and tasks in the system, even though they were > > > > > interested > > > > > in only the resources of a specific task or process. Passing > > > > > the > > > > > additional parameters, people can now create an iterator to go > > > > > through all resources or only the resources of a task. > > > > > > > > > > Signed-off-by: Kui-Feng Lee <kuifeng@xxxxxx> > > > > > --- > > > > > include/linux/bpf.h | 25 +++++++ > > > > > include/uapi/linux/bpf.h | 6 ++ > > > > > kernel/bpf/task_iter.c | 116 > > > > > ++++++++++++++++++++++++++--- > > > > > ---- > > > > > tools/include/uapi/linux/bpf.h | 6 ++ > > > > > 4 files changed, 129 insertions(+), 24 deletions(-) > > > > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > > > index 39bd36359c1e..59712dd917d8 100644 > > > > > --- a/include/linux/bpf.h > > > > > +++ b/include/linux/bpf.h > > > > > @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user > > > > > *pathname, int flags); > > > > > extern int bpf_iter_ ## target(args); > > > > > \ > > > > > int __init bpf_iter_ ## target(args) { return 0; } > > > > > > > > > > +/* > > > > > + * The task type of iterators. > > > > > + * > > > > > + * For BPF task iterators, they can be parameterized with > > > > > various > > > > > + * parameters to visit only some of tasks. > > > > > + * > > > > > + * BPF_TASK_ITER_ALL (default) > > > > > + * Iterate over resources of every task. > > > > > + * > > > > > + * BPF_TASK_ITER_TID > > > > > + * Iterate over resources of a task/tid. > > > > > + * > > > > > + * BPF_TASK_ITER_TGID > > > > > + * Iterate over reosurces of evevry task of a process / > > > > > task > > > > > group. > > > > > > > > typos: resources, every > > > > > > > > > + */ > > > > > +enum bpf_iter_task_type { > > > > > + BPF_TASK_ITER_ALL = 0, > > > > > + BPF_TASK_ITER_TID, > > > > > + BPF_TASK_ITER_TGID, > > > > > +}; > > > > > + > > > > > > > > [...] > > > > > > > > > rcu_read_lock(); > > > > > retry: > > > > > - pid = find_ge_pid(*tid, ns); > > > > > + pid = find_ge_pid(*tid, common->ns); > > > > > if (pid) { > > > > > - *tid = pid_nr_ns(pid, ns); > > > > > + *tid = pid_nr_ns(pid, common->ns); > > > > > task = get_pid_task(pid, PIDTYPE_PID); > > > > > if (!task) { > > > > > ++*tid; > > > > > goto retry; > > > > > - } else if (skip_if_dup_files && > > > > > !thread_group_leader(task) && > > > > > - task->files == task->group_leader- > > > > > > files) { > > > > > + } else if ((skip_if_dup_files && > > > > > !thread_group_leader(task) && > > > > > + task->files == task->group_leader- > > > > > > files) || > > > > > + (common->type == BPF_TASK_ITER_TGID > > > > > && > > > > > + __task_pid_nr_ns(task, > > > > > PIDTYPE_TGID, > > > > > common->ns) != common->pid)) { > > > > > > > > it gets super hard to follow this logic, would a simple helper > > > > function to calculate this condition (and maybe some comments to > > > > explain the logic behind these checks?) make it a bit more > > > > readable? > > > > > > !matched_task(task, common, skip_if_dup_file)? > > > > > > bool matched_task(struct task_struct *task, > > > struct bpf_iter_seq_task_common *common, > > > bool skip_if_dup_file) { > > > /* Should not have the same 'files' if skip_if_dup_file is > > > true > > > */ > > > bool diff_files_if = > > > !skip_if_dup_file || > > > (thread_group_leader(task) && > > > task->file != task->gorup_leader->fies); > > > /* Should have the given tgid if the type is > > > BPF_TASK_ITER_TGI > > > */ > > > bool have_tgid_if = > > > common->type != BPF_TASK_ITER_TGID || > > > __task_pid_nr_ns(task, PIDTYPE_TGID, > > > common->ns) == common->pid; > > > return diff_files_if && have_tgid_if; > > > } > > > > > > How about this? > > > > > > > Hm... "matched_task" doesn't mean much, tbh, so not really. I wanted > > to suggest having a separate helper just for your TGID check and call > > it something more meaningful like "task_belongs_to_tgid". Can't come > > up with a good name for existing dup_file check, so I'd probably keep > > it as is. But also seems like there is same_thread_group() helper in > > include/linux/sched/signal.h, so let's look if we can use it, it > > seems > > like it's just comparing signal pointers (probably quite faster than > > what you are doing right now). > > > > But looking at this some more made me realize that even if we specify > > pid (tgid in kernel terms) we are still going to iterate through all > > the tasks, essentially. Is that right? So TGID mode isn't great for > > speeding up, we should point out to users that if they want to > > iterate > > files of the process, they probably want to use TID mode and set tid > > to pid to use the early termination condition in TID. > > > > It wasn't obvious to me until I re-read this patch like 3 times and > > wrote three different replies here :) > > > > But then I also went looking at what procfs doing for > > /proc/<pid/task/* dirs. It does seem like there are faster ways to > > iterate all threads of a process. See next_tid() which uses > > next_thread(), etc. Can you please check those and see if we can have > > faster in-process iteration? > > > > I haven't notice this message until now. > It looks like very promise. I will add it to next version. No worries. Thanks! Hopefully there is a way to make efficient per-process file and vma iteration. > > > > > > > > > > > > put_task_struct(task); > > > > > task = NULL; > > > > > ++*tid; > > > > > @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file > > > > > *seq, > > > > > loff_t *pos) > > > > > struct bpf_iter_seq_task_info *info = seq->private; > > > > > struct task_struct *task; > > > > > > > > > > - task = task_seq_get_next(info->common.ns, &info->tid, > > > > > false); > > > > > + task = task_seq_get_next(&info->common, &info->tid, > > > > > false); > > > > > if (!task) > > > > > return NULL; > > > > > > > > > [...] >