Re: [PATCH bpf-next v6 1/4] bpf: Parameterize task iterators.

Kui-Feng Lee <kuifeng@xxxxxx> · Fri, 26 Aug 2022 14:49:43 +0000

On Thu, 2022-08-25 at 14:13 -0700, Andrii Nakryiko wrote:
> On Wed, Aug 24, 2022 at 5:16 PM Kui-Feng Lee <kuifeng@xxxxxx> wrote:
> > 
> > On Wed, 2022-08-24 at 15:20 -0700, Andrii Nakryiko wrote:
> > > On Fri, Aug 19, 2022 at 3:09 PM Kui-Feng Lee <kuifeng@xxxxxx>
> > > wrote:
> > > > 
> > > > Allow creating an iterator that loops through resources of one
> > > > task/thread.
> > > > 
> > > > People could only create iterators to loop through all resources of
> > > > files, vma, and tasks in the system, even though they were interested
> > > > in only the resources of a specific task or process.  Passing the
> > > > additional parameters, people can now create an iterator to go
> > > > through all resources or only the resources of a task.
> > > > 
> > > > Signed-off-by: Kui-Feng Lee <kuifeng@xxxxxx>
> > > > ---
> > > >  include/linux/bpf.h            |  25 +++++++
> > > >  include/uapi/linux/bpf.h       |   6 ++
> > > >  kernel/bpf/task_iter.c         | 116 ++++++++++++++++++++++++++-------
> > > >  tools/include/uapi/linux/bpf.h |   6 ++
> > > >  4 files changed, 129 insertions(+), 24 deletions(-)
> > > > 
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index 39bd36359c1e..59712dd917d8 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -1729,8 +1729,33 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
> > > >         extern int bpf_iter_ ## target(args);                  \
> > > >         int __init bpf_iter_ ## target(args) { return 0; }
> > > > 
> > > > +/*
> > > > + * The task type of iterators.
> > > > + *
> > > > + * For BPF task iterators, they can be parameterized with various
> > > > + * parameters to visit only some of tasks.
> > > > + *
> > > > + * BPF_TASK_ITER_ALL (default)
> > > > + *     Iterate over resources of every task.
> > > > + *
> > > > + * BPF_TASK_ITER_TID
> > > > + *     Iterate over resources of a task/tid.
> > > > + *
> > > > + * BPF_TASK_ITER_TGID
> > > > + *     Iterate over reosurces of evevry task of a process / task group.
> > > 
> > > typos: resources, every
> > > 
> > > > + */
> > > > +enum bpf_iter_task_type {
> > > > +       BPF_TASK_ITER_ALL = 0,
> > > > +       BPF_TASK_ITER_TID,
> > > > +       BPF_TASK_ITER_TGID,
> > > > +};
> > > > +
> > > 
> > > [...]
> > > 
> > > >         rcu_read_lock();
> > > >  retry:
> > > > -       pid = find_ge_pid(*tid, ns);
> > > > +       pid = find_ge_pid(*tid, common->ns);
> > > >         if (pid) {
> > > > -               *tid = pid_nr_ns(pid, ns);
> > > > +               *tid = pid_nr_ns(pid, common->ns);
> > > >                 task = get_pid_task(pid, PIDTYPE_PID);
> > > >                 if (!task) {
> > > >                         ++*tid;
> > > >                         goto retry;
> > > > -               } else if (skip_if_dup_files && !thread_group_leader(task) &&
> > > > -                          task->files == task->group_leader->files) {
> > > > +               } else if ((skip_if_dup_files && !thread_group_leader(task) &&
> > > > +                           task->files == task->group_leader->files) ||
> > > > +                          (common->type == BPF_TASK_ITER_TGID &&
> > > > +                           __task_pid_nr_ns(task, PIDTYPE_TGID, common->ns) != common->pid)) {
> > > 
> > > it gets super hard to follow this logic, would a simple helper
> > > function to calculate this condition (and maybe some comments to
> > > explain the logic behind these checks?) make it a bit more
> > > readable?
> > 
> > !matched_task(task, common, skip_if_dup_file)?
> > 
> > bool matched_task(struct task_struct *task,
> >                   struct bpf_iter_seq_task_common *common,
> >                   bool skip_if_dup_file) {
> >         /* Should not share 'files' with the group leader if
> >          * skip_if_dup_file is true */
> >         bool diff_files_if =
> >                 !skip_if_dup_file ||
> >                 thread_group_leader(task) ||
> >                 task->files != task->group_leader->files;
> >         /* Should have the given tgid if the type is BPF_TASK_ITER_TGID */
> >         bool have_tgid_if =
> >                 common->type != BPF_TASK_ITER_TGID ||
> >                 __task_pid_nr_ns(task, PIDTYPE_TGID, common->ns) == common->pid;
> >         return diff_files_if && have_tgid_if;
> > }
> > 
> > How about this?
> > 
> 
> Hm... "matched_task" doesn't mean much, tbh, so not really. I wanted
> to suggest having a separate helper just for your TGID check and call
> it something more meaningful like "task_belongs_to_tgid". Can't come
> up with a good name for the existing dup_file check, so I'd probably
> keep it as is. But it also seems like there is a same_thread_group()
> helper in include/linux/sched/signal.h, so let's see if we can use it;
> it seems like it's just comparing signal pointers (probably quite a
> bit faster than what you are doing right now).
> 
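For reference, a minimal userspace sketch of the split suggested above:
one predicate for the TGID check and the existing dup-files check left
on its own. This is a model, not kernel code. struct mock_task,
struct mock_signal and mock_same_thread_group() are hypothetical
stand-ins; the only thing taken from the kernel is the idea that
same_thread_group() compares signal pointers. "Is group leader" is
modelled as task == task->group_leader, which is an assumption of this
sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the real kernel definitions. */
struct mock_signal { int unused; };

struct mock_task {
	int tid;
	struct mock_signal *signal;     /* shared by all threads of a process */
	void *files;                    /* file table, possibly shared */
	struct mock_task *group_leader; /* points to itself for a leader */
};

/* Models same_thread_group(): two tasks match if they share 'signal'. */
static bool mock_same_thread_group(const struct mock_task *a,
				   const struct mock_task *b)
{
	return a->signal == b->signal;
}

/* Hypothetical helper named after the review suggestion. */
static bool task_belongs_to_tgid(const struct mock_task *task,
				 const struct mock_task *leader)
{
	return mock_same_thread_group(task, leader);
}

/* The existing dup-files check: true when the task should be skipped. */
static bool skip_dup_files(const struct mock_task *task, bool skip_if_dup_files)
{
	assert(task->group_leader != NULL);
	return skip_if_dup_files && task != task->group_leader &&
	       task->files == task->group_leader->files;
}
```

With this split, the skip condition in the loop would read as
"skip_dup_files(...) || (TGID mode && !task_belongs_to_tgid(...))".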
> But looking at this some more made me realize that even if we specify
> pid (tgid in kernel terms) we are still going to iterate through all
> the tasks, essentially. Is that right? So TGID mode isn't great for
> speeding things up; we should point out to users that if they want to
> iterate files of the process, they probably want to use TID mode and
> set tid to pid to use the early termination condition in TID.
> 
> It wasn't obvious to me until I re-read this patch like 3 times and
> wrote three different replies here :)
> 
> But then I also went looking at what procfs is doing for
> /proc/<pid>/task/* dirs. It does seem like there are faster ways to
> iterate all threads of a process. See next_tid() which uses
> next_thread(), etc. Can you please check those and see if we can have
> faster in-process iteration?
> 

I hadn't noticed this message until now.
It looks very promising.  I will add it to the next version.
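
To illustrate why walking the thread group directly is cheaper than
scanning every pid, here is a minimal userspace sketch. It is only a
model under stated assumptions: struct mock_thread, mock_next_thread()
and for_each_thread_in_group() are hypothetical names, and the threads
of a process are assumed to be linked in a circular ring. It is not the
kernel's next_thread() or the procfs next_tid() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: threads of one process linked in a circular ring. */
struct mock_thread {
	int tid;
	struct mock_thread *next_in_group;
};

/* Models stepping to the next thread of the same group. */
static struct mock_thread *mock_next_thread(struct mock_thread *t)
{
	return t->next_in_group;
}

/*
 * Visit every thread of the leader's group exactly once and return how
 * many were visited. The cost depends on the number of threads in the
 * group, not on the number of tasks in the system.
 */
static size_t for_each_thread_in_group(struct mock_thread *leader,
				       void (*visit)(struct mock_thread *, void *),
				       void *ctx)
{
	size_t n = 0;
	struct mock_thread *t = leader;

	assert(leader != NULL);
	do {
		if (visit)
			visit(t, ctx);
		n++;
		t = mock_next_thread(t);
	} while (t != leader);

	return n;
}
```

In the real iterator the walk would also have to deal with RCU and
exiting threads, which this model leaves out.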

> 
> > > 
> > > >                         put_task_struct(task);
> > > >                         task = NULL;
> > > >                         ++*tid;
> > > > @@ -56,7 +73,7 @@ static void *task_seq_start(struct seq_file *seq, loff_t *pos)
> > > >         struct bpf_iter_seq_task_info *info = seq->private;
> > > >         struct task_struct *task;
> > > > 
> > > > -       task = task_seq_get_next(info->common.ns, &info->tid, false);
> > > > +       task = task_seq_get_next(&info->common, &info->tid, false);
> > > >         if (!task)
> > > >                 return NULL;
> > > > 
> 
> [...]