Re: [PATCH bpf-next v1 11/19] bpf: add task and task/file targets

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 1 May 2020 12:01:08 -0700

On Fri, May 1, 2020 at 10:23 AM Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 4/29/20 7:08 PM, Andrii Nakryiko wrote:
> > On Mon, Apr 27, 2020 at 1:17 PM Yonghong Song <yhs@xxxxxx> wrote:
> >>
> >> Only the tasks belonging to "current" pid namespace
> >> are enumerated.
> >>
> >> For task/file target, the bpf program will have access to
> >>    struct task_struct *task
> >>    u32 fd
> >>    struct file *file
> >> where fd/file is an open file for the task.
> >>
> >> Signed-off-by: Yonghong Song <yhs@xxxxxx>
> >> ---
> >>   kernel/bpf/Makefile    |   2 +-
> >>   kernel/bpf/task_iter.c | 319 +++++++++++++++++++++++++++++++++++++++++
> >>   2 files changed, 320 insertions(+), 1 deletion(-)
> >>   create mode 100644 kernel/bpf/task_iter.c
> >>
> >
> > [...]
> >
> >> +static void *task_seq_start(struct seq_file *seq, loff_t *pos)
> >> +{
> >> +       struct bpf_iter_seq_task_info *info = seq->private;
> >> +       struct task_struct *task;
> >> +       u32 id = info->id;
> >> +
> >> +       if (*pos == 0)
> >> +               info->ns = task_active_pid_ns(current);
> >
> > I wonder why pid namespace is set in start() callback each time, while
> > net_ns was set once when seq_file is created. I think it should be
> > consistent, no? Either pid_ns is another feature and is set
> > consistently just once using the context of the process that creates
> > seq_file, or net_ns could be set using the same method without
> > bpf_iter infra knowing about this feature? Or there are some
> > non-obvious aspects which make pid_ns easier to work with?
> >
> > Either way, process read()'ing seq_file might be different than
> > process open()'ing seq_file, so they might have different namespaces.
> > We need to decide explicitly which context should be used and do it
> > consistently.
>
> Good point. for networking case, the `net` namespace is locked
> at seq_file open stage and later on it is used for seq_read().
>
> I think I should do the same thing, locking down pid namespace
> at open.

Yeah, I think it's a good idea.

>
> >
> >> +
> >> +       task = task_seq_get_next(info->ns, &id);
> >> +       if (!task)
> >> +               return NULL;
> >> +
> >> +       ++*pos;
> >> +       info->task = task;
> >> +       info->id = id;
> >> +
> >> +       return task;
> >> +}
> >> +
> >> +static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> >> +{
> >> +       struct bpf_iter_seq_task_info *info = seq->private;
> >> +       struct task_struct *task;
> >> +
> >> +       ++*pos;
> >> +       ++info->id;
> >
> > this would make iterator skip pid 0? Is that by design?
>
> The start will try to find pid 0. That means start will never
> return SEQ_START_TOKEN since the bpf program won't be called any way.

Never mind, I confused task_seq_next() and task_seq_get_next() :)

>
> >
> >> +       task = task_seq_get_next(info->ns, &info->id);
> >> +       if (!task)
> >> +               return NULL;
> >> +
> >> +       put_task_struct(info->task);
> >
> > on very first iteration info->task might be NULL, right?
>
> Even the first iteration info->task is not NULL. The start()
> will forcefully try to find the first real task from idr number 0.
>

Right, goes to same confusion as above, sorry.

> >
> >> +       info->task = task;
> >> +       return task;
> >> +}
> >> +
> >> +struct bpf_iter__task {
> >> +       __bpf_md_ptr(struct bpf_iter_meta *, meta);
> >> +       __bpf_md_ptr(struct task_struct *, task);
> >> +};
> >> +
> >> +int __init __bpf_iter__task(struct bpf_iter_meta *meta, struct task_struct *task)
> >> +{
> >> +       return 0;
> >> +}
> >> +
> >> +static int task_seq_show(struct seq_file *seq, void *v)
> >> +{
> >> +       struct bpf_iter_meta meta;
> >> +       struct bpf_iter__task ctx;
> >> +       struct bpf_prog *prog;
> >> +       int ret = 0;
> >> +
> >> +       prog = bpf_iter_get_prog(seq, sizeof(struct bpf_iter_seq_task_info),
> >> +                                &meta.session_id, &meta.seq_num,
> >> +                                v == (void *)0);
> >> +       if (prog) {
> >
> > can it happen that prog is NULL?
>
> Yes, this function is shared between show() and stop().
> The stop() function might be called multiple times since
> user can repeatedly try read() although there is nothing
> there, in which case, the seq_ops will be just
> start() and stop().

Ah, right, NULL case after end of iteration, got it.

>
> >
> >
> >> +               meta.seq = seq;
> >> +               ctx.meta = &meta;
> >> +               ctx.task = v;
> >> +               ret = bpf_iter_run_prog(prog, &ctx);
> >> +       }
> >> +
> >> +       return ret == 0 ? 0 : -EINVAL;
> >> +}
> >> +
> >> +static void task_seq_stop(struct seq_file *seq, void *v)
> >> +{
> >> +       struct bpf_iter_seq_task_info *info = seq->private;
> >> +
> >> +       if (!v)
> >> +               task_seq_show(seq, v);
> >
> > hmm... show() called from stop()? what's the case where this is necessary?
>
> I will refactor it better. This is to invoke bpf program
> in stop() with NULL object to signal the end of
> iteration.
>
> >> +
> >> +       if (info->task) {
> >> +               put_task_struct(info->task);
> >> +               info->task = NULL;
> >> +       }
> >> +}
> >> +
> >
> > [...]
> >