On Fri, May 1, 2020 at 10:23 AM Yonghong Song <yhs@xxxxxx> wrote: > > > > On 4/29/20 7:08 PM, Andrii Nakryiko wrote: > > On Mon, Apr 27, 2020 at 1:17 PM Yonghong Song <yhs@xxxxxx> wrote: > >> > >> Only the tasks belonging to "current" pid namespace > >> are enumerated. > >> > >> For task/file target, the bpf program will have access to > >> struct task_struct *task > >> u32 fd > >> struct file *file > >> where fd/file is an open file for the task. > >> > >> Signed-off-by: Yonghong Song <yhs@xxxxxx> > >> --- > >> kernel/bpf/Makefile | 2 +- > >> kernel/bpf/task_iter.c | 319 +++++++++++++++++++++++++++++++++++++++++ > >> 2 files changed, 320 insertions(+), 1 deletion(-) > >> create mode 100644 kernel/bpf/task_iter.c > >> > > > > [...] > > > >> +static void *task_seq_start(struct seq_file *seq, loff_t *pos) > >> +{ > >> + struct bpf_iter_seq_task_info *info = seq->private; > >> + struct task_struct *task; > >> + u32 id = info->id; > >> + > >> + if (*pos == 0) > >> + info->ns = task_active_pid_ns(current); > > > > I wonder why pid namespace is set in start() callback each time, while > > net_ns was set once when seq_file is created. I think it should be > > consistent, no? Either pid_ns is another feature and is set > > consistently just once using the context of the process that creates > > seq_file, or net_ns could be set using the same method without > > bpf_iter infra knowing about this feature? Or there are some > > non-obvious aspects which make pid_ns easier to work with? > > > > Either way, process read()'ing seq_file might be different than > > process open()'ing seq_file, so they might have different namespaces. > > We need to decide explicitly which context should be used and do it > > consistently. > > Good point. for networking case, the `net` namespace is locked > at seq_file open stage and later on it is used for seq_read(). > > I think I should do the same thing, locking down pid namespace > at open. Yeah, I think it's a good idea. > > > > >> + > >> + task = task_seq_get_next(info->ns, &id); > >> + if (!task) > >> + return NULL; > >> + > >> + ++*pos; > >> + info->task = task; > >> + info->id = id; > >> + > >> + return task; > >> +} > >> + > >> +static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos) > >> +{ > >> + struct bpf_iter_seq_task_info *info = seq->private; > >> + struct task_struct *task; > >> + > >> + ++*pos; > >> + ++info->id; > > > > this would make iterator skip pid 0? Is that by design? > > The start will try to find pid 0. That means start will never > return SEQ_START_TOKEN since the bpf program won't be called any way. Never mind, I confused task_seq_next() and task_seq_get_next() :) > > > > >> + task = task_seq_get_next(info->ns, &info->id); > >> + if (!task) > >> + return NULL; > >> + > >> + put_task_struct(info->task); > > > > on very first iteration info->task might be NULL, right? > > Even the first iteration info->task is not NULL. The start() > will forcefully try to find the first real task from idr number 0. > Right, goes to same confusion as above, sorry. > > > >> + info->task = task; > >> + return task; > >> +} > >> + > >> +struct bpf_iter__task { > >> + __bpf_md_ptr(struct bpf_iter_meta *, meta); > >> + __bpf_md_ptr(struct task_struct *, task); > >> +}; > >> + > >> +int __init __bpf_iter__task(struct bpf_iter_meta *meta, struct task_struct *task) > >> +{ > >> + return 0; > >> +} > >> + > >> +static int task_seq_show(struct seq_file *seq, void *v) > >> +{ > >> + struct bpf_iter_meta meta; > >> + struct bpf_iter__task ctx; > >> + struct bpf_prog *prog; > >> + int ret = 0; > >> + > >> + prog = bpf_iter_get_prog(seq, sizeof(struct bpf_iter_seq_task_info), > >> + &meta.session_id, &meta.seq_num, > >> + v == (void *)0); > >> + if (prog) { > > > > can it happen that prog is NULL? > > Yes, this function is shared between show() and stop(). > The stop() function might be called multiple times since > user can repeatedly try read() although there is nothing > there, in which case, the seq_ops will be just > start() and stop(). Ah, right, NULL case after end of iteration, got it. > > > > > > >> + meta.seq = seq; > >> + ctx.meta = &meta; > >> + ctx.task = v; > >> + ret = bpf_iter_run_prog(prog, &ctx); > >> + } > >> + > >> + return ret == 0 ? 0 : -EINVAL; > >> +} > >> + > >> +static void task_seq_stop(struct seq_file *seq, void *v) > >> +{ > >> + struct bpf_iter_seq_task_info *info = seq->private; > >> + > >> + if (!v) > >> + task_seq_show(seq, v); > > > > hmm... show() called from stop()? what's the case where this is necessary? > > I will refactor it better. This is to invoke bpf program > in stop() with NULL object to signal the end of > iteration. > > >> + > >> + if (info->task) { > >> + put_task_struct(info->task); > >> + info->task = NULL; > >> + } > >> +} > >> + > > > > [...] > >