On Wed, Aug 26, 2020 at 5:07 PM Yonghong Song <yhs@xxxxxx> wrote: > > Currently, task and task_file by default iterates through > all tasks. For task_file, by default, all files from all tasks > will be traversed. > > But for a user process, the file_table is shared by all threads > of that process. So traversing the main thread per process should > be enough to traverse all files and this can save a lot of cpu > time if some process has large number of threads and each thread > has lots of open files. > > This patch implemented a customization for task/task_file iterator, > permitting to traverse only the kernel task where its pid equal > to tgid in the kernel. This includes some kernel threads, and > main threads of user processes. This will solve the above potential > performance issue for task_file. This customization may be useful > for task iterator too if only traversing main threads is enough. > > Signed-off-by: Yonghong Song <yhs@xxxxxx> > --- > include/linux/bpf.h | 3 ++- > include/uapi/linux/bpf.h | 5 ++++ > kernel/bpf/task_iter.c | 46 +++++++++++++++++++++++----------- > tools/include/uapi/linux/bpf.h | 5 ++++ > 4 files changed, 43 insertions(+), 16 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index a6131d95e31e..058eb9b0ba78 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -1220,7 +1220,8 @@ int bpf_obj_get_user(const char __user *pathname, int flags); > int __init bpf_iter_ ## target(args) { return 0; } > > struct bpf_iter_aux_info { > - struct bpf_map *map; > + struct bpf_map *map; /* for iterator traversing map elements */ > + bool main_thread_only; /* for task/task_file iterator */ As a user of task_file iterator I'd hate to make this decision, honestly, especially if I can't prove that all processes share the same file table (I think clone() syscall allows to do weird combinations like that, right?). It does make sense for a task/ iterator, though, if I need to iterate a user-space process (group of tasks). So can we do this: 1a. Either add a new bpf_iter type process/ (or in kernel lingo task_group/) to iterate only main threads (group_leader in kernel lingo); 1b. Or make this main_thread_only an option for only a task/ iterator (and maybe call it "group_leader_only" or something to reflect kernel terminology?) 2. For task/file iterator, still iterate all tasks, but if the task's files struct is the same as group_leader's files struct, then go to the next one. This should eliminate tons of duplication of iterating the same files over and over. It would still iterate a bunch of tasks, but compared to the number of files that's generally a much smaller number, so should still give sizable savings. I don't think we need an extra option for this, tbh, this behavior was my intuitive expectation, so I don't think you'll be breaking any sane user of this iterator. Disclaimer: I haven't got a chance to look through kernel code much, so I'm sorry if what I'm proposing is something that is impossible to implement or extremely hard/unreliable. But thought I'll put this idea out there before we decide on this. WDYT? [...]