On Thu, 2022-07-28 at 19:00 +0200, Kumar Kartikeya Dwivedi wrote:
> On Thu, 28 Jul 2022 at 18:40, Kui-Feng Lee <kuifeng@xxxxxx> wrote:
> >
> > On Thu, 2022-07-28 at 18:22 +0200, Kumar Kartikeya Dwivedi wrote:
> > > On Thu, 28 Jul 2022 at 17:16, Kui-Feng Lee <kuifeng@xxxxxx> wrote:
> > > >
> > > > On Thu, 2022-07-28 at 10:47 +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > On Thu, 28 Jul 2022 at 07:25, Kui-Feng Lee <kuifeng@xxxxxx> wrote:
> > > > > >
> > > > > > On Wed, 2022-07-27 at 10:19 +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > > > On Wed, 27 Jul 2022 at 09:01, Kui-Feng Lee <kuifeng@xxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Tue, 2022-07-26 at 14:13 +0200, Jiri Olsa wrote:
> > > > > > > > > On Mon, Jul 25, 2022 at 10:17:11PM -0700, Kui-Feng Lee wrote:
> > > > > > > > > > Allow creating an iterator that loops through resources
> > > > > > > > > > of one task/thread.
> > > > > > > > > >
> > > > > > > > > > People could only create iterators to loop through all
> > > > > > > > > > resources of files, vma, and tasks in the system, even
> > > > > > > > > > though they were interested in only the resources of a
> > > > > > > > > > specific task or process. Passing the additional
> > > > > > > > > > parameters, people can now create an iterator to go
> > > > > > > > > > through all resources or only the resources of a task.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Kui-Feng Lee <kuifeng@xxxxxx>
> > > > > > > > > > ---
> > > > > > > > > >  include/linux/bpf.h            |  4 ++
> > > > > > > > > >  include/uapi/linux/bpf.h       | 23 ++++++++++
> > > > > > > > > >  kernel/bpf/task_iter.c         | 81 +++++++++++++++++++++++++---------
> > > > > > > > > >  tools/include/uapi/linux/bpf.h | 23 ++++++++++
> > > > > > > > > >  4 files changed, 109 insertions(+), 22 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > > > > index 11950029284f..c8d164404e20 100644
> > > > > > > > > > --- a/include/linux/bpf.h
> > > > > > > > > > +++ b/include/linux/bpf.h
> > > > > > > > > > @@ -1718,6 +1718,10 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
> > > > > > > > > >
> > > > > > > > > >  struct bpf_iter_aux_info {
> > > > > > > > > >  	struct bpf_map *map;
> > > > > > > > > > +	struct {
> > > > > > > > > > +		__u32	tid;
> > > > > > > > >
> > > > > > > > > should be just u32 ?
> > > > > > > >
> > > > > > > > Or, should change the following 'type' to __u8?
> > > > > > >
> > > > > > > Would it be better to use a pidfd instead of a tid here? Unset
> > > > > > > pidfd would mean going over all tasks, and any fd > 0 implies
> > > > > > > attaching to a specific task (as is the convention in BPF
> > > > > > > land). Most of the new UAPIs working on processes are using
> > > > > > > pidfds (to work with a stable handle instead of a reusable ID).
> > > > > > >
> > > > > > > The iterator taking an fd also gives an opportunity to BPF LSMs
> > > > > > > to attach permissions/policies to it (once we have a file local
> > > > > > > storage map) e.g. whether creating a task iterator for that
> > > > > > > specific pidfd instance (backed by the struct file) would be
> > > > > > > allowed or not.
> > > > > > >
> > > > > > > You are using getpid in the selftest and keeping track of
> > > > > > > last_tgid in the iterator, so I guess you don't even need to
> > > > > > > extend pidfd_open to work on thread IDs right now for your use
> > > > > > > case (and fdtable and mm are shared for POSIX threads anyway,
> > > > > > > so for those two it won't make a difference).
> > > > > > >
> > > > > > > What is your opinion?
> > > > > >
> > > > > > Do you mean removed both tid and type, and replace them with a
> > > > > > pidfd? We can do that in uapi, struct bpf_link_info. But, the
> > > > > > interal types, ex. bpf_iter_aux_info, still need to use tid or
> > > > > > struct file to avoid getting file from the per-process fdtable.
> > > > > > Is that what you mean?
> > > > > >
> > > > >
> > > > > Yes, just for the UAPI, it is similar to taking map_fd for map
> > > > > iter. In bpf_link_info we should report just the tid, just like
> > > > > map iter reports map_id.
> > > >
> > > > It sounds good to me.
> > > >
> > > > One thing I need a clarification. You mentioned that a fd > 0
> > > > implies attaching to a specific task, however fd can be 0. So, it
> > > > should be fd >= 0. So, it forces the user to initialize the value
> > > > of pidfd to -1.
> > > >
> > > > So, for convenience, we still need a field like 'type' to make it
> > > > easy to create iterators without a filter.
> > > >
> > >
> > > Right, but in lots of BPF UAPI fields, fd 0 means fd is unset, so it
> > > is fine to rely on that assumption. For e.g. even for map_fd,
> > > bpf_map_elem iterator considers fd 0 to be unset. Then you don't need
> > > the type field.
> >
> > I just realize that pidfd may be meaningless for the bpf_link_info
> > returned by bpf_obj_get_info_by_fd() since the origin fd might be
> > closed already. So, I will always set it a value of 0.
> >
>
> For bpf_link_info, we should only be returning the tid of the task it
> is attached to, you cannot report the pidfd in bpf_link_info
> correctly (as you already realised). By default this would be 0,
> which is also an invalid tid, but when pidfd is set it will be the
> tid of the task it is attached to, so it works well.

We have had a lot of discussions about using tid or pidfd, and Kumar
also suggested removing 'type'. However, I feel that we need to keep
'type' in struct bpf_link_info. I can imagine that we may want to
create iterators over the tasks in a cgroup, or with other parameters,
in the future. 'type' will help us tell which kind of parameter was
given.