On Thu, Jul 28, 2022 at 03:54:01PM -0700, Yonghong Song wrote: > > > On 7/28/22 10:52 AM, Kumar Kartikeya Dwivedi wrote: > > On Thu, 28 Jul 2022 at 19:08, Yonghong Song <yhs@xxxxxx> wrote: > > > > [...] > > > > > > There is one problem here. The current pidfd_open syscall > > > only supports thread-group leader, i.e., main thread, i.e., > > > it won't support any non-main-thread tid's. Yes, thread-group > > > leader and other threads should share the same vma and files > > > in most of times, but it still possible different threads > > > in the same process may have different files which is why > > > in current task_iter.c we have: > > > *tid = pid_nr_ns(pid, ns); > > > task = get_pid_task(pid, PIDTYPE_PID); > > > if (!task) { > > > ++*tid; > > > goto retry; > > > } else if (skip_if_dup_files && > > > !thread_group_leader(task) && > > > task->files == task->group_leader->files) { > > > put_task_struct(task); > > > task = NULL; > > > ++*tid; > > > goto retry; > > > } > > > > > > > > > Each thread (tid) will have some fields different from > > > thread-group leader (tgid), e.g., comm and most (if not all) > > > scheduling related fields. > > > > > > So it would be good to support for each tid from the start > > > as it is not clear when pidfd_open will support non > > > thread-group leader. > > > > I think this is just a missing feature, not a design limitation. If we > > just disable thread pifdfd from waitid and pidfd_send_signal, I think > > it is ok to use it elsewhere. > > Yes, I agree this is a missing feature. It is desirable > that the missing feature gets implemented in kernel or > at least promising work is recognized before we use pidfd > for this task structure. When I did pidfd_{open,getfd,send_signal}, CLONE_PIDFD, and the waitid support we simply didn't enable support for pidfd to refer to individual threads because there was no immediate use-case for it and we hade some concerns that I can't remember off the top of my head. Plus, we were quite happy that we got the pidfd stuff in so we limited the scope of what we wanted to do in the first iteration. But if there is a good use-case for this then by all means we should enable pidfds to refer to individual threads and I'm happy to route that upstream. But this needs to be solid work as that area of the kernel can be interesting technically and otherwise... There have been requests for this before already. I've added a wrapper pidfd_get_task() a while ago that encodes the PIDTYPE_TGID restriction. If you grep for it you should find all places that rely on pidfds (process_mrelease() and whatnot). All these places need to be able to deal with individual threads if you change that. But note, the mw is coming up.