On Thu, Dec 07, 2023 at 06:21:18PM +0100, Christian Brauner wrote: > [Cc fsdevel & Jan because we had some discussions about fanotify > returning non-thread-group pidfds. That's just for awareness or in case > this might need special handling.] > > On Thu, Nov 30, 2023 at 09:39:44AM -0700, Tycho Andersen wrote: > > From: Tycho Andersen <tandersen@xxxxxxxxxxx> > > > > We are using the pidfd family of syscalls with the seccomp userspace > > notifier. When some thread triggers a seccomp notification, we want to do > > some things to its context (munge fd tables via pidfd_getfd(), maybe write > > to its memory, etc.). However, threads created with ~CLONE_FILES or > > ~CLONE_VM mean that we can't use the pidfd family of syscalls for this > > purpose, since their fd table or mm are distinct from the thread group > > leader's. In this patch, we relax this restriction for pidfd_open(). > > > > In order to avoid dangling poll() users we need to notify pidfd waiters > > when individual threads die, but once we do that all the other machinery > > seems to work ok viz. the tests. But I suppose there are more cases than > > just this one. > > > > Another weirdness is the open-coding of this vs. exporting using > > do_notify_pidfd(). This particular location is after __exit_signal() is > > called, which does __unhash_process() which kills ->thread_pid, so we need > > to use the copy we have locally, vs do_notify_pid() which accesses it via > > task_pid(). Maybe this suggests that the notification should live somewhere > > in __exit_signals()? I just put it here because I saw we were already > > testing if this task was the leader. > > > > Signed-off-by: Tycho Andersen <tandersen@xxxxxxxxxxx> > > --- > > So we've always said that if there's a use-case for this then we're > willing to support it. And I think that stance hasn't changed. I know > that others have expressed interest in this as well. > > So currently the series only enables pidfds for threads to be created > and allows notifications for threads. But all places that currently make > use of pidfds refuse non-thread-group leaders. We can certainly proceed > with a patch series that only enables creation and exit notification but > we should also consider unlocking additional functionality: > > * audit of all callers that use pidfd_get_task() > > (1) process_madvise() > (2) process_mrlease() > > I expect that both can handle threads just fine but we'd need an Ack > from mm people. > > * pidfd_prepare() is used to create pidfds for: > > (1) CLONE_PIDFD via clone() and clone3() > (2) SCM_PIDFD and SO_PEERPIDFD > (3) fanotify > > (1) is what this series here is about. > > For (2) we need to check whether fanotify would be ok to handle pidfds > for threads. It might be fine but Jan will probably know more. > > For (3) the change doesn't matter because SCM_CREDS always use the > thread-group leader. So even if we allowed the creation of pidfds for > threads it wouldn't matter. > * audit all callers of pidfd_pid() whether they could simply be switched > to handle individual threads: > > (1) setns() handles threads just fine so this is safe to allow. > (2) pidfd_getfd() I would like to keep restricted and essentially > freeze new features for it. > > I'm not happy that we did didn't just implement it as an ioctl to > the seccomp notifier. And I wouldn't oppose a patch that would add > that functionality to the seccomp notifier itself. But that's a > separate topic. > (3) pidfd_send_signal(). I think that one is the most interesting on > to allow signaling individual threads. I'm not sure that you need > to do this right now in this patch but we need to think about what > we want to do there. This all sounds reasonable to me, I can take a look as time permits. pidfd_send_signal() at the very least would have been useful while writing these tests. Tycho