Re: [RFC 1/3] pidfd: allow pidfd_open() on non-thread-group leaders

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 07, 2023 at 06:21:18PM +0100, Christian Brauner wrote:
> [Cc fsdevel & Jan because we had some discussions about fanotify
> returning non-thread-group pidfds. That's just for awareness or in case
> this might need special handling.]
> 
> On Thu, Nov 30, 2023 at 09:39:44AM -0700, Tycho Andersen wrote:
> > From: Tycho Andersen <tandersen@xxxxxxxxxxx>
> > 
> > We are using the pidfd family of syscalls with the seccomp userspace
> > notifier. When some thread triggers a seccomp notification, we want to do
> > some things to its context (munge fd tables via pidfd_getfd(), maybe write
> > to its memory, etc.). However, threads created with ~CLONE_FILES or
> > ~CLONE_VM mean that we can't use the pidfd family of syscalls for this
> > purpose, since their fd table or mm are distinct from the thread group
> > leader's. In this patch, we relax this restriction for pidfd_open().
> > 
> > In order to avoid dangling poll() users we need to notify pidfd waiters
> > when individual threads die, but once we do that all the other machinery
> > seems to work ok viz. the tests. But I suppose there are more cases than
> > just this one.
> > 
> > Another weirdness is the open-coding of this vs. exporting using
> > do_notify_pidfd(). This particular location is after __exit_signal() is
> > called, which does __unhash_process() which kills ->thread_pid, so we need
> > to use the copy we have locally, vs do_notify_pid() which accesses it via
> > task_pid(). Maybe this suggests that the notification should live somewhere
> > in __exit_signals()? I just put it here because I saw we were already
> > testing if this task was the leader.
> > 
> > Signed-off-by: Tycho Andersen <tandersen@xxxxxxxxxxx>
> > ---
> 
> So we've always said that if there's a use-case for this then we're
> willing to support it. And I think that stance hasn't changed. I know
> that others have expressed interest in this as well.
> 
> So currently the series only enables pidfds for threads to be created
> and allows notifications for threads. But all places that currently make
> use of pidfds refuse non-thread-group leaders. We can certainly proceed
> with a patch series that only enables creation and exit notification but
> we should also consider unlocking additional functionality:
> 
> * audit of all callers that use pidfd_get_task()
> 
>   (1) process_madvise()
>   (2) process_mrlease()
> 
>   I expect that both can handle threads just fine but we'd need an Ack
>   from mm people.
> 
> * pidfd_prepare() is used to create pidfds for:
> 
>   (1) CLONE_PIDFD via clone() and clone3()
>   (2) SCM_PIDFD and SO_PEERPIDFD
>   (3) fanotify
>   
>   (1) is what this series here is about.
> 
>   For (2) we need to check whether fanotify would be ok to handle pidfds
>   for threads. It might be fine but Jan will probably know more.
> 
>   For (3) the change doesn't matter because SCM_CREDS always use the
>   thread-group leader. So even if we allowed the creation of pidfds for
>   threads it wouldn't matter.
> * audit all callers of pidfd_pid() whether they could simply be switched
>   to handle individual threads:
> 
>   (1) setns() handles threads just fine so this is safe to allow.
>   (2) pidfd_getfd() I would like to keep restricted and essentially
>       freeze new features for it.
> 
>       I'm not happy that we did didn't just implement it as an ioctl to
>       the seccomp notifier. And I wouldn't oppose a patch that would add
>       that functionality to the seccomp notifier itself. But that's a
>       separate topic.
>   (3) pidfd_send_signal(). I think that one is the most interesting on
>       to allow signaling individual threads. I'm not sure that you need
>       to do this right now in this patch but we need to think about what
>       we want to do there.

This all sounds reasonable to me, I can take a look as time permits.

pidfd_send_signal() at the very least would have been useful while
writing these tests.

Tycho




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux