Re: [PATCH v6 2/5] pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 28, 2024 at 04:34:33PM +0100, Christian Brauner wrote:
> On Sat, Oct 26, 2024 at 08:24:58AM +0100, Lorenzo Stoakes wrote:
> > It is useful to be able to utilise the pidfd mechanism to reference the
> > current thread or process (from a userland point of view - thread group
> > leader from the kernel's point of view).
> >
> > Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and
> > PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader.
> >
> > For convenience and to avoid confusion from userland's perspective we alias
> > these:
> >
> > * PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what
> >   the user will want to use, as they would find it surprising if for
> >   instance fd's were unshared()'d and they wanted to invoke pidfd_getfd()
> >   and that failed.
> >
> > * PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users
> >   have no concept of thread groups or what a thread group leader is, and
> >   from userland's perspective and nomenclature this is what userland
> >   considers to be a process.
> >
> > Due to the refactoring of the central __pidfd_get_pid() function we can
> > implement this functionality centrally, providing the use of this sentinel
> > in most functionality which utilises pidfd's.
> >
> > We need to explicitly adjust kernel_waitid_prepare() to permit this (though
> > it wouldn't really make sense to use this there, we provide the ability for
> > consistency).
> >
> > We explicitly disallow use of this in setns(), which would otherwise have
> > required explicit custom handling, as it doesn't make sense to set the
> > current calling thread to join the namespace of itself.
> >
> > As the callers of pidfd_get_pid() expect an increased reference count on
> > the pid we do so in the self case, reducing churn and avoiding any breakage
> > from existing logic which decrements this reference count.
> >
> > This change implicitly provides PIDFD_SELF_* support in the waitid(P_PIDFS,
> > ...), process_madvise(), process_mrelease(), pidfd_send_signal(), and
> > pidfd_getfd() system calls.
> >
> > Things such as polling a pidfs and general fd operations are not supported,
> > this strictly provides the sentinel for APIs which explicitly accept a
> > pidfd.
> >
> > Suggested-by: Pedro Falcato <pedro.falcato@xxxxxxxxx>
> > Reviewed-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> > ---
>
> Currently, a pidfd based system call like pidfd_send_signal() would
> simply do:
>
> fdget(pidfd);
> // use struct pid
> fdput(pidfd);
>
> Where the lifetime of @pid is guaranteed by @file. And in the regular
> case where there's only a single thread the file code will avoid taking
> a reference. Thus, there's no reference count bump on fdget(), nor a
> drop on fdput(), nor a get_pid() or put_pid().

Right I missed that fdget() wouldn't take a reference count I assumed it
would be equivalent, my mistake.

>
> With your patch series you will always cause reference counts on @pid to
> be taken for everyone. And I wouldn't be surprised if we get performance
> regressions for this.

This was in response to you review saying I can't pass around a pointer to
the fd, originally I didn't do this.

This was the only way I could find to de-jank and make my shared function
not end up problematic in the light of wanting to keep the fd within a
single scope, I didn't realise that passing that by value would be ok.

But obviously hadn't realised that fdget()/fdput() sometimes doesn't change
a reference count, mea culpa on that not an fs person...

>
> In one of my earlier mails I had mused about a fdput() like primitive.
> What I roughly, hastily, and under the influence of the flu, sketched in
> the _completey untested_ patch I appended illustrates roughly what I had
> been thinking about.

OK, I was really uncertain as to what you meant regarding the scope of this
value so had assumed we couldn't do something like assigning the value like
that.

I guess I'll try to adapt that and respin a v7 when I get a chance.

>
> >  include/linux/pid.h        |  8 ++++--
> >  include/uapi/linux/pidfd.h | 10 ++++++++
> >  kernel/exit.c              |  4 ++-
> >  kernel/nsproxy.c           |  1 +
> >  kernel/pid.c               | 51 ++++++++++++++++++++++++--------------
> >  5 files changed, 53 insertions(+), 21 deletions(-)
> >
> > diff --git a/include/linux/pid.h b/include/linux/pid.h
> > index d466890e1b35..3b2ac7567a88 100644
> > --- a/include/linux/pid.h
> > +++ b/include/linux/pid.h
> > @@ -78,11 +78,15 @@ struct file;
> >   * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
> >   *
> >   * @pidfd:      The pidfd whose pid we want, or the fd of a /proc/<pid> file if
> > - *              @alloc_proc is also set.
> > + *              @alloc_proc is also set, or PIDFD_SELF_* to refer to the current
> > + *              thread or thread group leader.
> >   * @allow_proc: If set, then an fd of a /proc/<pid> file can be passed instead
> >   *              of a pidfd, and this will be used to determine the pid.
> > +
> >   * @flags:      Output variable, if non-NULL, then the file->f_flags of the
> > - *              pidfd will be set here.
> > + *              pidfd will be set here or If PIDFD_SELF_THREAD is set, this is
> > + *              set to PIDFD_THREAD, otherwise if PIDFD_SELF_THREAD_GROUP then
> > + *              this is set to zero.
> >   *
> >   * Returns: If successful, the pid associated with the pidfd, otherwise an
> >   *          error.
> > diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
> > index 565fc0629fff..6fe1d63b2086 100644
> > --- a/include/uapi/linux/pidfd.h
> > +++ b/include/uapi/linux/pidfd.h
> > @@ -29,4 +29,14 @@
> >  #define PIDFD_GET_USER_NAMESPACE              _IO(PIDFS_IOCTL_MAGIC, 9)
> >  #define PIDFD_GET_UTS_NAMESPACE               _IO(PIDFS_IOCTL_MAGIC, 10)
> >
> > +/*
> > + * Special sentinel values which can be used to refer to the current thread or
> > + * thread group leader (which from a userland perspective is the process).
> > + */
> > +#define PIDFD_SELF		PIDFD_SELF_THREAD
> > +#define PIDFD_SELF_PROCESS	PIDFD_SELF_THREAD_GROUP
> > +
> > +#define PIDFD_SELF_THREAD	-10000 /* Current thread. */
> > +#define PIDFD_SELF_THREAD_GROUP	-20000 /* Current thread group leader. */
> > +
> >  #endif /* _UAPI_LINUX_PIDFD_H */
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index 619f0014c33b..e4f85ec4ba78 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -71,6 +71,7 @@
> >  #include <linux/user_events.h>
> >  #include <linux/uaccess.h>
> >
> > +#include <uapi/linux/pidfd.h>
> >  #include <uapi/linux/wait.h>
> >
> >  #include <asm/unistd.h>
> > @@ -1739,7 +1740,8 @@ int kernel_waitid_prepare(struct wait_opts *wo, int which, pid_t upid,
> >  		break;
> >  	case P_PIDFD:
> >  		type = PIDTYPE_PID;
> > -		if (upid < 0)
> > +		if (upid < 0 && upid != PIDFD_SELF_THREAD &&
> > +		    upid != PIDFD_SELF_THREAD_GROUP)
> >  			return -EINVAL;
> >
> >  		pid = pidfd_get_pid(upid, &f_flags);
> > diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> > index dc952c3b05af..d239f7eeaa1f 100644
> > --- a/kernel/nsproxy.c
> > +++ b/kernel/nsproxy.c
> > @@ -550,6 +550,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
> >  	struct nsset nsset = {};
> >  	int err = 0;
> >
> > +	/* If fd is PIDFD_SELF_*, implicitly fail here, as invalid. */
> >  	if (!fd_file(f))
> >  		return -EBADF;
> >
> > diff --git a/kernel/pid.c b/kernel/pid.c
> > index 94c97559e5c5..0a1861b4422c 100644
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -535,33 +535,48 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
> >  }
> >  EXPORT_SYMBOL_GPL(find_ge_pid);
> >
> > +static struct pid *pidfd_get_pid_self(unsigned int pidfd, unsigned int *flags)
>
> The @flags argument is unused afaict.

Oops will rework on v7.

>
> > +{
> > +	bool is_thread = pidfd == PIDFD_SELF_THREAD;
> > +	enum pid_type type = is_thread ? PIDTYPE_PID : PIDTYPE_TGID;
> > +	struct pid *pid = *task_pid_ptr(current, type);
> > +
> > +	/* The caller expects an elevated reference count. */
> > +	get_pid(pid);
> > +	return pid;
> > +}
>
> Fwiw, what you've done here is essentially reimplement the already
> existing get_task_pid() helper that you could simply use.

We're looking up PIDFD_SELF_* values here. So presumably you mean the:

	struct pid *pid = *task_pid_ptr(current, type);
	/* The caller expects an elevated reference count. */
	get_pid(pid);

Bit is duplicated vs. get_task_pid()?

I did that because it wasn't clear doing that under the RCU lock was
necessary or useful?

It seems useful still to have the PIDFD_SELF stuff qseparate, I can replace
those two lines with a call to get_task_pid() if you prefer? Unless you
meant something else?

>
> > +
> >  struct pid *__pidfd_get_pid(unsigned int pidfd, bool allow_proc,
> >  			    unsigned int *flags)
> >  {
> > -	struct pid *pid;
> > -	struct fd f = fdget(pidfd);
> > -	struct file *file = fd_file(f);
> > +	if (pidfd == PIDFD_SELF_THREAD || pidfd == PIDFD_SELF_THREAD_GROUP) {
> > +		return pidfd_get_pid_self(pidfd, flags);
> > +	} else {
>
> I think the else can just go and we can save an indentation level.

This has been raised a couple times before by other reviewers, this is just
so we can declare variables, especially the fd variable, which you were
very clear _must_ retain scope only where it used.

Otherwise I have to do something like;

	struct fd f = {};

	if (...) { return ...; }

	f = fdget(...);

This way we don't need to do that.

I mean probably the compiler would do the right thing but it just seems
ugly to assign/reassign a stack value like that.

Ah, I see struct fd is just a wrapper around an unsigned long, so probably
not a big deal to just leave it unassigned then.

This was the only reason I did this, I usually much prefer the guard
pattern.

OK if you're fine with this value being assigned like that then no problem
will change!

>
> > +		struct pid *pid;
> > +		struct fd f = fdget(pidfd);
> > +		struct file *file = fd_file(f);
> >
> > -	if (!file)
> > -		return ERR_PTR(-EBADF);
> > +		if (!file)
> > +			return ERR_PTR(-EBADF);
> >
> > -	pid = pidfd_pid(file);
> > -	/* If we allow opening a pidfd via /proc/<pid>, do so. */
> > -	if (IS_ERR(pid) && allow_proc)
> > -		pid = tgid_pidfd_to_pid(file);
> > +		pid = pidfd_pid(file);
> > +		/* If we allow opening a pidfd via /proc/<pid>, do so. */
> > +		if (IS_ERR(pid) && allow_proc)
> > +			pid = tgid_pidfd_to_pid(file);
> >
> > -	if (IS_ERR(pid)) {
> > +		if (IS_ERR(pid)) {
> > +			fdput(f);
> > +			return pid;
> > +		}
> > +
> > +		/* Pin pid before we release fd. */
> > +		get_pid(pid);
> > +		if (flags)
> > +			*flags = file->f_flags;
> >  		fdput(f);
> > +
> >  		return pid;
> >  	}
> > -
> > -	/* Pin pid before we release fd. */
> > -	get_pid(pid);
> > -	if (flags)
> > -		*flags = file->f_flags;
> > -	fdput(f);
> > -
> > -	return pid;
> >  }
> >
> >  /**
> > --
> > 2.47.0




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux