Re: [PATCH 4/4] pidfs: implement fh_to_dentry

Christian Brauner <brauner@xxxxxxxxxx> · Wed, 13 Nov 2024 14:26:26 +0100

On Wed, Nov 13, 2024 at 02:06:56PM +0100, Erin Shepherd wrote:
> On 13/11/2024 13:09, Christian Brauner wrote:
> 
> > Hm, a pidfd comes in two flavours:
> >
> > (1) thread-group leader pidfd: pidfd_open(<pid>, 0)
> > (2) thread pidfd:              pidfd_open(<pid>, PIDFD_THREAD)
> >
> > In your current scheme fid->pid = pid_nr(pid) means that you always
> > encode a pidfs file handle for a thread pidfd no matter if the provided
> > pidfd was a thread-group leader pidfd or a thread pidfd. This is very
> > likely wrong as it means users that use a thread-group pidfd get a
> > thread-specific pid back.
> >
> > I think we need to encode (1) and (2) in the pidfs file handle so users
> > always get back the correct type of pidfd.
> >
> > That very likely means name_to_handle_at() needs to encode this into the
> > pidfs file handle.
> 
> I guess a question here is whether a pidfd handle encodes a handle to a pid
> in a specific mode, or just to a pid in general? The thought had occurred
> to me while I was working on this initially, but I felt like perhaps treating
> it as a property of the file descriptor in general was better.
> 
> Currently open_by_handle_at always returns a thread-group pidfd (since
> PIDFD_THREAD isn't set), regardless of what type of pidfd you passed to
> name_to_handle_at. I had thought that PIDFD_THREAD/O_EXCL would have been

I don't think you're returning a thread-group pidfd from
open_by_handle_at() in your scheme. After all you're encoding the tid in
pid_nr() so you'll always find the struct pid for the thread afaict. If
I'm wrong could you please explain how you think this works? I might
just be missing something obvious.
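
To make the point about encoding (1) and (2) concrete, here is a rough
userspace sketch, not a proposal for the actual layout. The struct, the
flags member and all sketch_* names are made up for illustration; the
only thing taken from the uapi headers is that PIDFD_THREAD has the same
value as O_EXCL.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>

/* PIDFD_THREAD shares its value with O_EXCL in the uapi headers. */
#define SKETCH_PIDFD_THREAD O_EXCL

/* Hypothetical handle layout; the flags member is the assumed addition. */
struct sketch_pidfd_fid {
	uint64_t ino;   /* pidfs inode number, used to detect pid recycling */
	int32_t pid;    /* pid number as seen from the initial pid namespace */
	uint32_t flags; /* flavour of the pidfd the handle was created from */
};

/* Model of the encode side: remember which flavour the source pidfd had. */
static struct sketch_pidfd_fid sketch_encode(uint64_t ino, int32_t pid,
					     unsigned int pidfd_flags)
{
	struct sketch_pidfd_fid fid = {
		.ino = ino,
		.pid = pid,
		/* Keep only the thread bit, drop everything else. */
		.flags = pidfd_flags & (unsigned int)SKETCH_PIDFD_THREAD,
	};

	return fid;
}

/* Model of the decode side: flags to raise on the reopened pidfd. */
static unsigned int sketch_decode_flags(const struct sketch_pidfd_fid *fid)
{
	/* Only the thread bit may ever be stored in the handle. */
	assert((fid->flags & ~(uint32_t)SKETCH_PIDFD_THREAD) == 0);
	return fid->flags;
}
```

With something along these lines a handle taken from a thread-group
leader pidfd decodes with no flags, and one taken from a thread pidfd
decodes with the thread bit set, independent of which struct pid the
number resolves to.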

> passed through to f->f_flags on the restored pidfd, but upon checking I see that
> it gets filtered out in do_dentry_open.

It does, but note that __pidfd_prepare() raises it explicitly on the
file afterwards. So it works fine.
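
Roughly, and only as a userspace model of the flag handling (the
sketch_* helpers are invented for illustration and are not the kernel
functions themselves), the sequence looks like this:

```c
#include <fcntl.h>

/* PIDFD_THREAD shares its value with O_EXCL in the uapi headers. */
#define SKETCH_PIDFD_THREAD O_EXCL

/* Model of the generic open path: creation-only flags are stripped. */
static unsigned int sketch_generic_open(unsigned int flags)
{
	return flags & ~(unsigned int)(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
}

/* Model of the pidfd path: the thread bit is raised again afterwards. */
static unsigned int sketch_pidfd_prepare(unsigned int flags)
{
	unsigned int f_flags = sketch_generic_open(flags);

	/* Re-raise only the thread bit that the generic path dropped. */
	f_flags |= flags & (unsigned int)SKETCH_PIDFD_THREAD;
	return f_flags;
}
```

So the bit is lost in the generic open step but present again on the
file by the time the pidfd is handed out.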

> 
> I feel like leaving it up to the caller of open_by_handle_at might be better
> (because they are probably better informed about whether they want poll() to
> inform them of thread or process exit) but I could lean either way.

So in order to decode a pidfs file handle you want the caller to have to
specify O_EXCL in the flags argument of open_by_handle_at()? Is that
your idea?

> 
> >> +static struct dentry *pidfs_fh_to_dentry(struct super_block *sb,
> >> +					 struct fid *gen_fid,
> >> +					 int fh_len, int fh_type)
> >> +{
> >> +	int ret;
> >> +	struct path path;
> >> +	struct pidfd_fid *fid = (struct pidfd_fid *)gen_fid;
> >> +	struct pid *pid;
> >> +
> >> +	if (fh_type != FILEID_INO64_GEN || fh_len < PIDFD_FID_LEN)
> >> +		return NULL;
> >> +
> >> +	pid = find_get_pid_ns(fid->pid, &init_pid_ns);
> >> +	if (!pid || pid->ino != fid->ino || pid_vnr(pid) == 0) {
> >> +		put_pid(pid);
> >> +		return NULL;
> >> +	}
> > I think we can avoid the premature reference bump and do:
> >
> > scoped_guard(rcu) {
> > 	struct pid *pid;
> >
> > 	pid = find_pid_ns(fid->pid, &init_pid_ns);
> > 	if (!pid)
> > 		return NULL;
> >
> > 	/* Did the pid get recycled? */
> > 	if (pid->ino != fid->ino)
> > 		return NULL;
> >
> > 	/* Must be resolvable in the caller's pid namespace. */
> > 	if (pid_vnr(pid) == 0)
> > 		return NULL;
> >
> > 	/* Ok, this is the pid we want. */
> > 	get_pid(pid);
> > }
> 
> I can go with that if preferred. I was worried a bit about making the RCU
> critical section too large, but of course I'm sure there are much larger
> sections inside the kernel.

This is perfectly fine. Don't worry about it.
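
For reference, the order of the checks can be modelled in plain
userspace C. This is only a sketch: struct sketch_pid and
sketch_lookup() are stand-ins I made up, the table scan replaces
find_pid_ns(), and the refcount increment replaces get_pid(). In the
kernel the whole body would run under the RCU read lock.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct pid; every field here is illustrative. */
struct sketch_pid {
	int nr;       /* number in the initial pid namespace */
	uint64_t ino; /* pidfs inode number */
	int vnr;      /* number in the caller's namespace, 0 if not visible */
	int refcount; /* only bumped once all checks have passed */
};

static struct sketch_pid *sketch_lookup(struct sketch_pid *table, size_t n,
					int nr, uint64_t ino)
{
	struct sketch_pid *pid = NULL;

	/* Lookup without taking a reference. */
	for (size_t i = 0; i < n; i++) {
		if (table[i].nr == nr) {
			pid = &table[i];
			break;
		}
	}
	if (!pid)
		return NULL;

	/* Did the pid get recycled? */
	if (pid->ino != ino)
		return NULL;

	/* Must be resolvable in the caller's pid namespace. */
	if (pid->vnr == 0)
		return NULL;

	/* Ok, this is the pid we want: take the reference last. */
	pid->refcount++;
	return pid;
}
```

The failure paths never touch the reference count, which is the whole
point of deferring the bump.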

> 
> >> +
> >> +	ret = path_from_stashed(&pid->stashed, pidfs_mnt, pid, &path);
> >> +	if (ret < 0)
> >> +		return ERR_PTR(ret);
> >> +
> >> +	mntput(path.mnt);
> >> +	return path.dentry;
> >>  }
> 
> Similarly here I should probably refactor this into dentry_from_stashed in
> order to avoid a needless bump-then-drop of path.mnt's reference count.

No, what you have now is fine. I wouldn't add a specific helper for
this. In contrast to the pid, the pidfs mount never goes away.