On Wed, Nov 13, 2024 at 02:48:43PM +0100, Erin Shepherd wrote: > On 13/11/2024 14:26, Christian Brauner wrote: > > > On Wed, Nov 13, 2024 at 02:06:56PM +0100, Erin Shepherd wrote: > >> On 13/11/2024 13:09, Christian Brauner wrote: > >> > >>> Hm, a pidfd comes in two flavours: > >>> > >>> (1) thread-group leader pidfd: pidfd_open(<pid>, 0) > >>> (2) thread pidfd: pidfd_open(<pid>, PIDFD_THREAD) > >>> > >>> In your current scheme fid->pid = pid_nr(pid) means that you always > >>> encode a pidfs file handle for a thread pidfd no matter if the provided > >>> pidfd was a thread-group leader pidfd or a thread pidfd. This is very > >>> likely wrong as it means users that use a thread-group pidfd get a > >>> thread-specific pid back. > >>> > >>> I think we need to encode (1) and (2) in the pidfs file handle so users > >>> always get back the correct type of pidfd. > >>> > >>> That very likely means name_to_handle_at() needs to encode this into the > >>> pidfs file handle. > >> I guess a question here is whether a pidfd handle encodes a handle to a pid > >> in a specific mode, or just to a pid in general? The thought had occurred > >> to me while I was working on this initially, but I felt like perhaps treating > >> it as a property of the file descriptor in general was better. > >> > >> Currently open_by_handle_at always returns a thread-group pidfd (since > >> PIDFD_THREAD) isn't set, regardless of what type of pidfd you passed to > >> name_to_handle_at. I had thought that PIDFD_THREAD/O_EXCL would have been > > I don't think you're returning a thread-groupd pidfd from > > open_by_handle_at() in your scheme. After all you're encoding the tid in > > pid_nr() so you'll always find the struct pid for the thread afaict. If > > I'm wrong could you please explain how you think this works? I might > > just be missing something obvious. > > Moudlo namespaces, the pid in fid->pid is the same one passed to pidfd_open(). > In the root namespace, you could replace name_to_handle_at(...) with > pidfd_open(fid->pid, 0) and get the same result (if both are successful, at least). > > The resulting pidfd points to the same struct pid. The only thing that should differ > is whether PIDFD_THREAD is set in f->f_flags. I see what you mean but then there's another problem afaict. Two cases: (1) @pidfd_thread_group = pidfd_open(1234, 0) The pidfd_open() will succeed if the struct pid that 1234 resolves to is used as a thread-group leader. (2) @pidfd_thread = pidfd_open(5678, PIDFD_THREAD) The pidfd_open() will succeed even if the struct pid that 5678 resolves to isn't used as a thread-group leader. The resulting struct file will be marked as being a thread pidfd by raising O_EXCL. (1') If (1) is passed to name_to_handle_at() a pidfs file handle is encoded for 1234. If later open_by_hande_at() is called then by default a thread-group leader pidfd is created. This is fine (2') If (2) is passed to name_to_handle_at() a pidfs file handle is encoded for 5678. If later open_by_handle_at() is called then a thread-group leader pidfd will be created again. So in (2') the caller has managed to create a thread-group leader pidfd even though the struct pid isn't used as a thread-group leader pidfd. Consequently, that pidfd is useless when passed to any of the pidfd_*() system calls. So basically, you need to verify that if O_EXCL isn't specified with open_by_handle_at() that the struct pid that is resolved is used as a thread-group leader and if not, refuse to create a pidfd. Am I making sense? > > >> I feel like leaving it up to the caller of open_by_handle_at might be better > >> (because they are probably better informed about whether they want poll() to > >> inform them of thread or process exit) but I could lean either way. > > So in order to decode a pidfs file handle you want the caller to have to > > specify O_EXCL in the flags argument of open_by_handle_at()? Is that > > your idea? > > If they want a PIDFD_THREAD pidfd, yes. I see it as similar to O_RDONLY, where its a > flag that applies to the file descriptor but not to the underlying file. This is probably fine. > > While ideally we'd implement it from an API completeness perspective, practically I'm > not sure how often the option would ever be used. While there are hundreds of reasons > why you might want to track the state of another process, I struggle to think of cases > where Process A needs to track Process B's threads besides a debugger (and a debugger > is probably better off using ptrace), and it can happily track its own threads by just > holding onto the pidfd. We recently imlemented PIDFD_THREAD support because it is used inside Netflix. I forgot the details thought tbh. So it's actually used. We only implemented it once people requested it.