Re: [PATCH RFC 06/10] pidfs: allow to retrieve exit information

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Mar 02, 2025 at 04:53:46PM +0100, Oleg Nesterov wrote:
> On 02/28, Christian Brauner wrote:
> >
> > Some tools like systemd's jounral need to retrieve the exit and cgroup
> > information after a process has already been reaped.
>               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> But unless I am totally confused do_exit() calls pidfd_exit() even
> before exit_notify(), the exiting task is not even zombie yet. It
> will reaped only when it passes exit_notify() and its parent does
> wait().

The overall goal is that it's possible to retrieve exit status and
cgroupid even if the task has already been reaped.

It's intentionally placed before exit_notify(), i.e., before the task is
a zombie because exit_notify() wakes pidfd-pollers. Ideally, pidfd
pollers would be woken and then could use the PIDFD_GET_INFO ioctl to
retrieve the exit status.

It would however be fine to place it into exit_notify() if it's a better
fit there. If you have a preference let me know.

I don't see a reason why seeing the exit status before that would be an
issue. The only downside would be that some other task that just keeps
ioctl()ing in a loop would possible see the exit status before the
parent does. But I didn't think this would be a big issue.

> And what about the multi-threaded case? Suppose the main thread
> does sys_exit(0) and it has alive sub-threads.
> 
> In this case pidfd_info() will report kinfo.exit_code = 0.
> And this is probably fine if (file->f_flags & PIDFD_THREAD) != 0.

Yes.

> But what if this file was created without PIDFD_THREAD? If another
> thread does exit_group(1) after that, the process's exit code is
> 1 << 8, but it can't be retrieved.

Yes, I had raised that in an off-list discussion about this as well and
was unsure what the cleanest way of dealing with this would be.

My initial approach had been to not just have:

struct pidfs_exit_info {
        __u64 cgroupid;
        __s32 exit_code;
};


but to have:

struct pidfs_exit_info {
        __u64 cgroupid;
        __s32 exit_code;
	__u64 tg_cgroupid;
	__s32 tg_exit_code;
};

so that it would be possible to retrieve either depending on the type of
pidfd. Is that feasible?

> Finally, sys_execve(). Suppose we have a main thread L and a
> sub-thread T.
> 
> T execs and kill the leader L. L exits and populates
> pidfs_i(inode)->exit_info.
> 
> T calls exchange_tids() in de_thread() and becomes the new leader
> with the same (old) pid.
> 
> Now, T is very much alive, but pidfs_i(inode)->exit_info != NULL.

Yes, de_thread() is a good point. That nasty wrinkly I had really
ignored^wforgotten. We should not report an exit status in this case. I
think that's what you're agreeing with as well?

> Or I am totally confused?

No, you're right!

What's the best way of handling the de_thread() case? Would moving this
into exit_notify() be enough where we also handle
PIDFD_THREAD/~PIDFD_THREAD waking?

Thanks for the review!

> 
> 
> 
> > +	exit_info = READ_ONCE(pidfs_i(inode)->exit_info);
> > +	if (exit_info) {
> > +		/*
> > +		 * TODO: Oleg, I didn't see a reason for putting
> > +		 * retrieval of the exit status of a task behind some
> > +		 * form of permission check.
> 
> Neither me.
> 
> Oleg.
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux