Re: [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retrieve socket's peer cgroup id

Christian Brauner <brauner@xxxxxxxxxx> · Mon, 10 Mar 2025 12:27:00 +0100

On Mon, Mar 10, 2025 at 09:52:31AM +0100, Christian Brauner wrote:
> On Sun, Mar 09, 2025 at 02:28:11PM +0100, Alexander Mikhalitsyn wrote:
> > 1. Add socket cgroup id and socket's peer cgroup id in socket's fdinfo
> > 2. Add SO_PEERCGROUPID which allows retrieving the socket's peer cgroup id
> > 3. Add SO_PEERCGROUPID kselftest
> > 
> > Generally speaking, this API allows race-free resolution of a socket's peer cgroup id.
> > Currently, the SCM_CREDENTIALS/SCM_PIDFD -> pid -> /proc/<pid>/cgroup sequence is used
> > for that, which is racy.
> > 
> > As we don't add any new state to the socket itself, there are no potential locking issues
> > or performance problems. We use the already existing sk->sk_cgrp_data.
> > 
> > We already have analogous interfaces to retrieve this
> > information:
> > - inet_diag: INET_DIAG_CGROUP_ID
> > - eBPF: bpf_sk_cgroup_id
> > 
> > Having a getsockopt() interface makes sense for many applications, because using eBPF is
> > not always an option, while inet_diag has obvious complexity and performance drawbacks
> > if we only want to get this specific info for one specific socket.
> > 
> > The idea comes from the UAPI kernel group:
> > https://uapi-group.org/kernel-features/
> > 
> > Huge thanks to Christian Brauner, Lennart Poettering and Luca Boccassi for proposing
> > and exchanging ideas about this.
> 
> Seems fine to me,
> Reviewed-by: Christian Brauner <brauner@xxxxxxxxxx>

One wider conceptual comment.

Starting with v6.15 it is possible to retrieve exit information from
pidfds even after the task has been reaped. So if someone opens a pidfd
via pidfd_open() and that task gets reaped by the parent, it is still
possible to request PIDFD_INFO_EXIT through the PIDFD_GET_INFO ioctl
and retrieve the exit status and the cgroupid of the task that was
reaped. That works even after all task linkage has been removed from
struct pid.
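
To make that concrete, here is a rough userspace sketch of the sequence:
open a pidfd, let the parent reap the child, then ask for the exit
information. It is only an illustration under assumptions: the struct and
the SKETCH_* constants are local mirrors of what I understand the v6.15
UAPI (struct pidfd_info, PIDFD_INFO_EXIT, PIDFD_GET_INFO) to look like,
named differently so they cannot clash with installed headers, and
exit_info_after_reap() is a made-up helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/ioctl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* assumption: unified syscall number */
#endif

/* Assumption: local mirror of the 64-byte struct pidfd_info layout. */
struct sketch_pidfd_info {
	uint64_t mask;
	uint64_t cgroupid;
	uint32_t pid, tgid, ppid;
	uint32_t ruid, rgid, euid, egid, suid, sgid, fsuid, fsgid;
	int32_t exit_code;
};

/* Assumption: mirrors of PIDFD_INFO_CGROUPID, PIDFD_INFO_EXIT, PIDFD_GET_INFO. */
#define SKETCH_PIDFD_INFO_CGROUPID (1ULL << 2)
#define SKETCH_PIDFD_INFO_EXIT     (1ULL << 3)
#define SKETCH_PIDFD_GET_INFO      _IOWR(0xFF, 11, struct sketch_pidfd_info)

/*
 * Fork a child that exits with @code, open a pidfd for it, reap it, and
 * only then query the pidfd for exit information.
 *
 * Returns 1 if the kernel reported exit information in @out, 0 if it did
 * not (for example on kernels without PIDFD_INFO_EXIT), -1 if the setup
 * (fork, pidfd_open, waitpid) failed.
 */
static int exit_info_after_reap(int code, struct sketch_pidfd_info *out)
{
	int pidfd, status, ret;
	pid_t pid;

	assert(out != NULL);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);

	/* Works even if the child is already a zombie: the pid still exists. */
	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0) {
		waitpid(pid, &status, 0); /* don't leave a zombie behind */
		return -1;
	}

	/* Reap the child; after this the pid number is released. */
	if (waitpid(pid, &status, 0) != pid) {
		close(pidfd);
		return -1;
	}

	memset(out, 0, sizeof(*out));
	out->mask = SKETCH_PIDFD_INFO_EXIT;
	ret = ioctl(pidfd, SKETCH_PIDFD_GET_INFO, out);
	close(pidfd);
	if (ret < 0)
		return 0;

	/* The kernel echoes back in mask which fields it actually filled in. */
	return (out->mask & SKETCH_PIDFD_INFO_EXIT) ? 1 : 0;
}
```

When the helper returns 1, out->exit_code can be decoded with
WIFEXITED()/WEXITSTATUS() like a wait status, and out->cgroupid holds the
cgroup id recorded for the task.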

The system call API doesn't allow the creation of pidfds for reaped
processes. That wouldn't be possible anyway, as the pid number will
have already been released.

Both SO_PEERPIDFD and SO_PASSPIDFD also don't allow the creation of
pidfds for already reaped peers or senders.

But that doesn't have to be the case since we always have the struct pid
available. So it's entirely possible to hand out a pidfd to a reaped
process if it's guaranteed that exit information is available. If it's
not then this would be a bug.

The trick is that whoever stashes a struct pid also needs to allocate a
pidfd inode for it. That could simply be done by a helper,
get_pidfs_pid(), which takes a reference to the struct pid and ensures
that space for recording exit information is available.

With that done, SO_PEERCGROUPID isn't needed per se, as it will be
possible to get the cgroupid and exit status from the pidfd.
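
From userspace the pidfd-based path would look roughly like the sketch
below: take the peer's pidfd with SO_PEERPIDFD and ask the pidfd for the
cgroupid. Same caveats as any sketch: the struct and SKETCH_* constants
are local mirrors of what I understand the UAPI definitions to be, the
SO_PEERPIDFD fallback value is the asm-generic one, and
peer_cgroupid_via_pidfd() is a made-up name. As things stand today the
getsockopt() fails once the peer has been reaped, so the helper simply
treats every failure as "unknown".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/ioctl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_PEERPIDFD
#define SO_PEERPIDFD 77 /* assumption: asm-generic value */
#endif

/* Assumption: local mirror of the 64-byte struct pidfd_info layout. */
struct sketch_pidfd_info {
	uint64_t mask;
	uint64_t cgroupid;
	uint32_t pid, tgid, ppid;
	uint32_t ruid, rgid, euid, egid, suid, sgid, fsuid, fsgid;
	int32_t exit_code;
};

/* Assumption: mirrors of PIDFD_INFO_CGROUPID, PIDFD_INFO_EXIT, PIDFD_GET_INFO. */
#define SKETCH_PIDFD_INFO_CGROUPID (1ULL << 2)
#define SKETCH_PIDFD_INFO_EXIT     (1ULL << 3)
#define SKETCH_PIDFD_GET_INFO      _IOWR(0xFF, 11, struct sketch_pidfd_info)

/*
 * Resolve the cgroup id of the peer of the AF_UNIX socket @sockfd through
 * its pidfd. Returns 0 and stores the id in @cgroupid on success, -1 if
 * the pidfd or the cgroup id could not be obtained.
 */
static int peer_cgroupid_via_pidfd(int sockfd, uint64_t *cgroupid)
{
	struct sketch_pidfd_info info;
	socklen_t len = sizeof(int);
	int pidfd = -1;
	int ret;

	assert(cgroupid != NULL);

	/* The pidfd pins the peer's struct pid, so there is no pid reuse race. */
	if (getsockopt(sockfd, SOL_SOCKET, SO_PEERPIDFD, &pidfd, &len) < 0)
		return -1;

	memset(&info, 0, sizeof(info));
	/* Also request exit info so an already reaped peer can still answer. */
	info.mask = SKETCH_PIDFD_INFO_CGROUPID | SKETCH_PIDFD_INFO_EXIT;
	ret = ioctl(pidfd, SKETCH_PIDFD_GET_INFO, &info);
	close(pidfd);

	if (ret < 0 || !(info.mask & SKETCH_PIDFD_INFO_CGROUPID))
		return -1;

	*cgroupid = info.cgroupid;
	return 0;
}
```

A caller would pass a connected AF_UNIX socket (or one end of a
socketpair()) and fall back to whatever it does today when the helper
returns -1.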