Note: this patch is a POC and RFC. It "parasites" on pidfd code just for the sake of a demo. It has nothing to do with pidfd. The problem: If you use suid/sgid bits to switch to a less-privileged (home-less) user, then the group list can't be changed, effectively nullifying any supposed restrictions. As such, suid/sgid to non-root creds is currently practically useless. Previous solutions: https://www.spinics.net/lists/kernel/msg5383847.html This solution allows to restrict the groups from group list. It failed to get any attention for probably being too ad-hoc. https://lore.kernel.org/all/0895c1f268bc0b01cc6c8ed4607d7c3953f49728.1416041823.git.josh@xxxxxxxxxxxxxxxx/ This solution from Josh Tripplett was considered insecure. New proposal: This proposal was inspired by the credfd proposal of Andy Lutomirski: https://lkml2.uits.iu.edu/hypermail/linux/kernel/1403.3/01528.html When we send an fd with SCM_RIGHTS, is has entire creds of the sender, captured at a moment of opening the file. Now if we have a "capable" server process, it can do SO_PEERCRED to retrieve client's uid/gid. Then it does getgrouplist() and setgroups() with client's uid/gid to set the group list desired for that client. Then it sets euid/egid to match client's. Then it opens some file (pidfd file in this POC, but should be credfd) and sends it to client. Client then does a special ioctl() on that fd to actually set up the received group list. Such ioctl() must ensure that the change is safe: - If process has CAP_SETGID - ok - Otherwise we need to make sure the server process explicitly permitted the change (not in this POC), make sure that uid==euid==suid (i.e. the process won't change its creds after setting group list) and make sure that euid/egid match those of the server. After doing these checks, the group list is applied. Simply put, this proposal allows to move CAP_SETGID from the main process to the helper (server) process, keeping the main process cap-less. Its advantage over the previous proposals is that you end up with the _correct_ group list that _naturally_ belongs to that UID. Previous proposals either ended up with an empty group list or "restricted" group list, but never with the right one. I put the user-space usage example here: https://github.com/stsp/cred_test Would be good to hear if something like this can be considered. Signed-off-by: Stas Sergeev <stsp2@xxxxxxxxx> CC: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx> CC: Christian Brauner <brauner@xxxxxxxxxx> CC: Jan Kara <jack@xxxxxxx> CC: Jens Axboe <axboe@xxxxxxxxx> CC: Kees Cook <kees@xxxxxxxxxx> CC: Oleg Nesterov <oleg@xxxxxxxxxx> CC: linux-fsdevel@xxxxxxxxxxxxxxx CC: linux-kernel@xxxxxxxxxxxxxxx CC: Eric Biederman <ebiederm@xxxxxxxxxxxx> CC: Andy Lutomirski <luto@xxxxxxxxxx> CC: Josh Triplett <josh@xxxxxxxxxxxxxxxx> --- fs/pidfs.c | 31 +++++++++++++++++++++++++++++++ include/linux/cred.h | 4 ++++ include/uapi/linux/pidfd.h | 1 + 3 files changed, 36 insertions(+) diff --git a/fs/pidfs.c b/fs/pidfs.c index 80675b6bf884..06209d3b5e61 100644 --- a/fs/pidfs.c +++ b/fs/pidfs.c @@ -114,6 +114,28 @@ static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts) return poll_flags; } +static bool can_borrow_groups(const struct cred *cred) +{ + kuid_t uid = current_uid(); + kgid_t gid = current_gid(); + kuid_t euid = current_euid(); + kgid_t egid = current_egid(); + + if (may_setgroups()) + return 1; + /* TODO: make sure peer actually allowed to borrow his groups. */ + + /* Make sure the process can't switch uid/gid. */ + if (!uid_eq(euid, uid) || !uid_eq(current_suid(), uid)) + return 0; + if (!gid_eq(egid, gid) || !gid_eq(current_sgid(), gid)) + return 0; + /* Make sure the euid/egid of 2 processes are equal. */ + if (!uid_eq(cred->euid, euid) || !gid_eq(cred->egid, egid)) + return 0; + return 1; +} + static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { struct task_struct *task __free(put_task) = NULL; @@ -141,8 +163,10 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg) * We're trying to open a file descriptor to the namespace so perform a * filesystem cred ptrace check. Also, we mirror nsfs behavior. */ +/* if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)) return -EACCES; +*/ switch (cmd) { /* Namespaces that hang of nsproxy. */ @@ -209,6 +233,13 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg) rcu_read_unlock(); } break; + case PIDFD_BORROW_GROUPS: + if (task == current) + return 0; + if (!can_borrow_groups(file->f_cred)) + return -EPERM; + set_current_groups(file->f_cred->group_info); + return 0; default: return -ENOIOCTLCMD; } diff --git a/include/linux/cred.h b/include/linux/cred.h index 2976f534a7a3..cfdeebbd7db6 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -83,6 +83,10 @@ static inline int groups_search(const struct group_info *group_info, kgid_t grp) { return 1; } +static inline bool may_setgroups(void) +{ + return 1; +} #endif /* diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h index 565fc0629fff..1ef8e31fefed 100644 --- a/include/uapi/linux/pidfd.h +++ b/include/uapi/linux/pidfd.h @@ -28,5 +28,6 @@ #define PIDFD_GET_TIME_FOR_CHILDREN_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 8) #define PIDFD_GET_USER_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 9) #define PIDFD_GET_UTS_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 10) +#define PIDFD_BORROW_GROUPS _IO(PIDFS_IOCTL_MAGIC, 11) #endif /* _UAPI_LINUX_PIDFD_H */ -- 2.47.0