On Tue, Feb 18, 2020 at 3:35 PM Christian Brauner <christian.brauner@xxxxxxxxxx> wrote: [...] > - Let the keyctl infrastructure only operate on kfsid which are always > mapped/looked up in the id mappings similar to what we do for > filesystems that have the same superblock visible in multiple user > namespaces. > > This version also comes with minimal tests which I intend to expand in > the future. > > From pings and off-list questions and discussions at Google Container > Security Summit there seems to be quite a lot of interest in this > patchset with use-cases ranging from layer sharing for app containers > and k8s, as well as data sharing between containers with different id > mappings. I haven't Cced all people because I don't have all the email > adresses at hand but I've at least added Phil now. :) > > This is the implementation of shiftfs which was cooked up during lunch at > Linux Plumbers 2019 the day after the container's microconference. The > idea is a design-stew from Stéphane, Aleksa, Eric, and myself (and by > now also Jann. > Back then we all were quite busy with other work and couldn't really sit > down and implement it. But I took a few days last week to do this work, > including demos and performance testing. > This implementation does not require us to touch the VFS substantially > at all. Instead, we implement shiftfs via fsid mappings. > With this patch, it took me 20 mins to port both LXD and LXC to support > shiftfs via fsid mappings. [...] Can you please grep through the kernel for all uses of ->fsuid and ->fsgid and fix them up appropriately? Some cases I still see: The SafeSetID LSM wants to enforce that you can only use CAP_SETUID to gain the privileges of a specific set of IDs: static int safesetid_task_fix_setuid(struct cred *new, const struct cred *old, int flags) { /* Do nothing if there are no setuid restrictions for our old RUID. */ if (setuid_policy_lookup(old->uid, INVALID_UID) == SIDPOL_DEFAULT) return 0; if (uid_permitted_for_cred(old, new->uid) && uid_permitted_for_cred(old, new->euid) && uid_permitted_for_cred(old, new->suid) && uid_permitted_for_cred(old, new->fsuid)) return 0; /* * Kill this process to avoid potential security vulnerabilities * that could arise from a missing whitelist entry preventing a * privileged process from dropping to a lesser-privileged one. */ force_sig(SIGKILL); return -EACCES; } This could theoretically be bypassed through setfsuid() if the kuid based on the fsuid mappings is permitted but the kuid based on the normal mappings is not. fs/coredump.c in suid dump mode uses "cred->fsuid = GLOBAL_ROOT_UID"; this should probably also fix up the other uid, even if there is no scenario in which it would actually be used at the moment? The netfilter xt_owner stuff makes packet filtering decisions based on the ->fsuid; it might be better to filter on the ->kfsuid so that you can filter traffic from different user namespaces differently? audit_log_task_info() is doing "from_kuid(&init_user_ns, cred->fsuid)".