Introduce the flag BPF_DEVCG_ACC_MKNOD_UNS for bpf programs of type
BPF_PROG_TYPE_CGROUP_DEVICE, which allows guarding access to mknod
in non-initial user namespaces.

If a container manager restricts its unprivileged (user namespaced)
children by a device cgroup, it is no longer necessary to deny mknod().
Thus, user space applications may map devices to different locations
in the file system by using mknod() inside the container.

A use case for this, which we also use in GyroidOS, is to run virsh for
VMs inside an unprivileged container. virsh creates device nodes,
e.g., "/var/run/libvirt/qemu/11-fgfg.dev/null", which currently fails
in a non-initial userns, even if a cgroup device white list with the
corresponding major, minor of /dev/null exists. Thus, in this case
the usual bind mounts or pre-populated device nodes under /dev are
not sufficient.

To circumvent this limitation, allow mknod() by checking CAP_MKNOD
in the userns by implementing the security_inode_mknod_nscap() hook.
The hook implementation checks if the corresponding permission flag
BPF_DEVCG_ACC_MKNOD_UNS is set for the device in the bpf program.
To avoid creating unusable inodes in user space, the hook also checks
SB_I_NODEV on the corresponding super block.

Further, the security_sb_alloc_userns() hook is implemented using
cgroup_bpf_current_enabled() to allow usage of device nodes on super
blocks mounted by a guarded task.

Patches 1 to 3 rework the current devcgroup_inode hooks as an LSM.
Patches 4 to 8 rework explicit calls to devcgroup_check_permission
  also as LSM hooks and finalize the conversion of the device_cgroup
  subsystem to an LSM.
Patches 9 and 10 introduce new generic security hooks to be used
  for the actual mknod device guard implementation.
Patch 11 wires up the security hooks in the vfs.
Patches 12 and 13 provide helper functions in the bpf cgroup
  subsystem.
Patch 14 finally implements the LSM hooks to grant access.

Signed-off-by: Michael Weiß <michael.weiss@xxxxxxxxxxxxxxxxxxx>
---
Changes in v2:
- Integrate this as LSM (Christian, Paul)
- Switched to a device cgroup specific flag instead of a generic
  bpf program flag (Christian)
- do not ignore SB_I_NODEV in fs/namei.c but use LSM hook in
  sb_alloc_super in fs/super.c
- Link to v1: https://lore.kernel.org/r/20230814-devcg_guard-v1-0-654971ab88b1@xxxxxxxxxxxxxxxxxxx

Michael Weiß (14):
  device_cgroup: Implement devcgroup hooks as lsm security hooks
  vfs: Remove explicit devcgroup_inode calls
  device_cgroup: Remove explicit devcgroup_inode hooks
  lsm: Add security_dev_permission() hook
  device_cgroup: Implement dev_permission() hook
  block: Switch from devcgroup_check_permission to security hook
  drm/amdkfd: Switch from devcgroup_check_permission to security hook
  device_cgroup: Hide devcgroup functionality completely in lsm
  lsm: Add security_inode_mknod_nscap() hook
  lsm: Add security_sb_alloc_userns() hook
  vfs: Wire up security hooks for lsm-based device guard in userns
  bpf: Add flag BPF_DEVCG_ACC_MKNOD_UNS for device access
  bpf: cgroup: Introduce helper cgroup_bpf_current_enabled()
  device_cgroup: Allow mknod in non-initial userns if guarded

 block/bdev.c                                 |   9 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        |   7 +-
 fs/namei.c                                   |  24 ++--
 fs/super.c                                   |   6 +-
 include/linux/bpf-cgroup.h                   |   2 +
 include/linux/device_cgroup.h                |  67 -----------
 include/linux/lsm_hook_defs.h                |   4 +
 include/linux/security.h                     |  18 +++
 include/uapi/linux/bpf.h                     |   1 +
 init/Kconfig                                 |   4 +
 kernel/bpf/cgroup.c                          |  14 +++
 security/Kconfig                             |   1 +
 security/Makefile                            |   2 +-
 security/device_cgroup/Kconfig               |   7 ++
 security/device_cgroup/Makefile              |   4 +
 security/{ => device_cgroup}/device_cgroup.c |   3 +-
 security/device_cgroup/device_cgroup.h       |  20 ++++
 security/device_cgroup/lsm.c                 | 114 +++++++++++++++++++
 security/security.c                          |  75 ++++++++++++
 19 files changed, 294 insertions(+), 88 deletions(-)
 delete mode 100644 include/linux/device_cgroup.h
 create mode 100644 security/device_cgroup/Kconfig
 create mode 100644 security/device_cgroup/Makefile
 rename security/{ => device_cgroup}/device_cgroup.c (99%)
 create mode 100644 security/device_cgroup/device_cgroup.h
 create mode 100644 security/device_cgroup/lsm.c

base-commit: 58720809f52779dc0f08e53e54b014209d13eebb
-- 
2.30.2
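
For illustration only, the decision that the security_inode_mknod_nscap()
hook is described as making can be modelled in plain user-space C. This
is a sketch under assumptions, not code from the series: the numeric
value of BPF_DEVCG_ACC_MKNOD_UNS, the struct and function names, and the
order of the checks are all hypothetical. A flat rule table stands in
for the attached bpf program, and the effect of security_sb_alloc_userns()
is reduced to a single sb_nodev boolean.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Existing access bits from include/uapi/linux/bpf.h. */
#define BPF_DEVCG_ACC_MKNOD (1u << 0)
#define BPF_DEVCG_ACC_READ  (1u << 1)
#define BPF_DEVCG_ACC_WRITE (1u << 2)
/* ASSUMPTION: the value of the new flag is not given in this letter. */
#define BPF_DEVCG_ACC_MKNOD_UNS (1u << 3)

/* Hypothetical stand-in for what the cgroup device bpf program allows. */
struct dev_rule {
	uint32_t major;
	uint32_t minor;
	uint32_t access; /* bitmask of BPF_DEVCG_ACC_* */
};

/* Hypothetical description of one mknod() call in a non-initial userns. */
struct mknod_request {
	bool has_cap_mknod_in_userns; /* CAP_MKNOD held in the caller's userns */
	bool sb_nodev;                /* SB_I_NODEV set on the target super block */
	uint32_t major;
	uint32_t minor;
};

/* Return the access mask granted for major:minor, or 0 if no rule matches. */
static uint32_t lookup_access(const struct dev_rule *rules, size_t n,
			      uint32_t major, uint32_t minor)
{
	for (size_t i = 0; i < n; i++) {
		if (rules[i].major == major && rules[i].minor == minor)
			return rules[i].access;
	}
	return 0;
}

/*
 * Model of the guard: mknod is allowed only if the caller holds CAP_MKNOD
 * in its userns, the super block can actually use device nodes, and the
 * device rule carries BPF_DEVCG_ACC_MKNOD_UNS.
 */
bool mknod_allowed_in_userns(const struct mknod_request *req,
			     const struct dev_rule *rules, size_t n)
{
	if (!req->has_cap_mknod_in_userns)
		return false;
	if (req->sb_nodev)
		return false; /* avoid creating unusable inodes */
	return (lookup_access(rules, n, req->major, req->minor) &
		BPF_DEVCG_ACC_MKNOD_UNS) != 0;
}
```

With a rule table that grants BPF_DEVCG_ACC_MKNOD_UNS for 1:3 (/dev/null)
only, the model allows the virsh case from the description and rejects
the same request on a super block that still has SB_I_NODEV set.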