[PATCH RFC 0/4] bpf: cgroup device guard for non-initial user namespace

Introduce the BPF_F_CGROUP_DEVICE_GUARD flag for BPF_PROG_LOAD,
which marks a cgroup device program as a device guard. A device
guard may be used to guard actions on device nodes, e.g., mknod,
in a non-initial userns.

If a container manager restricts its unprivileged (user namespaced)
children by a device cgroup, it is no longer necessary to deny mknod.
Thus, user space applications may map devices at different locations
in the file system by using mknod() inside the container.

One use case for this, which we also rely on in GyroidOS, is running
virsh for VMs inside an unprivileged container. virsh creates device
nodes, e.g., "/var/run/libvirt/qemu/11-fgfg.dev/null", which currently
fails in a non-initial userns, even if a cgroup device whitelist entry
with the corresponding major and minor numbers of /dev/null exists.
Thus, in this case the usual bind mounts or pre-populated device nodes
under /dev are not sufficient.
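
For illustration, a cgroup device whitelist for /dev/null (char device
1:3) could be written roughly as the following cgroup device BPF
program. This is only a sketch and not taken from the series or from
cmld: the function name devguard_allow_null is hypothetical, and SEC()
is defined locally as a stand-in for the macro that libbpf's
<bpf/bpf_helpers.h> normally provides. The device guard flag proposed
here would be passed at load time and does not appear in the program
itself.

```c
#include <linux/bpf.h>	/* struct bpf_cgroup_dev_ctx, BPF_DEVCG_* */

/* Local stand-in for libbpf's SEC() macro (assumption: no libbpf headers) */
#define SEC(name) __attribute__((section(name), used))

/* Major/minor numbers of /dev/null */
#define NULL_MAJOR 1
#define NULL_MINOR 3

/* Hypothetical program name; returns 1 to allow the access, 0 to deny it */
SEC("cgroup/dev")
int devguard_allow_null(struct bpf_cgroup_dev_ctx *ctx)
{
	/* access_type encodes (BPF_DEVCG_ACC_* << 16) | BPF_DEVCG_DEV_* */
	unsigned int type = ctx->access_type & 0xFFFF;
	unsigned int access = ctx->access_type >> 16;
	unsigned int allowed = BPF_DEVCG_ACC_MKNOD |
			       BPF_DEVCG_ACC_READ |
			       BPF_DEVCG_ACC_WRITE;

	/* Only character devices are whitelisted */
	if (type != BPF_DEVCG_DEV_CHAR)
		return 0;

	/* Only /dev/null (1:3) is whitelisted */
	if (ctx->major != NULL_MAJOR || ctx->minor != NULL_MINOR)
		return 0;

	/* Deny any access bit outside mknod/read/write */
	if (access & ~allowed)
		return 0;

	return 1;
}

char LICENSE[] SEC("license") = "GPL";
```

Such a program would typically be built with clang for the BPF target
and attached to the container's cgroup with the BPF_CGROUP_DEVICE
attach type.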

To circumvent this limitation, we allow mknod() in the VFS if a
bpf cgroup device guard is enabled for the current task and check
CAP_MKNOD for the current user namespace instead of the init userns.

To avoid unusable device nodes on file systems mounted in a
non-initial user namespace, may_open_dev() ignores the SB_I_NODEV
flag for cgroup device guarded tasks.
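
The intended behaviour of the two VFS changes can be summarized by the
following user space model. It is a simplified sketch, not the code
from the patches: struct task_model and both function names are
hypothetical, and it assumes that the MNT_NODEV mount flag is still
honoured, since only SB_I_NODEV is mentioned above. The device cgroup
program itself must still permit the access; that check is not
modelled here.

```c
#include <stdbool.h>

/* Hypothetical model of the task properties that matter here */
struct task_model {
	bool cap_mknod_init;	/* CAP_MKNOD in the initial userns */
	bool cap_mknod_userns;	/* CAP_MKNOD in the task's current userns */
	bool device_guard;	/* a cgroup device guard is enabled for the task */
};

/* Capability check for mknod() of a device node */
static bool model_may_mknod(const struct task_model *t)
{
	/* Guarded task: CAP_MKNOD in the current userns is sufficient */
	if (t->device_guard)
		return t->cap_mknod_userns;

	/* Unguarded task: CAP_MKNOD in the init userns is required */
	return t->cap_mknod_init;
}

/* Model of may_open_dev(): may a device node on this mount be opened? */
static bool model_may_open_dev(const struct task_model *t,
			       bool mnt_nodev, bool sb_i_nodev)
{
	/* Assumption: a nodev mount still blocks device access */
	if (mnt_nodev)
		return false;

	/* SB_I_NODEV is ignored only for guarded tasks */
	if (sb_i_nodev && !t->device_guard)
		return false;

	return true;
}
```

In this model, a guarded task that holds CAP_MKNOD only in its own
userns may create a device node and open it on a superblock marked
SB_I_NODEV, while an unguarded task with the same capabilities is
denied both.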

Tested with a GyroidOS container created by cmld, using the
following user space patch: https://github.com/gyroidos/cml/pull/394

I discussed this earlier with Christian internally in the UAPI group.
I am posting it to the public list now, since the LXC/LXD folks have
also expressed interest in this.

This series applies to the latest mainline v6.5-rc6 tag.

Signed-off-by: Michael Weiß <michael.weiss@xxxxxxxxxxxxxxxxxxx>
---
Michael Weiß (4):
      bpf: add cgroup device guard to flag a cgroup device prog
      bpf: provide cgroup_device_guard in bpf_prog_info to user space
      device_cgroup: wrapper for bpf cgroup device guard
      fs: allow mknod in non-initial userns using cgroup device guard

 fs/namei.c                     | 19 ++++++++++++++++---
 include/linux/bpf-cgroup.h     |  7 +++++++
 include/linux/bpf.h            |  1 +
 include/linux/device_cgroup.h  |  7 +++++++
 include/uapi/linux/bpf.h       |  8 +++++++-
 kernel/bpf/cgroup.c            | 30 ++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c           |  6 +++++-
 security/device_cgroup.c       | 10 ++++++++++
 tools/bpf/bpftool/prog.c       |  2 ++
 tools/include/uapi/linux/bpf.h |  8 +++++++-
 10 files changed, 92 insertions(+), 6 deletions(-)
---
base-commit: 2ccdd1b13c591d306f0401d98dedc4bdcd02b421
change-id: 20230814-devcg_guard-5398ef84bf7b

Best regards,
-- 
Michael Weiß <michael.weiss@xxxxxxxxxxxxxxxxxxx>




