While user namespaces do not make the kernel more vulnerable, they are
nevertheless used to initiate exploits. Some distributions provide a way
to block namespace creation for the entire system, but some users do not
want that. Instead, we needed a way to block some applications and allow
others, which is not possible with those tools. Managing hierarchies
also did not fit our case because we're determining which tasks are
allowed based on their attributes.

While exploring a solution, we first leveraged the LSM cred_prepare hook
because that is the closest hook to prevent a call to create_user_ns().

The calls look something like this:

	cred = prepare_creds()
		security_prepare_creds()
			call_int_hook(cred_prepare, ...
	if (cred)
		create_user_ns(cred)

We noticed that error codes were not propagated from this hook and
introduced a patch [1] to propagate those errors.

The discussion notes that security_prepare_creds() is not appropriate
for MAC policies, and that the hook is instead meant for LSM authors to
prepare credentials for mutation [2].

Additionally, the cred_prepare hook is not without problems. Handling
the clone3 case is trickier because of the user space pointer passed to
it, which makes checking the syscall subject to a possible TOCTTOU
attack.

Ultimately, we concluded that a better course of action is to introduce
a new security hook for LSM authors [3].

This patch set first introduces a new security_create_user_ns() function
and userns_create LSM hook, then marks the hook as sleepable in BPF. The
patches that follow include a BPF test and a patch for an SELinux
implementation.

We want to encourage use of user namespaces, and also cater to the needs
of users/administrators to observe and/or control access. There is no
expectation of an impact on user space applications because access
control is opt-in, and users wishing to observe within an LSM context

Links:
1. https://lore.kernel.org/all/20220608150942.776446-1-fred@xxxxxxxxxxxxxx/
2. https://lore.kernel.org/all/87y1xzyhub.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
3. https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@xxxxxxxxxxxxxx/

Past discussions:
V4: https://lore.kernel.org/all/20220801180146.1157914-1-fred@xxxxxxxxxxxxxx/
V3: https://lore.kernel.org/all/20220721172808.585539-1-fred@xxxxxxxxxxxxxx/
V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@xxxxxxxxxxxxxx/
V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@xxxxxxxxxxxxxx/

Changes since v4:
- Update commit description
- Update cover letter
Changes since v3:
- Explicitly set CAP_SYS_ADMIN to test namespace is created given
  permission
- Simplify BPF test to use sleepable hook only
- Prefer unshare() over clone() for tests
Changes since v2:
- Rename create_user_ns hook to userns_create
- Use user_namespace as an object as opposed to a generic namespace
  object
- s/domB_t/domA_t in commit message
Changes since v1:
- Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook
  patch
- Add selinux: Implement create_user_ns hook patch
- Change function signature of security_create_user_ns() to only take
  struct cred
- Move security_create_user_ns() call after id mapping check in
  create_user_ns()
- Update documentation to reflect changes

Frederick Lawler (4):
  security, lsm: Introduce security_create_user_ns()
  bpf-lsm: Make bpf_lsm_userns_create() sleepable
  selftests/bpf: Add tests verifying bpf lsm userns_create hook
  selinux: Implement userns_create hook

 include/linux/lsm_hook_defs.h                 |   1 +
 include/linux/lsm_hooks.h                     |   4 +
 include/linux/security.h                      |   6 ++
 kernel/bpf/bpf_lsm.c                          |   1 +
 kernel/user_namespace.c                       |   5 +
 security/security.c                           |   5 +
 security/selinux/hooks.c                      |   9 ++
 security/selinux/include/classmap.h           |   2 +
 .../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++
 .../selftests/bpf/progs/test_deny_namespace.c |  33 ++++++
 10 files changed, 168 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c

--
2.30.2