On 10/3/2017 5:29 AM, Stephen Smalley wrote: > On Mon, 2017-10-02 at 16:56 -0700, Casey Schaufler wrote: >> On 10/2/2017 8:58 AM, Stephen Smalley wrote: >>> Provide a userspace API to unshare the selinux namespace. >>> Currently implemented via a selinuxfs node. This could be >>> coupled with unsharing of other namespaces (e.g. mount namespace, >>> network namespace) that will always be needed or left independent. >>> Don't get hung up on the interface itself, it is just to allow >>> experimentation and testing. >>> >>> Sample usage: >>> echo 1 > /sys/fs/selinux/unshare >>> unshare -m -n >>> umount /sys/fs/selinux >>> mount -t selinuxfs none /sys/fs/selinux >>> load_policy >>> getenforce >>> id >>> echo $$ >>> >>> The above will show that the process now views itself as running in >>> the >>> kernel domain in permissive mode, as would be the case at boot. >>>> From a different shell on the host system, running ps -eZ or >>> cat /proc/<pid>/attr/current will show that the process that >>> unshared its selinux namespace is still running in its original >>> context in the initial namespace, and getenforce will show the >>> the initial namespace remains enforcing. Enforcing mode or policy >>> changes in the child will not affect the parent. >>> >>> This is not yet safe; do not use on production systems. >>> Known issues include at least the following items: >>> >>> * The policy loading code has not been thoroughly audited >>> and hardened for use by unprivileged code, both with respect to >>> ensuring that the policy is internally consistent and restricting >>> the range of values used from the policy as loop bounds and memory >>> allocation sizes to sane limits. >>> >>> * The SELinux hook functions have not been modified to be >>> namespace-aware, so the hooks only perform checking against the >>> current namespace. Thus, unsharing allows the process to escape >>> confinement by the parent. Fixing this requires updating each hook >>> to >>> perform its processing on the current namespace and all of its >>> ancestors >>> up to the init namespace. >>> >>> * Some of the hook functions can be called outside of process >>> context >>> (e.g. task_kill, send_sigiotask, network input/forward) and should >>> not use >>> the current task's selinux namespace. These hooks need to be >>> updated to >>> obtain the proper selinux namespace to use instead from the caller >>> or >>> cached in a suitable data structure (e.g. the file or sock security >>> structures). >>> >>> * There are number of issues with the inode and superblock security >>> blob >>> handling for multiple namespaces, see those commits for more >>> details. >>> >>> * Only a subset of object security blobs have been updated to >>> be namespace-aware and support multiple namespaces. The ones that >>> have not yet been updated could end up performing permission checks >>> or >>> other operations on SIDs created in a different selinux namespace. >>> >>> * The network SID caches (netif, netnode, netport) have not yet >>> been instantiated per selinux namespace, unlike the AVC and SS. >>> >>> * There is no way currently to restrict or bound nesting of >>> namespaces; if you allow it to a domain in the init namespace, >>> then that domain can in turn unshare to arbitrary depths and can >>> grant the same to any domain in its own policy. Related to this >>> is the fact that there is no way to control resource usage due to >>> selinux namespaces and they can be substantial (per-namespace >>> policydb, sidtab, AVC, etc). >>> >>> * SIDs may be cached by audit and networking code and in external >>> kernel data structures and used later, potentially in a different >>> selinux namespace than the one in which the SID was originally >>> created. >> Is there a good reason that SIDs (and security contexts) need to >> be maintained separately in the namespaces? Using the same secid >> to map to a different context depending on the namespace seems like >> you're asking for trouble you don't need. A namespace that hasn't >> policy for a context/SID won't use it if it is defined for a >> different namespace, and should detect the fact if it somehow gets >> referenced, because it isn't in the policy. >> >> Do the context/SID mapping globally. Or, if you must duplicate >> contexts, >> allocate the SIDs from a single source so that they aren't ambiguous. > Yes, that may be the right answer, but it requires introducing a new > SID/context layer above the current security server layer (or changing > the latter), It could be as simple as using a global next_sid instead of attaching that to the sidtab (s->next_sid). What I'm really afraid of is an ambiguous secid being set in a secmark. In the stacking case it's going to be bad enough mapping a set of secids into a "token" without having to deal with the possibility that SELinux could give you a secid that has multiple possible mappings depending on which namespace it came from and/or which it is going into. > because the security server SID/context mappings are from > SIDs to internal struct representations of the context, where the > struct representation is policy-specific. Also, certain SIDs (e.g. the > kernel SID, the unlabeled SID, etc) are predefined for system > initialization before policy load and to support usage from policy- > independent code, but may be mapped to different context values by > different policies. So even the SID->string mappings may differ for > different policies, and hence for different namespaces. > >>> * No doubt other things I'm forgetting or haven't thought of. >>> Use at your own risk. >>> >>> Not-signed-off-by: Stephen Smalley <sds@xxxxxxxxxxxxx> >>> --- >>> security/selinux/include/classmap.h | 3 +- >>> security/selinux/selinuxfs.c | 66 >>> +++++++++++++++++++++++++++++++++++++ >>> 2 files changed, 68 insertions(+), 1 deletion(-) >>> >>> diff --git a/security/selinux/include/classmap.h >>> b/security/selinux/include/classmap.h >>> index 35ffb29..82c8f9c 100644 >>> --- a/security/selinux/include/classmap.h >>> +++ b/security/selinux/include/classmap.h >>> @@ -39,7 +39,8 @@ struct security_class_mapping secclass_map[] = { >>> { "compute_av", "compute_create", "compute_member", >>> "check_context", "load_policy", "compute_relabel", >>> "compute_user", "setenforce", "setbool", >>> "setsecparam", >>> - "setcheckreqprot", "read_policy", "validate_trans", >>> NULL } }, >>> + "setcheckreqprot", "read_policy", "validate_trans", >>> "unshare", >>> + NULL } }, >>> { "process", >>> { "fork", "transition", "sigchld", "sigkill", >>> "sigstop", "signull", "signal", "ptrace", "getsched", >>> "setsched", >>> diff --git a/security/selinux/selinuxfs.c >>> b/security/selinux/selinuxfs.c >>> index a7e6bdb..dedb3cc9 100644 >>> --- a/security/selinux/selinuxfs.c >>> +++ b/security/selinux/selinuxfs.c >>> @@ -63,6 +63,7 @@ enum sel_inos { >>> SEL_STATUS, /* export current status using mmap() >>> */ >>> SEL_POLICY, /* allow userspace to read the in >>> kernel policy */ >>> SEL_VALIDATE_TRANS, /* compute validatetrans decision */ >>> + SEL_UNSHARE, /* unshare selinux namespace */ >>> SEL_INO_NEXT, /* The next inode number to use */ >>> }; >>> >>> @@ -321,6 +322,70 @@ static const struct file_operations >>> sel_disable_ops = { >>> .llseek = generic_file_llseek, >>> }; >>> >>> +static ssize_t sel_write_unshare(struct file *file, const char >>> __user *buf, >>> + size_t count, loff_t *ppos) >>> + >>> +{ >>> + struct selinux_fs_info *fsi = file_inode(file)->i_sb- >>>> s_fs_info; >>> + struct selinux_ns *ns = fsi->ns; >>> + char *page; >>> + ssize_t length; >>> + bool set; >>> + int rc; >>> + >>> + if (count >= PAGE_SIZE) >>> + return -ENOMEM; >>> + >>> + /* No partial writes. */ >>> + if (*ppos != 0) >>> + return -EINVAL; >>> + >>> + rc = avc_has_perm(current_selinux_ns, current_sid(), >>> + SECINITSID_SECURITY, SECCLASS_SECURITY, >>> + SECURITY__UNSHARE, NULL); >>> + if (rc) >>> + return rc; >>> + >>> + page = memdup_user_nul(buf, count); >>> + if (IS_ERR(page)) >>> + return PTR_ERR(page); >>> + >>> + length = -EINVAL; >>> + if (kstrtobool(page, &set)) >>> + goto out; >>> + >>> + if (set) { >>> + struct cred *cred = prepare_creds(); >>> + struct task_security_struct *tsec; >>> + >>> + if (!cred) { >>> + length = -ENOMEM; >>> + goto out; >>> + } >>> + tsec = cred->security; >>> + if (selinux_ns_create(ns, &tsec->ns)) { >>> + abort_creds(cred); >>> + length = -ENOMEM; >>> + goto out; >>> + } >>> + tsec->osid = tsec->sid = SECINITSID_KERNEL; >>> + tsec->exec_sid = tsec->create_sid = tsec- >>>> keycreate_sid = >>> + tsec->sockcreate_sid = SECSID_NULL; >>> + tsec->parent_cred = get_current_cred(); >>> + commit_creds(cred); >>> + } >>> + >>> + length = count; >>> +out: >>> + kfree(page); >>> + return length; >>> +} >>> + >>> +static const struct file_operations sel_unshare_ops = { >>> + .write = sel_write_unshare, >>> + .llseek = generic_file_llseek, >>> +}; >>> + >>> static ssize_t sel_read_policyvers(struct file *filp, char __user >>> *buf, >>> size_t count, loff_t *ppos) >>> { >>> @@ -1923,6 +1988,7 @@ static int sel_fill_super(struct super_block >>> *sb, void *data, int silent) >>> [SEL_POLICY] = {"policy", &sel_policy_ops, >>> S_IRUGO}, >>> [SEL_VALIDATE_TRANS] = {"validatetrans", >>> &sel_transition_ops, >>> S_IWUGO}, >>> + [SEL_UNSHARE] = {"unshare", &sel_unshare_ops, >>> 0222}, >>> /* last one */ {""} >>> }; >>> >> >> . .