On Thu, Dec 21, 2017 at 10:44 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: > No. This makes no logical sense. > > A task that enters a user namespace loses all capabilities to everything > outside of the user namespace. Capabilities inside a user namespace are > only valid for objects created inside that user namespace. > > So limiting capabilities inside a user namespace when the capability > bounding set is already fully honored by not giving the processes any of > those capabilities makes no logical sense. > > If the concern is kernel attack surface versus logical permissions we > can look at ways to reduce the attack surface but that needs to be fully > discussed in the change log. Here's an example of using user namespaces to read a file you shouldn't be able to. lpk19:~# uname -r 4.15.0-smp-d1ce8ceb8ba8 (we start as true global root) lpk19:~# id uid=0(root) gid=0(root) groups=0(root) (cleanup after previous run) lpk19:~# cd /; chattr -i /immu; rm -f /immu/log; rmdir /immu (now we create an append only logfile owned by target user:group) lpk19:~# cd /; mkdir /immu; touch /immu/log; chown produser:prod /immu/log; chmod a-rwx,u+w /immu/log; chattr +a /immu/log (let's show what things look like) lpk19:~# chattr +i /immu; ls -ld / /immu /immu/log; lsattr -d / /immu /immu/log drwxr-xr-x 22 root root 4096 Dec 21 16:33 / drwxr-xr-x 2 root root 4096 Dec 21 16:23 /immu --w------- 1 produser prod 0 Dec 21 16:23 /immu/log -----------I--e---- / ----i---------e---- /immu -----a--------e---- /immu/log (the immutable bit prevents us from changing permissions on the file) lpk19:/# chmod a+rwx /immu/log chmod: changing permissions of '/immu/log': Operation not permitted (the append only bit prevents us from simply overwriting the file) lpk19:/# echo log1 > /immu/log -bash: /immu/log: Operation not permitted (but we can append to it) lpk19:/# echo log1 >> /immu/log (we're global root with CAP_DAC_OVERRIDE, so we can *still* read it) lpk19:/# cat /immu/log log1 (let's transition to target user) lpk19:/# su - produser produser@lpk19:~$ id uid=2080(produser) gid=620(prod) groups=620(prod) (we can't overwrite it) produser@lpk19:~$ echo log2 > /immu/log -su: /immu/log: Operation not permitted (but we can log to it: as intended) produser@lpk19:~$ echo log2 >> /immu/log (we can't change its permissions, cause it's in an immutable directory) produser@lpk19:~$ chmod u+r /immu/log chmod: changing permissions of '/immu/log': Operation not permitted (we can't dump the file, cause we don't have CAP_DAC_OVERRIDE) produser@lpk19:~$ cat /immu/log cat: /immu/log: Permission denied (or can we?) produser@lpk19:~$ unshare -U -r cat /immu/log log1 log2 ---- Now, of course, the above patch doesn't actually fix this on it's own, since 'su' doesn't (yet?) know to restrict bset or to set no_new_privs. But: it allows the sandbox equivalent of su to drop CAP_DAC_OVERRIDE from it's inh/eff/perm/ambient/bset, and set no_new_privs. Now the unshare won't gain CAP_DAC_OVERRIDE and won't be able to cat the non-readable append-only log file. IMHO the point of having a capability bounding set and/or no_new_privs is to never be able to regain capabilities. Note also that 'no_new_privs' isn't cleared across a unshare(CLONE_NEWUSER) [presumably also applies to setns()]. We can of course argue the implementation details (for example instead of using the existing no_new_privs flag, add a new keep_bset_across_userns_transitions securebits flag)... but *something* has to be done. - Maciej _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers