"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: > Hi Eric, > > Ping! > > Cheers, > > Michael > > > On 2 February 2015 at 16:37, Michael Kerrisk (man-pages) > <mtk.manpages@xxxxxxxxx> wrote: >> Hi Eric, >> >> Thanks for writing this up! >> >> On 12/12/2014 10:54 PM, Eric W. Biederman wrote: >>> >>> Files with access permissions such as ---rwx---rwx give fewer >>> permissions to their group then they do to everyone else. Which means >>> dropping groups with setgroups(0, NULL) actually grants a process >>> privileges. >>> >>> The uprivileged setting of gid_map turned out not to be safe after ^^^^^^^^^^^ unprivileged -- typo fix >>> this change. Privilege setting of gid_map can be interpreted as >>> meaning yes it is ok to drop groups. >> >> I had trouble to parse that sentence (and I'd like to make sure that >> the right sentence ends up in the commit message). Did you mean: >> >> "*Unprivileged* setting of gid_map can be interpreted as meaning >> yes it is ok to drop groups" >> ? >> >> Or something else? I meant: Setting of gid_map with privilege has been clarified to mean that dropping groups is ok. This allows existing programs that set gid_map with privilege to work without changes. That is newgidmap continues to work unchanged. >>> To prevent this problem and future problems user namespaces were >>> changed in such a way as to guarantee a user can not obtain >>> credentials without privilege they could not obtain without the >>> help of user namespaces. >>> >>> This meant testing the effective user ID and not the filesystem user >>> ID as setresuid and setregid allow setting any process uid or gid >>> (except the supplemental groups) to the effective ID. >>> >>> Furthermore to preserve in some form the useful applications that have >>> been setting gid_map without privilege the file /proc/[pid]/setgroups >>> was added to allow disabling setgroups. With the setgroups system >>> call permanently disabled in a user namespace it again becomes safe to >>> allow writes to gid_map without privilege. >>> >>> Here is my meager attempt to update user_namespaces.7 to reflect these >>> issues. >> >> It looked pretty serviceable as patch, IMO. So, thanks again. I've applied, >> tweaking some wordings afterward, but changing nothing essential. See below >> for a question. >> >>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> >>> --- >>> man7/user_namespaces.7 | 52 +++++++++++++++++++++++++++++++++++++++++++++++--- >>> 1 file changed, 49 insertions(+), 3 deletions(-) >>> >>> diff --git a/man7/user_namespaces.7 b/man7/user_namespaces.7 >>> index d76721d9a0a1..f8333a762308 100644 >>> --- a/man7/user_namespaces.7 >>> +++ b/man7/user_namespaces.7 >>> @@ -533,11 +533,16 @@ One of the following is true: >>> The data written to >>> .I uid_map >>> .RI ( gid_map ) >>> -consists of a single line that maps the writing process's filesystem user ID >>> +consists of a single line that maps the writing process's effective user ID >>> (group ID) in the parent user namespace to a user ID (group ID) >>> in the user namespace. >>> -The usual case here is that this single line provides a mapping for user ID >>> -of the process that created the namespace. >>> +The writing process must have the same effective user ID as the process >>> +that created the user namespace. >>> +In the case of >>> +.I gid_map >>> +the >>> +.I setgroups >>> +file must have been written to earlier and disabled the setgroups system call. >>> .IP * 3 >>> The opening process has the >>> .BR CAP_SETUID >>> @@ -552,6 +557,47 @@ Writes that violate the above rules fail with the error >>> .\" >>> .\" ============================================================ >>> .\" >>> +.SS Interaction with system calls that change the uid or gid values >>> +When in a user namespace where the >>> +.I uid_map >>> +or >>> +.I gid_map >>> +file has not been written the system calls that change user IDs >>> +or group IDs respectively will fail. After the >>> +.I uid_map >>> +and >>> +.I gid_map >>> +file have been written only the mapped values may be used in >>> +system calls that change user IDs and group IDs. >>> + >>> +For user IDs these system calls include >>> +.BR setuid , >>> +.BR setfsuid , >>> +.BR setreuid , >>> +and >>> +.BR setresuid . >>> + >>> +For group IDs these system calls include >>> +.BR setgid , >>> +.BR setfsgid , >>> +.BR setregid , >>> +.BR setresgid , >>> +and >>> +.BR setgroups. >>> + >>> +Writing >>> +.BR deny >>> +to the >>> +.I /proc/[pid]/setgroups >>> +file before writing to >>> +.I /proc/[pid]/gid_map >>> +will permanently disable the setgroups system call in a user namespace >>> +and allow writing to >>> +.I /proc/[pid]/gid_map >>> +without >>> +.BR CAP_SETGID >>> +in the parent user namespace. >> >> I just want to double check: you really did mean to write "*parent* namespace" >> above, right? Yes. At this point only privilege in the *parent* user namespace is meaningful, as applications in the new user namespace have all privileges. Eric _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers