On 09/09/2014 08:49 AM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>
>> Hi Eric,
>>
>> On 08/30/2014 02:53 PM, Eric W. Biederman wrote:
>>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>>
>>>> Hello Eric et al.,
>>>>
>>>> For various reasons, my work on the namespaces man pages
>>>> fell off the table a while back. Nevertheless, the pages have
>>>> been close to completion for a while now, and I recently restarted,
>>>> in an effort to finish them. As you also noted to me f2f, there have
>>>> recently been some small namespace changes that may affect
>>>> the content of the pages. Therefore, I'll take the opportunity to
>>>> send the namespace-related pages out for further (final?) review.
>>>>
>>>> So, here, I start with the user_namespaces(7) page, which is shown
>>>> in rendered form below, with source attached to this mail. I'll
>>>> send various other pages in follow-on mails.
>>>>
>>>> Review comments/suggestions for improvements / bug fixes welcome.
>>>>
>>>> Cheers,
>>>>
>>>> Michael
>>>>
>>>> ==
>>>>
>>>> NAME
>>>>        user_namespaces - overview of Linux user_namespaces
>>>>
>> [...]
>>
>>>>        When a new IPC, mount, network, PID, or UTS namespace is created
>>>>        via clone(2) or unshare(2), the kernel records the user namespace
>>>>        of the creating process against the new namespace.  (This
>>>>        association can't be changed.)  When a process in the new
>>>>        namespace subsequently performs privileged operations that
>>>>        operate on global resources isolated by the namespace, the
>>>>        permission checks are performed according to the process's
>>>>        capabilities in the user namespace that the kernel associated
>>>>        with the new namespace.
>>>
>>> Restrictions on mount namespaces.
>>>
>>> - A mount namespace has an owner user namespace.
>>>   A mount namespace whose owner user namespace is different than the
>>>   owner user namespace of its parent mount namespace is considered a
>>>   less privileged mount namespace.
>>>
>>> - When creating a less privileged mount namespace shared mounts are
>>>   reduced to slave mounts.  This ensures that mappings performed in less
>>>   privileged mount namespaces will not propagate to more privileged
>>>   mount namespaces.
>>>
>>> - Mounts that come as a single unit from more privileged mount are
>>>   locked together and may not be separated in a less privileged mount
>>>   namespace.
>>
>> Could you clarify what you mean by "Mounts that come as a single
>> unit"?
>
> unshare(CLONE_NEWNS) brings across all of the mounts from the original
> mount namespace as a single unit.
>
> recursive mounts that propagate between mount namespaces propagate as a
> single unit.

Thanks, I've added those details to the page.

> The importance of this is to allow the global root to mount over things
> and not have to worry that someone from a user namespace root can
> peek underneath.
>
>>> - The mount flags readonly, nodev, nosuid, noexec, and the mount atime
>>>   settings when propagated from a more privileged to a less privileged
>>>   mount namespace become locked, and may not be changed in the less
>>>   privileged mount namespace.
>>>
>>> - (As of 3.18-rc1 (in today's Al Viro's vfs.git#for-next tree)) A file or
>>>   directory that is a mountpoint in one namespace that is not a mount
>>>   point in another namespace, may be renamed, unlinked, or rmdired in
>>>   the mount namespace in which it is not a mount point if the
>>>   ordinary permission checks pass.
>>>
>>>   Previously attempting to rmdir, unlink or rename a file or directory
>>>   that was a mount point in another mount namespace would result in
>>>   -EBUSY.  This behavior had technical problems of enforcement (nfs)
>>>   and resulted in a nice denial of service attack against more
>>>   privileged users.
>>>   (Aka preventing individual files from being updated
>>>   by bind mounting on top of them).
>>
>> I have reworked the text above a little so that now we have the following.
>> Aside from question above, does it look okay?
>>
>>   Restrictions on mount namespaces
>>       Note the following points with respect to mount namespaces:
>>
>>   * A mount namespace has na owner user namespace.  A mount
>                            ^s/na/an/
>>     namespace whose owner user namespace is different from the
>>     owner user namespace of its parent mount namespace is
>>     considered a less privileged mount namespace.
>>
>>   * When creating a less privileged mount namespace, shared
>>     mounts are reduced to slave mounts.  This ensures that
>>     mappings performed in less privileged mount namespaces will
>>     not propagate to more privileged mount namespaces.
>>
>>   * Mounts that come as a single unit from more privileged mount
>                                                                   ^ namespaces
>>     are locked together and may not be separated in a less
>>     privileged mount namespace.
>>
>>   * The mount(2) flags MS_RDONLY, MS_NOSUID, MS_NOEXEC, and the
>>     "atime" flags (MS_NOATIME, MS_NODIRATIME, MS_RELATIME)
>>     settings become locked when propagated from a more privileged
>>     to a less privileged mount namespace, and may not be changed
>>     in the less privileged mount namespace.
>>
>>   * A file or directory that is a mount point in one namespace
>>     that is not a mount point in another namespace, may be
>>     renamed, unlinked, or removed (rmdir(2)) in the mount
>>     namespace in which it is not a mount point (subject to the
>>     usual permission checks).
>>
>>     Previously, attempting to unlink, rename, or remove a file
>>     or directory that was a mount point in another mount
>>     namespace would result in the error EBUSY.  That behavior had
>>     technical problems of enforcement (e.g., for NFS) and
>>     permitted denial-of-service attacks against more privileged
>>     users.  (i.e., preventing individual files from being
>>     updated by bind mounting on top of them).
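
[Illustration, not part of the original exchange.] The point about shared
mounts being reduced to slave mounts can be observed from user space by
looking at the optional fields of a mountinfo line, where proc(5) documents
"shared:N" for a shared mount and "master:N" for a slave. The following is a
minimal sketch of that classification. The function name propagation_of and
the two sample lines are hypothetical, made up for this example; they are not
captured from a real system.

```python
def propagation_of(mountinfo_line: str) -> str:
    """Classify the propagation type recorded in one mountinfo line.

    Layout assumed from proc(5): six fixed fields, then zero or more
    optional fields, then a lone "-" separator, then the filesystem
    type, mount source, and superblock options.
    """
    fields = mountinfo_line.split()
    separator = fields.index("-")
    # Optional fields look like "shared:1", "master:1", "unbindable".
    tags = {field.split(":", 1)[0] for field in fields[6:separator]}
    if "shared" in tags and "master" in tags:
        return "shared+slave"
    if "shared" in tags:
        return "shared"
    if "master" in tags:
        return "slave"
    if "unbindable" in tags:
        return "unbindable"
    return "private"


# Hypothetical lines: the same mount as it might appear in the original
# mount namespace and in a less privileged one created from it.
SAMPLE_ORIGINAL = "36 35 98:0 / /mnt rw,noatime shared:1 - ext3 /dev/root rw"
SAMPLE_LESS_PRIVILEGED = "71 70 98:0 / /mnt rw,noatime master:1 - ext3 /dev/root rw"

print(propagation_of(SAMPLE_ORIGINAL))
print(propagation_of(SAMPLE_LESS_PRIVILEGED))
```

On a Linux system the same function could be applied to each line of
/proc/self/mountinfo, before and after entering a new user and mount
namespace, to compare the recorded propagation types.
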
>
> Subject to tiny typo corrections that looks fine.

Yup, I already found and fixed ;-). Thanks, Eric.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
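
[Illustration, not part of the original exchange.] The rule about locked
mount flags can be modelled as a bit-mask check: a remount request in the
less privileged mount namespace must keep every locked flag set. The sketch
below is a simplified user-space model, not the kernel's implementation. The
flag values are the MS_* constants from <sys/mount.h>; the helper name
cleared_locked_flags is hypothetical. MS_NODEV is included because the
earlier wording in this thread lists nodev. The model treats each atime flag
as an independent bit, which is an assumption; the kernel may treat the
atime setting as a group.

```python
from typing import List

# MS_* values as defined in <sys/mount.h>.
MS_RDONLY = 1
MS_NOSUID = 2
MS_NODEV = 4
MS_NOEXEC = 8
MS_NOATIME = 1024
MS_NODIRATIME = 2048
MS_RELATIME = 1 << 21

FLAG_NAMES = {
    MS_RDONLY: "MS_RDONLY",
    MS_NOSUID: "MS_NOSUID",
    MS_NODEV: "MS_NODEV",
    MS_NOEXEC: "MS_NOEXEC",
    MS_NOATIME: "MS_NOATIME",
    MS_NODIRATIME: "MS_NODIRATIME",
    MS_RELATIME: "MS_RELATIME",
}


def cleared_locked_flags(locked: int, requested: int) -> List[str]:
    """Return names of locked flags that `requested` would clear.

    `locked` is the set of flags that became locked when the mount was
    propagated into the less privileged mount namespace; `requested` is
    the flag set a later remount asks for. An empty result means the
    request leaves every locked flag in place.
    """
    dropped = locked & ~requested
    return [name for bit, name in FLAG_NAMES.items() if dropped & bit]


locked = MS_RDONLY | MS_NOSUID
# Trying to make the mount writable drops the locked MS_RDONLY flag.
print(cleared_locked_flags(locked, MS_NOSUID))
# Adding a further restriction keeps all locked flags set.
print(cleared_locked_flags(locked, locked | MS_NOEXEC))
```

In this model, adding restrictions is always acceptable; only a request that
clears a locked bit is reported.
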