"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: > Hi Eric, > > On 08/30/2014 02:53 PM, Eric W. Biederman wrote: >> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: >> >>> Hello Eric et al., >>> >>> For various reasons, my work on the namespaces man pages >>> fell off the table a while back. Nevertheless, the pages have >>> been close to completion for a while now, and I recently restarted, >>> in an effort to finish them. As you also noted to me f2f, there have >>> been recently been some small namespace changes that you may affect >>> the content of the pages. Therefore, I'll take the opportunity to >>> send the namespace-related pages out for further (final?) review. >>> >>> So, here, I start with the user_namespaces(7) page, which is shown >>> in rendered form below, with source attached to this mail. I'll >>> send various other pages in follow-on mails. >>> >>> Review comments/suggestions for improvements / bug fixes welcome. >>> >>> Cheers, >>> >>> Michael >>> >>> == >>> >>> NAME >>> user_namespaces - overview of Linux user_namespaces >>> > [...] > >>> When a new IPC, mount, network, PID, or UTS namespace is created >>> via clone(2) or unshare(2), the kernel records the user namespace >>> of the creating process against the new namespace. (This associ‐ >>> ation can't be changed.) When a process in the new namespace >>> subsequently performs privileged operations that operate on >>> global resources isolated by the namespace, the permission checks >>> are performed according to the process's capabilities in the user >>> namespace that the kernel associated with the new namespace. >> >> Restrictions on mount namespaces. >> >> - A mount namespace has a owner user namespace. A mount namespace whose >> owner user namespace is different than the owerner user namespace of >> it's parent mount namespace is considered a less privileged mount >> namespace. 
>>
>> - When creating a less privileged mount namespace shared mounts are
>>   reduced to slave mounts. This ensures that mappings performed in less
>>   privileged mount namespaces will not propagate to more privileged
>>   mount namespaces.
>>
>> - Mounts that come as a single unit from more privileged mount are
>>   locked together and may not be separated in a less privileged mount
>>   namespace.
>
> Could you clarify what you mean by "Mounts that come as a single
> unit"?

unshare(CLONE_NEWNS) brings across all of the mounts from the original
mount namespace as a single unit. Recursive mounts that propagate
between mount namespaces propagate as a single unit.

The importance of this is to allow the global root to mount over things
and not have to worry that someone who is root in a user namespace can
peek underneath.

>> - The mount flags readonly, nodev, nosuid, noexec, and the mount atime
>>   settings when propagated from a more privileged to a less privileged
>>   mount namespace become locked, and may not be changed in the less
>>   privileged mount namespace.
>>
>> - (As of 3.18-rc1 (in today's Al Viro's vfs.git#for-next tree)) A file or
>>   directory that is a mountpoint in one namespace that is not a mount
>>   point in another namespace, may be renamed, unlinked, or rmdired in
>>   the mount namespace in which it is not a mount point if the
>>   ordinary permission checks pass.
>>
>>   Previously attempting to rmdir, unlink or rename a file or directory
>>   that was a mount point in another mount namespace would result in
>>   -EBUSY. This behavior had technical problems of enforcement (nfs)
>>   and resulted in a nice denial of service attack against more
>>   privileged users. (Aka preventing individual files from being updated
>>   by bind mounting on top of them).
>
> I have reworked the text above a little so that now we have the following.
> Aside from question above, does it look okay?
>
> Restrictions on mount namespaces
> Note the following points with respect to mount namespaces:
>
> * A mount namespace has na owner user namespace. A mount
                          ^s/na/an/
>   namespace whose owner user namespace is different from the
>   owner user namespace of its parent mount namespace is con‐
>   sidered a less privileged mount namespace.
>
> * When creating a less privileged mount namespace, shared
>   mounts are reduced to slave mounts. This ensures that map‐
>   pings performed in less privileged mount namespaces will not
>   propagate to more privileged mount namespaces.
>
> * Mounts that come as a single unit from more privileged mount
                                                                 ^ namespaces
>   are locked together and may not be separated in a less priv‐
>   ileged mount namespace.
>
> * The mount(2) flags MS_RDONLY, MS_NOSUID, MS_NOEXEC, and the
>   "atime" flags (MS_NOATIME, MS_NODIRATIME, MS_RELATIME) set‐
>   tings become locked when propagated from a more privileged
>   to a less privileged mount namespace, and may not be changed
>   in the less privileged mount namespace.
>
> * A file or directory that is a mount point in one namespace
>   that is not a mount point in another namespace, may be
>   renamed, unlinked, or removed (rmdir(2)) in the mount names‐
>   pace in which it is not a mount point (subject to the usual
>   permission checks).
>
>   Previously, attempting to unlink, rename, or remove a file
>   or directory that was a mount point in another mount names‐
>   pace would result in the error EBUSY. That behavior had
>   technical problems of enforcement (e.g., for NFS) and per‐
>   mitted denial-of-service attacks against more privileged
>   users. (i.e., preventing individual files from being
>   updated by bind mounting on top of them).

Subject to the tiny typo corrections above, that looks fine.

Eric
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers
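
The point about shared mounts being reduced to slave mounts can be
checked from user space by comparing the propagation tags in
/proc/self/mountinfo before and after entering a less privileged mount
namespace. What follows is only a minimal sketch, assuming the
optional-field tags documented in proc(5) (shared:N, master:N,
unbindable). The function names propagation_of() and describe() and the
sample lines in the usage note are illustrative and do not come from
this thread.

```python
def propagation_of(mountinfo_line):
    """Return (mount_point, tags) for one line of /proc/[pid]/mountinfo.

    tags maps each optional-field name (e.g. "shared", "master") to its
    value, or to "" for tags that carry no value (e.g. "unbindable").
    """
    fields = mountinfo_line.split()
    # Fields 0-5 are fixed; optional fields run from index 6 up to a lone "-".
    sep = fields.index("-", 6)
    mount_point = fields[4]
    tags = {}
    for item in fields[6:sep]:
        name, _, value = item.partition(":")
        tags[name] = value
    return mount_point, tags


def describe(tags):
    """Summarise the propagation type implied by the optional-field tags."""
    if "shared" in tags and "master" in tags:
        return "shared, slave"
    if "shared" in tags:
        return "shared"
    if "master" in tags:
        return "slave"
    if "unbindable" in tags:
        return "unbindable"
    return "private"


if __name__ == "__main__":
    # Print the propagation type of every mount visible to this process.
    with open("/proc/self/mountinfo") as f:
        for line in f:
            mount_point, tags = propagation_of(line)
            print(mount_point, describe(tags))
```

Run once in the original namespace and once in the new one: a mount
that carries a shared:N tag in the first listing would be expected to
carry master:N in the second. For a hypothetical line such as
"36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw",
describe() reports "slave".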