Michael Kerrisk <mtk.manpages@xxxxxxxxx> writes:

> For a long time, this manual page has had a brief discussion of
> "locked" mounts, without clearly saying what this concept is, or
> why it exists. Expand the discussion with an explanation of what
> locked mounts are, why mounts are locked, and some examples of the
> effect of locking.
>
> Thanks to Christian Brauner for a lot of help in understanding
> these details.
>
> Reported-by: Christian Brauner <christian.brauner@xxxxxxxxxx>
> Signed-off-by: Michael Kerrisk <mtk.manpages@xxxxxxxxx>
> ---
>
> Hello Eric and others,
>
> After some quite helpful info from Christian Brauner, I've expanded
> the discussion of locked mounts (a concept I didn't really have a
> good grasp on) in the mount_namespaces(7) manual page. I would be
> grateful to receive review comments, acks, etc., on the patch below.
> Could you take a look please?
>
> Cheers,
>
> Michael
>
>  man7/mount_namespaces.7 | 73 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
>
> diff --git a/man7/mount_namespaces.7 b/man7/mount_namespaces.7
> index e3468bdb7..97427c9ea 100644
> --- a/man7/mount_namespaces.7
> +++ b/man7/mount_namespaces.7
> @@ -107,6 +107,62 @@ operation brings across all of the mounts from the original
>  mount namespace as a single unit,
>  and recursive mounts that propagate between
>  mount namespaces propagate as a single unit.)
> +.IP
> +In this context, "may not be separated" means that the mounts
> +are locked so that they may not be individually unmounted.
> +Consider the following example:
> +.IP
> +.RS
> +.in +4n
> +.EX
> +$ \fBsudo mkdir /mnt/dir\fP
> +$ \fBsudo sh \-c \(aqecho "aaaaaa" > /mnt/dir/a\(aq\fP
> +$ \fBsudo mount \-\-bind -o ro /some/path /mnt/dir\fP
> +$ \fBls /mnt/dir\fP   # Former contents of directory are invisible

Do we want a more motivating example such as /proc/sys?
It has been common to mount over /proc files and directories that can
be written to by the global root so that users in a mount namespace
may not touch them.

> +.EE
> +.in
> +.RE
> +.IP
> +The above steps, performed in a more privileged user namespace,
> +have created a (read-only) bind mount that
> +obscures the contents of the directory
> +.IR /mnt/dir .
> +For security reasons, it should not be possible to unmount
> +that mount in a less privileged user namespace,
> +since that would reveal the contents of the directory
> +.IR /mnt/dir .
> +.IP
> +Suppose we now create a new mount namespace
> +owned by a (new) subordinate user namespace.
> +The new mount namespace will inherit copies of all of the mounts
> +from the previous mount namespace.
> +However, those mounts will be locked because the new mount namespace
> +is owned by a less privileged user namespace.
> +Consequently, an attempt to unmount the mount fails:
> +.IP
> +.RS
> +.in +4n
> +.EX
> +$ \fBsudo unshare \-\-user \-\-map\-root\-user \-\-mount \e\fP
> +        \fBstrace \-o /tmp/log \e\fP
> +        \fBumount /mnt/dir\fP
> +umount: /mnt/dir: not mounted.
> +$ \fBgrep \(aq^umount\(aq /tmp/log\fP
> +umount2("/mnt/dir", 0) = \-1 EINVAL (Invalid argument)
> +.EE
> +.in
> +.RE
> +.IP
> +The error message from
> +.BR mount (8)
> +is a little confusing, but the
> +.BR strace (1)
> +output reveals that the underlying
> +.BR umount2 (2)
> +system call failed with the error
> +.BR EINVAL ,
> +which is the error that the kernel returns to indicate that
> +the mount is locked.

Do you want to mention that you can unmount the entire subtree?
Either with pivot_root if it is locked to "/" or
with "umount -l /path/to/propagated/directory".

>  .IP *
>  The
>  .BR mount (2)
> @@ -128,6 +184,23 @@ settings become locked
>  when propagated from a more privileged to
>  a less privileged mount namespace,
>  and may not be changed in the less privileged mount namespace.
> +.IP
> +This point can be illustrated by a continuation of the previous example.
> +In that example, the bind mount was marked as read-only.
> +For security reasons,
> +it should not be possible to make the mount writable in
> +a less privileged namespace, and indeed the kernel prevents this,
> +as illustrated by the following:
> +.IP
> +.RS
> +.in +4n
> +.EX
> +$ \fBsudo unshare \-\-user \-\-map\-root\-user \-\-mount \e\fP
> +        \fBmount \-o remount,rw /mnt/dir\fP
> +mount: /mnt/dir: permission denied.
> +.EE
> +.in
> +.RE
>  .IP *
>  .\" (As of 3.18-rc1 (in Al Viro's 2014-08-30 vfs.git#for-next tree))
>  A file or directory that is a mount point in one namespace that is not

Eric
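
P.S. To make the /proc/sys suggestion concrete, here is a rough shell
sketch of what such an example might look like. It is not part of the
patch and I have not checked its output against any particular kernel.
The function name demo_locked_proc_sys is made up for illustration, and
the sketch assumes sudo, util-linux (mount, umount, unshare), and a
kernel with user namespaces enabled. It makes /proc/sys read-only on
the host until the final umount, so it is best tried in a scratch VM.

```shell
#!/bin/sh
# Hypothetical sketch (not from the patch): hide the writable /proc/sys
# tree behind a read-only bind mount, then check that a less privileged
# mount namespace can neither remove nor loosen that mount.

demo_locked_proc_sys() {
    # In the more privileged namespace: bind /proc/sys over itself,
    # read-only, in the same way the patch does for /mnt/dir.
    sudo mount --bind -o ro /proc/sys /proc/sys || return 1

    # New user + mount namespace: the inherited mount should be locked,
    # so this umount is expected to fail (umount2() giving EINVAL).
    sudo unshare --user --map-root-user --mount umount /proc/sys
    echo "umount exit status in new namespace: $?"

    # The read-only setting should be locked too, so this remount is
    # expected to fail with a permission error.
    sudo unshare --user --map-root-user --mount \
        mount -o remount,rw /proc/sys
    echo "remount exit status in new namespace: $?"

    # Back in the original namespace: remove the bind mount again.
    sudo umount /proc/sys
}
```

Source the file and run demo_locked_proc_sys by hand. Two nonzero exit
statuses would match the behaviour described in the patch for /mnt/dir.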