Re: For review: user_namespace(7) man page

"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> · Tue, 09 Sep 2014 06:59:35 -0700

Hi Eric,

On 08/30/2014 02:53 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
> 
>> Hello Eric et al.,
>>
>> For various reasons, my work on the namespaces man pages 
>> fell off the table a while back. Nevertheless, the pages have
>> been close to completion for a while now, and I recently restarted,
>> in an effort to finish them. As you also noted to me f2f, there have
>> recently been some small namespace changes that may affect
>> the content of the pages. Therefore, I'll take the opportunity to
>> send the namespace-related pages out for further (final?) review.
>>
>> So, here, I start with the user_namespaces(7) page, which is shown 
>> in rendered form below, with source attached to this mail. I'll
>> send various other pages in follow-on mails.
>>
>> Review comments/suggestions for improvements / bug fixes welcome.
>>
>> Cheers,
>>
>> Michael
>>
>> ==
>>
>> NAME
>>        user_namespaces - overview of Linux user namespaces
>>
[...]

>>        When a new IPC, mount, network, PID, or UTS namespace is  created
>>        via clone(2) or unshare(2), the kernel records the user namespace
>>        of the creating process against the new namespace.  (This associ‐
>>        ation  can't  be  changed.)   When a process in the new namespace
>>        subsequently  performs  privileged  operations  that  operate  on
>>        global resources isolated by the namespace, the permission checks
>>        are performed according to the process's capabilities in the user
>>        namespace that the kernel associated with the new namespace.
> 
> Restrictions on mount namespaces.
> 
> - A mount namespace has an owner user namespace.  A mount namespace whose
>   owner user namespace is different than the owner user namespace of
>   its parent mount namespace is considered a less privileged mount
>   namespace.
> 
> - When creating a less privileged mount namespace, shared mounts are
>   reduced to slave mounts.  This ensures that mappings performed in less
>   privileged mount namespaces will not propagate to more privileged
>   mount namespaces.
> 
> - Mounts that come as a single unit from a more privileged mount
>   namespace are locked together and may not be separated in a less
>   privileged mount namespace.

Could you clarify what you mean by "Mounts that come as a single unit"?

> - The mount flags readonly, nodev, nosuid, noexec, and the mount atime
>   settings when propagated from a more privileged to a less privileged
>   mount namespace become locked, and may not be changed in the less
>   privileged mount namespace.
> 
> - (As of 3.18-rc1 (in today's Al Viro vfs.git#for-next tree)) A file or
>   directory that is a mount point in one namespace that is not a mount
>   point in another namespace, may be renamed, unlinked, or rmdired in
>   the mount namespace in which it is not a mount point if the
>   ordinary permission checks pass.
> 
>   Previously attempting to rmdir, unlink or rename a file or directory
>   that was a mount point in another mount namespace would result in
>   -EBUSY.  This behavior had technical problems of enforcement (nfs)
>   and resulted in a nice denial of service attack against more
>   privileged users.  (Aka preventing individual files from being updated
>   by bind mounting on top of them).

I have reworked the text above a little so that now we have the following.
Aside from the question above, does it look okay?

   Restrictions on mount namespaces
       Note the following points with respect to mount namespaces:

       *  A  mount  namespace  has  an  owner user namespace.  A mount
          namespace whose owner user namespace is different  from  the
          owner  user  namespace of its parent mount namespace is con‐
          sidered a less privileged mount namespace.

       *  When creating a  less  privileged  mount  namespace,  shared
          mounts  are reduced to slave mounts.  This ensures that map‐
          pings performed in less privileged mount namespaces will not
          propagate to more privileged mount namespaces.

       *  Mounts that come as a single unit from a more privileged
          mount namespace are locked together and may not be separated
          in a less privileged mount namespace.

       *  The mount(2) flags MS_RDONLY, MS_NODEV, MS_NOSUID, and
          MS_NOEXEC, and the "atime" flag settings (MS_NOATIME,
          MS_NODIRATIME, MS_RELATIME), become locked when propagated
          from a more privileged to a less privileged mount namespace,
          and may not be changed in the less privileged mount
          namespace.

       *  A file or directory that is a mount point in one namespace
          but not in another may be renamed, unlinked, or removed
          (rmdir(2)) in the mount namespace in which it is not a mount
          point (subject to the usual permission checks).

          Previously, attempting to unlink, rename, or remove a file
          or directory that was a mount point in another mount
          namespace would result in the error EBUSY.  That behavior
          had technical problems of enforcement (e.g., for NFS) and
          permitted denial-of-service attacks against more privileged
          users (i.e., preventing individual files from being updated
          by bind mounting on top of them).
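
To check my reading of the first, second, and fourth points, here is a
rough model in Python. It is only a sketch of the wording above, not of
the kernel's actual checks: the Mount type, copy_into_namespace(), and
remount_allowed() are names made up for illustration, owner user
namespaces are represented by plain strings, and "may not be changed" is
taken literally (any difference in a locked bit is refused). The MS_*
values are the ones from <sys/mount.h>; MS_NODEV is included because
your list names nodev.

```python
from dataclasses import dataclass, replace

# Flag values as defined in <sys/mount.h> on Linux.
MS_RDONLY = 1
MS_NOSUID = 2
MS_NODEV = 4
MS_NOEXEC = 8
MS_NOATIME = 1024
MS_NODIRATIME = 2048
MS_RELATIME = 1 << 21

# Flags that the text says become locked on propagation.
LOCKABLE = (MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC
            | MS_NOATIME | MS_NODIRATIME | MS_RELATIME)


@dataclass(frozen=True)
class Mount:
    """Hypothetical model of a mount; not a kernel structure."""
    flags: int
    propagation: str   # "shared", "slave", or "private"
    locked: int = 0    # mask of flag bits that may not be changed


def copy_into_namespace(mnt, parent_owner, child_owner):
    """Model copying a mount into a newly created mount namespace."""
    if parent_owner == child_owner:
        # Same owner user namespace: the new namespace is not
        # "less privileged", so nothing is reduced or locked.
        return mnt
    # Less privileged namespace: shared becomes slave, flags are locked.
    propagation = "slave" if mnt.propagation == "shared" else mnt.propagation
    return replace(mnt, propagation=propagation, locked=LOCKABLE)


def remount_allowed(mnt, new_flags):
    """Return True if new_flags leaves every locked flag bit unchanged."""
    return ((mnt.flags ^ new_flags) & mnt.locked) == 0


host = Mount(flags=MS_RDONLY | MS_NOSUID, propagation="shared")
inner = copy_into_namespace(host, "init_userns", "child_userns")

print(inner.propagation)                              # slave
print(remount_allowed(inner, MS_NOSUID))              # False
print(remount_allowed(inner, MS_RDONLY | MS_NOSUID))  # True
```

In this model, dropping MS_RDONLY in the less privileged namespace is
refused, while a remount that keeps the inherited flags is accepted. If
the real rule is narrower (for example, only clearing a locked flag is
refused), the text of the fourth point should say so.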

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/