Re: [CFT][PATCH 00/10] Making new mounts of proc and sysfs as safe as bind mounts (take 2)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[CC'ing libvirt-lxc folks]

Am 28.05.2015 um 23:32 schrieb Eric W. Biederman:
> Richard Weinberger <richard@xxxxxx> writes:
> 
>> Am 28.05.2015 um 21:57 schrieb Eric W. Biederman:
>>>> FWIW, it breaks also libvirt-lxc:
>>>> Error: internal error: guest failed to start: Failed to re-mount /proc/sys on /proc/sys flags=1021: Operation not permitted
>>>
>>> Interesting.  I had not anticipated a failure there?  And it is failing
>>> in remount?  Oh that is interesting.
>>>
>>> That implies that there is some flag of the original mount of /proc that
>>> the remount of /proc/sys is clearing, and that previously 
>>>
>>> The flags specified are current rdonly,remount,bind so I expect there
>>> are some other flags on proc that libvirt-lxc is clearing by accident
>>> and we did not fail before because the kernel was not enforcing things.
>>
>> Please see:
>> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l933
>> lxcContainerMountBasicFS()
>>
>> and:
>> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l850
>> lxcBasicMounts
>>
>>> What are the mount flags in a working libvirt-lxc?
>>
>> See:
>> test1:~ # cat /proc/self/mountinfo
>> 149 147 0:56 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
>> 150 149 0:56 /sys /proc/sys ro,nodev,relatime - proc proc rw
> 
>> If you need more info, please let me know. :-)
> 
> Oh interesting I had not realized libvirt-lxc had grown an unprivileged
> mode using user namespaces.
> 
> This does appear to be a classic remount bug, where you are not
> preserving the permissions.  It appears the fact that the code
> failed to enforce locked permissions on the fresh mount of proc
> was hiding this bug until now.
> 
> I expect what you actually want is the code below:
> 
> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> index 9a9ae5c2aaf0..f008a7484bfe 100644
> --- a/src/lxc/lxc_container.c
> +++ b/src/lxc/lxc_container.c
> @@ -850,7 +850,7 @@ typedef struct {
>  
>  static const virLXCBasicMountInfo lxcBasicMounts[] = {
>      { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
> -    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
> +    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
>      { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
>      { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
>      { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
> 
> Or possibly just:
> 
> diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> index 9a9ae5c2aaf0..a60ccbd12bfc 100644
> --- a/src/lxc/lxc_container.c
> +++ b/src/lxc/lxc_container.c
> @@ -850,7 +850,7 @@ typedef struct {
>  
>  static const virLXCBasicMountInfo lxcBasicMounts[] = {
>      { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
> -    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
> +    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false },
>      { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
>      { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
>      { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
> 
> As the there is little point in making /proc/sys read-only in a
> user-namespace, as the permission checks are uid based and no-one should
> have the global uid 0 in your container.  Making mounting /proc/sys
> read-only rather pointless.

Eric, using the patch below I was able to spawn a user-namespace enabled container
using libvirt-lxc. :-)

I had to:
1. Disable the read-only mount of /proc/sys which is anyway useless in the user-namespace case.
2. Disable the /proc/sys/net/ipv{4,6} bind mounts, this ugly hack is only needed for the non user-namespace case.
3. Remove MS_RDONLY from the sysfs mount (For the non user-namespace case we'd have to keep this, though).

Daniel, I'd take this as a chance to disable all the MS_RDONLY games if user-namespace are configured.
With Eric's fixes they hurt us. And as I wrote many times before if root within the user-namespace
is able to do nasty things in /sys and /proc that's a plain kernel bug which needs fixing. There is no
point in mounting these read-only. Except for the case then no user-namespace is used.

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 9a9ae5c..497e05f 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -850,10 +850,10 @@ typedef struct {

 static const virLXCBasicMountInfo lxcBasicMounts[] = {
     { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
-    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false },
-    { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true },
-    { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true },
-    { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
+    { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false },
+    { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, true, false, true },
+    { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, true, false, true },
+    { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
     { "securityfs", "/sys/kernel/security", "securityfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, true, true, false },
 #if WITH_SELINUX
     { SELINUX_MOUNT, SELINUX_MOUNT, "selinuxfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, true, true, false },

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux