RE: Could not mount sysfs when enable userns but disable netns

"chenhanxiao@xxxxxxxxxxxxxx" <chenhanxiao@xxxxxxxxxxxxxx> · Mon, 14 Jul 2014 09:32:39 +0000

> -----Original Message-----
> From: Eric W. Biederman [mailto:ebiederm@xxxxxxxxxxxx]
> Sent: Saturday, July 12, 2014 12:29 AM
> To: Serge E. Hallyn
> Cc: Chen, Hanxiao/陈 晗霄; Serge Hallyn (serge.hallyn@xxxxxxxxxx); Greg
> Kroah-Hartman; containers@xxxxxxxxxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: Could not mount sysfs when enable userns but disable netns
> 
> "Serge E. Hallyn" <serge@xxxxxxxxxx> writes:
> 
> > Quoting chenhanxiao@xxxxxxxxxxxxxx (chenhanxiao@xxxxxxxxxxxxxx):
> >> Hello,
> >>
> >> How to reproduce:
> >> 1. Prepare a container, enable userns and disable netns
> >> 2. use libvirt-lxc to start a container
> >> 3. libvirt could not mount sysfs then failed to start.
> >>
> >> Then I found that
> >> commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e says:
> >> "Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
> >> over the net namespace."
> >>
> >> But why should we check sysfs mount permission over net namespace?
> >> We've already checked CAP_SYS_ADMIN though.
> 
> We already checked capable(CAP_SYS_ADMIN) and it failed.

But on my machine, the capable(CAP_SYS_ADMIN) check passed;
the mount failed in kobj_ns_current_may_mount.

I added some printks in sysfs_mount:
        if (!(flags & MS_KERNMOUNT)) {
-               if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type))
+               if (!capable(CAP_SYS_ADMIN) && !fs_fully_visible(fs_type)) {
+                       printk(KERN_WARNING "Failed in capable\n");
                        return ERR_PTR(-EPERM);
+               }

-               if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET))
+               if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET)) {
+                       printk(KERN_WARNING "Failed in kobj_ns_current_may_mount\n");
                        return ERR_PTR(-EPERM);
+               }

And found this in the log:
Jul 14 09:55:26 localhost systemd: Starting Container lxc-chx.
Jul 14 09:55:26 localhost systemd-machined: New machine lxc-chx.
Jul 14 09:55:26 localhost systemd: Started Container lxc-chx.
Jul 14 09:55:26 localhost kernel: [  784.044709] Failed in kobj_ns_current_may_mount
Jul 14 09:55:26 localhost systemd-machined: Machine lxc-chx terminated.

> 
> >> What is the relationship between sysfs and the net namespace,
> >> or is this check a little redundant?
> 
> You want a bind mount, not a new fresh mount.
> 

Yes, we need to modify libvirt's code to handle sysfs
when userns is enabled but netns is disabled.
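
A rough sketch of that fallback, in userspace C, might look like the
following. This is not libvirt's actual code: the names
pick_sysfs_strategy and setup_sysfs are made up for illustration, and the
sketch assumes the existing sysfs tree is still reachable at the path
passed in as src.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical names for illustration; not taken from libvirt. */
enum sysfs_strategy {
    SYSFS_FRESH_MOUNT, /* mount -t sysfs: needs privilege over the netns */
    SYSFS_BIND_MOUNT   /* reuse the existing sysfs tree via a bind mount */
};

/*
 * With a user namespace but no private network namespace, the caller
 * has no CAP_SYS_ADMIN over the network namespace it runs in, so a
 * fresh sysfs mount is refused with EPERM. Choose a bind mount then.
 */
enum sysfs_strategy pick_sysfs_strategy(bool has_userns, bool has_netns)
{
    if (has_userns && !has_netns)
        return SYSFS_BIND_MOUNT;
    return SYSFS_FRESH_MOUNT;
}

/*
 * Mount sysfs at target. src is the existing sysfs tree (for example
 * the host's /sys as seen before switching root); it is only used for
 * the bind-mount case. Returns 0 on success, -1 with errno set.
 */
int setup_sysfs(const char *src, const char *target,
                bool has_userns, bool has_netns)
{
    if (pick_sysfs_strategy(has_userns, has_netns) == SYSFS_BIND_MOUNT) {
        /* MS_REC carries along anything mounted below src as well. */
        return mount(src, target, NULL, MS_BIND | MS_REC, NULL);
    }
    return mount("sysfs", target, "sysfs",
                 MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL);
}
```

The decision is kept separate from the mount(2) calls so it can be checked
without privileges. The trade-off is the one Eric describes: the
bind-mounted tree shows the existing /sys/class/net, not a new one.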

Thanks,
- Chen

> When looking at how evil actors could abuse things it turned out that in
> some circumstances the root user (before a user namespace is created)
> needs to control the policy on which filesystems may be mounted.  There
> are files in sysfs and in proc that you never want to see in a chroot
> jail, as they just create more surface area to attack.
> 
> The only reason for creating a new fresh mount of sysfs is to get access
> to /sys/class/net.  So to keep things simple we restrict creation of
> that mount to cases where the mounter has permissions over the network
> namespace, and cases where nothing interesting is mounted on top of
> sysfs.
> 
> If a new /sys/class/net is not needed it is possible to bind mount the
> existing copy of sysfs to the new location without loss of
> functionality.
> 
> > It is not redundant.  The whole point is that after clone(CLONE_NEWUSER)
> > you get a newly filled set of capabilities.  But you should not have
> > privileges over the host's network namespace.  After you unshare a new
> > network namespace, you *should* have privilege over it.  So the fact
> > that we've already checked CAP_SYS_ADMIN means nothing, because the
> > capabilities need to be targeted.
> 
> Exactly.  The tests are failing because the caller is not the global root,
> and so the code is properly failing the permission checks.
> 
> Eric
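
To make the "capabilities need to be targeted" point concrete, here is a
small toy model in plain C. It is only a sketch: the structs and the
model_* functions are invented for illustration and are much simpler than
the kernel's real checks (for instance, the rule that the owner of a child
user namespace gets all capabilities over it is left out). It assumes that
each network namespace records the user namespace that created it, and
that the sysfs check asks for CAP_SYS_ADMIN in that user namespace.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical structs, not the kernel's definitions. */
struct user_ns {
    const struct user_ns *parent; /* NULL for the initial user namespace */
};

struct net_ns {
    const struct user_ns *owner;  /* user namespace that created this netns */
};

struct task {
    const struct user_ns *user_ns; /* user namespace the task lives in */
    bool cap_sys_admin;            /* CAP_SYS_ADMIN held in that namespace */
};

/*
 * Targeted check: walk from the target user namespace up through its
 * ancestors. The capability only counts if the task's own user
 * namespace is the target or one of its ancestors.
 */
bool model_ns_capable(const struct task *t, const struct user_ns *target)
{
    const struct user_ns *ns;

    for (ns = target; ns != NULL; ns = ns->parent) {
        if (ns == t->user_ns)
            return t->cap_sys_admin;
    }
    return false; /* target is not at or below the task's namespace */
}

/* Model of the sysfs rule: privilege over the owner of the current netns. */
bool model_may_mount_sysfs(const struct task *t, const struct net_ns *net)
{
    return model_ns_capable(t, net->owner);
}
```

In this model, a task that holds CAP_SYS_ADMIN in a child user namespace is
refused when its network namespace is still the host's (owned by the
initial user namespace), and allowed once the network namespace was created
from within the child user namespace.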
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers