> -----Original Message-----
> From: Richard Weinberger [mailto:richard.weinberger@xxxxxxxxx]
> Sent: Wednesday, March 11, 2015 4:56 AM
> To: Chen, Hanxiao/陈 晗霄
> Cc: libvir-list@xxxxxxxxxx; Daniel P. Berrange; Gao feng
> Subject: Re: [PATCH] LXC: create a bind mount for sysfs when enable userns
> but disable netns
>
> On Mon, Jul 14, 2014 at 12:01 PM, Chen Hanxiao
> <chenhanxiao@xxxxxxxxxxxxxx> wrote:
> > kernel commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e
> > forbid us doing a fresh mount for sysfs
> > when enable userns but disable netns.
> > This patch will create a bind mount in this senario.
>
> Sorry for exhuming an already merged patch but today I ran into a
> nasty issue caused by it.
>
> > Signed-off-by: Chen Hanxiao <chenhanxiao@xxxxxxxxxxxxxx>
> > ---
> >  src/lxc/lxc_container.c | 44 +++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 33 insertions(+), 11 deletions(-)
> >
> > diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
> > index 4d89677..8a27215 100644
> > --- a/src/lxc/lxc_container.c
> > +++ b/src/lxc/lxc_container.c
> > @@ -815,10 +815,13 @@ static int lxcContainerSetReadOnly(void)
> >  }
> >
> >
> > -static int lxcContainerMountBasicFS(bool userns_enabled)
> > +static int lxcContainerMountBasicFS(bool userns_enabled,
> > +                                    bool netns_disabled)
> >  {
> >      size_t i;
> >      int rc = -1;
> > +    char* mnt_src = NULL;
> > +    int mnt_mflags;
> >
> >      VIR_DEBUG("Mounting basic filesystems");
> >
> > @@ -826,8 +829,25 @@ static int lxcContainerMountBasicFS(bool userns_enabled)
> >          bool bindOverReadonly;
> >          virLXCBasicMountInfo const *mnt = &lxcBasicMounts[i];
> >
> > +        /* When enable userns but disable netns, kernel will
> > +         * forbid us doing a new fresh mount for sysfs.
> > +         * So we had to do a bind mount for sysfs instead.
> > +         */
> > +        if (userns_enabled && netns_disabled &&
> > +            STREQ(mnt->src, "sysfs")) {
> > +            if (VIR_STRDUP(mnt_src, "/sys") < 0) {
> > +                goto cleanup;
> > +            }
>
> This is clearly broken and looks very untested to me.
>

It is broken now, but it was not when I submitted this patch last year.

> It will issue this mount call:
> mount("/sys", "/sys", "sysfs", MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_BIND, NULL)
> because the code runs after pivot_root(2).
> i.e, /sys will be still empty after that and no sysfs at all there.
> As libvirt will later remount /sys readonly creating a container will
> fail with the most useless error message:
> Error: internal error: guest failed to start: Unable to create
> directory /sys/fs/: Read-only file system
> or
> Error: internal error: guest failed to start: Unable to create
> directory /sys/fs/cgroup: Read-only file system
>
> Please note that changing "/sys" to "/.oldroot/sys" will not solve the
> issue as this code runs already in the new
> namespace and therefore the old mount tree is locked, thus MS_BIND is
> not allowed.
>
> This brings me to the question, why do you handle the netns_disabled
> case anyway?

Please check the discussion at:
http://lists.linux-foundation.org/pipermail/containers/2014-July/034721.html

> If in the XML file no network is specified just create a new and empty
> network namespace.
> Bindmounting /sys into the container is a security issue. This is why
> mounting sysfs without a netns
> was disabled to begin with.

Yes, I tried to propose enabling netns by default, but Dan thought that
we should allow containers to share the host's network:
http://www.redhat.com/archives/libvir-list/2013-August/msg01025.html

So we should allow users to create containers without netns; they should
know what they are doing if they have read libvirt's docs.
See the docs patch that describes the security considerations:
http://www.redhat.com/archives/libvir-list/2013-September/msg00562.html

Regards,
- Chen

>
> P.S: Sorry for the grumpy mail, I've wasted almost the whole day with
> debugging that issue.
>
> --
> Thanks,
> //richard

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
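
For readers following the thread, here is a rough sketch of the decision
the quoted hunk makes, written as a standalone helper so it can be read
without the rest of lxc_container.c. It is not libvirt code: the names
mount_plan and plan_basic_mount are made up for illustration, and the flag
handling is only inferred from the mount() call quoted above. It calls
nothing privileged; it only computes the arguments that would be passed to
mount(2).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical type (not part of libvirt): the mount(2) arguments
 * chosen for one basic filesystem. */
struct mount_plan {
    const char *src;      /* source argument for mount(2) */
    const char *fstype;   /* filesystem type; mount(2) ignores it for MS_BIND */
    unsigned long flags;  /* mount flags */
};

/* Hypothetical helper mirroring the branch in the quoted hunk:
 * with userns enabled and netns disabled, sysfs is bind mounted
 * from "/sys" instead of being mounted fresh. */
static void plan_basic_mount(const char *src, const char *fstype,
                             unsigned long flags,
                             bool userns_enabled, bool netns_disabled,
                             struct mount_plan *out)
{
    assert(out != NULL);

    /* Default: a fresh mount with the caller's source and flags. */
    out->src = src;
    out->fstype = fstype;
    out->flags = flags;

    if (userns_enabled && netns_disabled && strcmp(src, "sysfs") == 0) {
        /* Assumption: the bind source is whatever "/sys" resolves to
         * at call time, which is the point raised in the review. */
        out->src = "/sys";
        out->flags = flags | MS_BIND;
    }
}
```

Called with src "sysfs", flags MS_NOSUID|MS_NODEV|MS_NOEXEC and both
booleans true, the helper yields the same arguments as the mount() call
quoted above. Whether "/sys" still holds a sysfs at that point depends on
where the caller sits relative to pivot_root(2), which the sketch does not
model.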