Am 21.08.2014 06:53, schrieb Eric W. Biederman: > The bugs fixed are security issues, so if we have to break a small > number of userspace applications we will. Anything that we can > reasonably do to avoid regressions will be done. > > Could you please look at my user-namespace.git#for-next branch I have a > fix for at least one regresion causing issue in there. I think it may > fix your issues but I am not fully certain more comments below. I'll run this on my LXC testbed today. >> /* >> * We can't immediately set the MS_RDONLY flag when mounting filesystems >> * because (in at least some kernel versions) this will propagate back >> * to the original mount in the host OS, turning it readonly too. Thus >> * we mount the filesystem in read-write mode initially, and then do a >> * separate read-only bind mount on top of that. >> */ >> bindOverReadonly = !!(mnt_mflags & MS_RDONLY); >> >> VIR_DEBUG("Mount %s on %s type=%s flags=%x", >> mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY); >> if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags & >> ~MS_RDONLY, NULL) < 0) { >> >> ^^^^ Here it fails for sysfs because with user namespaces we bind the >> existing /sys into the container >> and would have to read out all existing mount flags from the current /sys mount. >> Otherwise mount() fails with EPERM. >> On my test system /sys is mounted with >> "rw,nosuid,nodev,noexec,relatime" and libvirt >> misses the realtime... > > Not specifying any atime flags to mount should be safe as that asks for > the default atime flags which for remount I have made the default atime > flags the existing atime flags. So I am scratching my head a little on > this one. Okay, let me find out why exactly libvirt gets a EPERM here. Maybe there are more odds hidden. >> >> virReportSystemError(errno, >> _("Failed to mount %s on %s type %s flags=%x"), >> mnt_src, mnt->dst, NULLSTR(mnt->type), >> mnt_mflags & ~MS_RDONLY); >> goto cleanup; >> } >> >> if (bindOverReadonly && >> mount(mnt_src, mnt->dst, NULL, >> MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { >> >> ^^^ Here it fails because now we'd have to specify all flags as used >> for the first >> mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. >> See lxcBasicMounts[]. >> In this case the fix is easy, add mnt_mflags to the mount flags. > > That has always been a bug in general because remount has always > required specifying the complete set of mount flags you want to have. > > That fact that flags such as nosuid are now properly locked so you can > not change them if you are not the global root user just makes this > obvious. > > Andy Lutermorski has observed that statvfs will return the mount flags > making reading them simple. Thanks for the clarification, I'll create a fix for libvirt. Thanks, //richard -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html