Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes: > On May 28, 2015 12:19 PM, "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> wrote: >> Kenton Varda <kenton@xxxxxxxxxxxx> writes: >> >> We do need to enforce retaining the existing mount flags one way or >> another. Where this really matters is with MS_RDONLY. We don't want >> any old user to be able to mount /proc read-write when root mounted it >> read-only. There is a very real attack vector there. That attack >> almost works in docker container today and is avoided simply because >> docker mounts over a few files on proc. > > You could drop the nosuid, noexec, and nodev changes and keep just the > ro part. The ro part is probably not an ABI break in the sense of > something that actually breaks real programs. As a change simply removing the code from the existing patches that worries about nosuid, noexec, and the nodev flags is certainly doable. It is the best proposal I have heard so far. I remain unconvinced about ignoring those flags: - There are clearly people who think it matters (or else proc and sysfs would not have those flags specified). - There have been times when it actually has mattered. Aka when files like /proc/self/env could be chmodded and used for privilege escalation. - The code in lxc and libvirt-lxc so far has been clearly buggy. * lxc only has problems with sysfs (in some configurations). * libvirt-lxc only has problems on a bind mount remount of proc after remounting proc properly. So I am leaning towards enforcing all of the mount flags including nosuid, noexec, and nodev. Then when the next subtle bug in proc or sysfs with respect to chmod shows up I will be able to sleep soundly at night because the mount flags of those filesystems allow a mitigation, and I did not sabatage the mitigation. Plus contemplating code that just enforces a couple of mount flags but not all of the feels wrong. I don't think it is actually a maintainable position to just enforce a couple of those flags. If nothing else I would expect someone to look at the code and to generate a bug fix to start enforcing the rest of the flags. Or perhaps it is in a few years time and something gets refactored and the enforcing starts happening by virtue of using a new common function that no-one realizes will be a problem. Additionally if we don't enforce nosuid, noexec, and nodev people are going to ask questions, that will be hard to explain. When what is truly desirable is to say that sysfs and proc are a little odd but they don't allow anything that a bind mount won't. I can be persuaded otherwise but right now I do think the kernel code needs to enforce nosuid, noexec, and nodev as it is a security issue (if only a defence in depth one), and a maintenance issue as I do not believe in the long term it is a maintanable or an explicable position. >> > Speaking as a user of the mount() interfaces, I really think it would >> > be less confusing overall if mount() simply ignored the requested >> > flags when the caller doesn't have a choice. That is, in cases where >> > mount() currently fails with EPERM when not given, say, MS_NOSUID, it >> > should instead just pretend the caller actually set MS_NOSUID and go >> > ahead with a nosuid mount. Or put another way, the absence of >> > MS_NOSUID should not be interpreted as "remove the nosuid bit" but >> > rather "don't set the nosuid bit if not required". >> >> I am conflicted. Implicits are nice but confusing. If we can do >> something reliable and robust and maintainable here that is truly worth >> the cost I am all for it. >> >> If I mount proc read-write I likely want to be able to write to proc >> files, and I will be much happier if the mount fails than if a bazillion >> syscalls later something else fails when it tries to write to proc. > > I agree. I don't like the implicit thing. My memory returns of our last round of looking at this and for whatever it's warts the existing mount API for remounting filesystems needs to have the flags have exactly the same meaning as at mount time. There are existing userspace applications that depend on that behavior. Implicits for only the locked mount flags is a little different but still ick. Eric _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers