On Thu, May 28, 2015 at 10:33 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: > On Thu, May 28, 2015 at 8:03 AM, Eric W. Biederman > <ebiederm@xxxxxxxxxxxx> wrote: >> Serge Hallyn <serge.hallyn@xxxxxxxxxx> writes: >> >>> Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx): >>>> On Fri, May 22, 2015 at 10:39 AM, Eric W. Biederman >>>> <ebiederm@xxxxxxxxxxxx> wrote: >>>> > I had hoped to get some Tested-By's on that patch series. >>>> >>>> Sorry, I've been totally swamped. >>>> >>>> I suspect that Sandstorm is okay, but I haven't had a chance to test >>>> it for real. Sandstorm makes only limited use of proc and sysfs in >>>> containers, but I'll see if I can test it for real this weekend. >>> >>> Testing this with unprivileged containers, I get >>> >>> lxc-start: conf.c: lxc_mount_auto_mounts: 808 Operation not permitted >>> - error mounting sysfs on >>> /usr/lib/x86_64-linux-gnu/lxc/sys/devices/virtual/net flags 0 >> >> Grr.. I was afraid this would break something. :( >> >> Looking at my system I see that sysfs is currently mounted >> "nosuid,nodev,noexec" >> >> Looking at the lxc-start code I don't see it as including any of those >> mount options. In practice for sysfs I think those options are >> meaningless (as there should be no devices and nothing executable in >> sysfs) but I can understand the past concerns with chmod on virtual >> filesystems that would incline people to use them, so I think the >> failure is reporting a legitimate security issue in the lxc userspace >> code where the the unprivileged code is currently attempting to give >> greater access to sysfs than was given by the original mount of sysfs. >> >> As nosuid,nodev,noexec should not impair the operation of sysfs >> operation it looks like you can always specify those options and just >> make this concern go away. > > Linus is pretty strict about not breaking the ABI, and this definitely > counts as breaking the ABI. There's an exception for security issues, > but is there really a security issue here? That is, do we lose > anything important if we just drop the offending part of the patch > set? As you've said, there shouldn't be sensitive device nodes, > executables, or setuid files in proc or sysfs in the first place. Speaking as a user of the mount() interfaces, I really think it would be less confusing overall if mount() simply ignored the requested flags when the caller doesn't have a choice. That is, in cases where mount() currently fails with EPERM when not given, say, MS_NOSUID, it should instead just pretend the caller actually set MS_NOSUID and go ahead with a nosuid mount. Or put another way, the absence of MS_NOSUID should not be interpreted as "remove the nosuid bit" but rather "don't set the nosuid bit if not required". Consider: - This approach will actually cause lxc to have the correct behavior, without any changes to lxc. I suspect that this generalizes: In the vast majority of cases, when users have failed to set MS_NOSUID, it's not because they are explicitly requesting that the flag be turned off, but rather that they didn't know it mattered. - If a user actually *does* expect not passing MS_NOSUID to remove the nosuid bit, and they find instead that the nosuid bit is silently kept, I don't think they'll be confused: it's pretty obvious in context that this must be for security reasons. - On the other hand, the current behavior *is* very confusing: mount() returns EPERM because of rules the caller probably doesn't know anything about. I've spent a fair amount of time frustrated by this sort of thing. -Kenton -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html