On Wed, Nov 27, 2013 at 12:07 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
>
>> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>>
>>> Just to avoid the possible confusion, let me repeat that the fix itself
>>> looks "obviously fine" to me, "i_nlink != 2" looks obviously wrong.
>>>
>>> I am not arguing with this patch, I am just trying to understand this
>>> logic.
>>>
>>> On 11/27, Eric W. Biederman wrote:
>>>>
>>>> [... snip ...]
>>>
>>> Thanks a lot.
>>>
>>>> For the real concern about jail environments where proc and sysfs are
>>>> not mounted at all a fs_visible check is all that is really required,
>>>
>>> this is what I can't understand...
>>>
>>> Let's ignore the implementation details. Suppose that proc was never
>>> mounted. Then "mount -t proc" should fail after CLONE_NEWUSER | NEWNS?
>>
>> Yes.
>
> Well strictly speaking it should fail after CLONE_NEWUSER | NEWNS | NEWPID.
> If proc was never mounted.
>
> Fresh mounts of proc are not allowed unless you have also created the
> pid namespace. With just CLONE_NEWUSER | NEWNS you are limited to bind
> mounts.
>
> Has this cleared up the confusion?
>
> Eric
>

This is all obnoxiously complicated.  I wonder if we can do (a lot)
better by allowing a "pid-only" variant of proc to be mounted.  It
should contain:

 - All the pid directories
 - /proc/self, /proc/net, and /proc/mounts (but possibly not
   /proc/PID/net -- that's a weird interface IMO and isn't really
   related to the pid)
 - keys, key-users (wtf is up with that interface, though -- those
   files are way too magical)
 - cpuinfo, version, and maybe other informational things (crypto?)
 - loadavg, perhaps

I wonder if it would be possible to boot a reasonable container with a
heavily limited /proc like that.

--Andy