Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx):
> On Wed, Nov 27, 2013 at 12:07 PM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
> > ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
> >
> >> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
> >>
> >>> Just to avoid the possible confusion, let me repeat that the fix itself
> >>> looks "obviously fine" to me, "i_nlink != 2" looks obviously wrong.
> >>>
> >>> I am not arguing with this patch, I am just trying to understand this
> >>> logic.
> >>>
> >>> On 11/27, Eric W. Biederman wrote:
> >>>>
> >>>> [... snip ...]
> >>>
> >>> Thanks a lot.
> >>>
> >>>> For the real concern about jail environments where proc and sysfs are
> >>>> not mounted at all a fs_visible check is all that is really required,
> >>>
> >>> this is what I can't understand...
> >>>
> >>> Let's ignore the implementation details. Suppose that proc was never
> >>> mounted. Then "mount -t proc" should fail after CLONE_NEWUSER | NEWNS?
> >>
> >> Yes.
> >
> > Well strictly speaking it should fail after CLONE_NEWUSER | NEWNS | NEWPID.
> > If proc was never mounted.
> >
> > Fresh mounts of proc are not allowed unless you have also created the
> > pid namespace. With just CLONE_NEWUSER | NEWNS you are limited to bind
> > mounts.
> >
> > Has this cleared up the confusion?
> >
> > Eric
> >
>
> This is all obnoxiously complicated. I wonder if we can do (a lot)
> better by allowing a "pid-only" variant of proc to be mounted. It
> should contain:
>
> - All the pid directories
> - /proc/self, /proc/net, and /proc/mounts (but possibly not
> /proc/PID/net -- that's a weird interface IMO and isn't really related
> to the pid)
> - keys key-users (wtf is up with that interface, though -- those
> files are way too magical)
> - cpuinfo, version, and maybe other informational things (crypto?)
> - loadavg, perhaps
>
> I wonder if it would be possible to boot a reasonable container with a
> heavily limited /proc like that.

Should be possible.
And heck, maybe some of the values could then be virtualized :)
cmdline could point to the container init's cmdline; cpuinfo, loadavg,
and meminfo could be filtered through cgroupfs.

-serge