Luis Chamberlain <mcgrof@xxxxxxxxxx> writes:

> On Mon, Nov 26, 2018 at 06:26:07PM +0100, Radoslaw Burny wrote:
>> Due to a recent commit (d151ddc00498 - fs: Update i_[ug]id_(read|write)
>> to translate relative to s_user_ns),
>
> Recent? This commit is from 2014 and present upstream since v4.8.
> And the commit ID you mentioned in your commit log seems to be
> incorrect. I get:
>
> 81754357770ebd900801231e7bc8d151ddc00498a fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
>
>> inodes under /proc/sys have -1
>> written to their i_uid/i_gid members if a containing userns does not
>> have entries for root in the uid/gid_map.
>
> Thanks for the description of how to run into the issue described but
> is there also a practical use case today where this is happening? I ask
> as it would be good to know the severity of the issue in the real world
> today.

People trying to run containers without a root user in the container.
It is atypical but doable.

>> This wouldn't normally matter, because these values are not used for
>> access checks. However, a later change (0bd23d09b874 - Don't modify
>> inodes with a uid or gid unknown to the vfs) changes the kernel to
>> prevent opens for write if the i_uid/i_gid field in the inode is -1,
>> even if the /proc/sys-specific access checks would otherwise pass.
>>
>> This causes a problem: in a userns without root mapping, even the
>> namespace creator cannot write to e.g. /proc/sys/kernel/shmmax.
>> This change fixes the problem by overriding i_uid/i_gid back to
>> GLOBAL_ROOT_UID/GID.
>
> We really need Seth and Eric to provide guidance here as they were
> the ones devising this long ago, but to me your solution seems backward.
> Why allow any namespace to muck with /proc/sys/ settings?

There are many per-namespace sysctls. Most of them are in the
networking stack.

> Let's recall that this case was a corner case, and writeback was the
> biggest concern, and for that it was decided that you'd simply not get
> write access, and so it's read only. It's not clear to me if things like
> proc were considered. For the regular file case the situation can be
> addressed with chown, however we can't chown proc files.
>
>> Tested: Used a repro program that creates a user namespace without any
>> mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
>> Before the change, it shows uid/gid of 65534,
>
> I thought you said it would be uid/gid -1 without your patch?

It is INVALID_UID/INVALID_GID. It is an oversimplification to call
them -1. As they are not a valid value and are never mapped in any user
namespace, they are displayed as the overflow_uid or overflow_gid, which
is 65534 by default.

>> with the change it's 0.
>
> Note that a good way to also test issues is with the lib/test_sysctl.c
> module and the tools/testing/selftests/sysctl/sysctl.sh script, so if
> you can devise a test there, once we decide what to do that would be
> appreciated.

We spoke about this at LPC. And this is the correct behavioral change.

The problem is there is a default value for i_uid and i_gid that is
correct in the general case. That default value is not correct for
sysctl, because proc is weird. As the sysctl permission checks in
test_perm are all against GLOBAL_ROOT_UID and GLOBAL_ROOT_GID, we did not
notice that i_uid and i_gid were being set wrong.

So all this patch does is fix the default values of i_uid and i_gid.

The commit comment seems worth cleaning up. But for the content of the
code.

I expect when I have a few moments I will pick this change up.

Reviewed-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

Eric

>> Signed-off-by: Radoslaw Burny <rburny@xxxxxxxxxx>
>> ---
>>  fs/proc/proc_sysctl.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
>> index c5cbbdff3c3d..67379a389658 100644
>> --- a/fs/proc/proc_sysctl.c
>> +++ b/fs/proc/proc_sysctl.c
>> @@ -499,6 +499,10 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
>>
>>  	if (root->set_ownership)
>>  		root->set_ownership(head, table, &inode->i_uid, &inode->i_gid);
>> +	else {
>> +		inode->i_uid = GLOBAL_ROOT_UID;
>> +		inode->i_gid = GLOBAL_ROOT_GID;
>> +	}
>>
>>  out:
>>  	return inode;
>> --
>> 2.20.0.rc0.387.gc7a69e6b6c-goog
>>
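
For readers puzzled by the 65534 versus -1 exchange above, here is a minimal userspace sketch of the display rule described in the reply: an id with no mapping in the viewer's namespace is shown as the overflow id rather than as -1. This is not kernel code. The names sketch_extent, sketch_map_up and sketch_display_id are hypothetical and exist only for this illustration, and the map layout is an assumption loosely modeled on the three columns of a uid_map entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for INVALID_UID/INVALID_GID and for the default
 * overflow_uid/overflow_gid value (65534) mentioned in the thread. */
#define SKETCH_INVALID_ID  ((uint32_t)-1)
#define SKETCH_OVERFLOW_ID 65534u

/* One mapping extent, assumed to mirror a uid_map/gid_map line. */
struct sketch_extent {
	uint32_t first;       /* first id as seen inside the namespace */
	uint32_t lower_first; /* first id as seen by the kernel */
	uint32_t count;       /* number of consecutive ids covered */
};

/* Translate a kernel-side id into the viewer's namespace.
 * Returns SKETCH_INVALID_ID when no extent covers it. */
static uint32_t sketch_map_up(const struct sketch_extent *map, size_t n,
			      uint32_t kid)
{
	assert(map != NULL || n == 0);

	/* The invalid id is never mapped, whatever the extents say. */
	if (kid == SKETCH_INVALID_ID)
		return SKETCH_INVALID_ID;

	for (size_t i = 0; i < n; i++) {
		if (kid >= map[i].lower_first &&
		    kid - map[i].lower_first < map[i].count)
			return map[i].first + (kid - map[i].lower_first);
	}
	return SKETCH_INVALID_ID;
}

/* What a stat()-style caller would be shown: unmapped ids are replaced
 * by the overflow id instead of being reported as -1. */
static uint32_t sketch_display_id(const struct sketch_extent *map, size_t n,
				  uint32_t kid)
{
	uint32_t id = sketch_map_up(map, n, kid);

	return id == SKETCH_INVALID_ID ? SKETCH_OVERFLOW_ID : id;
}
```

With an identity map for the viewer, sketch_display_id() yields 65534 for SKETCH_INVALID_ID and 0 for id 0. Under the assumptions of this sketch, that matches the before and after values reported in the Tested: paragraph.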