Re: [PATCH] fs: Make /proc/sys inodes be owned by global root.

Radoslaw Burny <rburny@xxxxxxxxxx> writes:

> On Tue, Nov 27, 2018 at 6:29 AM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
>  Luis Chamberlain <mcgrof@xxxxxxxxxx> writes:
>
>  > On Mon, Nov 26, 2018 at 06:26:07PM +0100, Radoslaw Burny wrote:
>  >> Due to a recent commit (d151ddc00498 - fs: Update i_[ug]id_(read|write)
>  >> to translate relative to s_user_ns),
>  >
>  > Recent? This commit is from 2014 and has been upstream since v4.8.
>  > And the commit ID you mentioned in your commit log seems to be
>  > incorrect. I get:
>  >
>  > 81754357770ebd900801231e7bc8d151ddc00498a fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
>  >
>  >> inodes under /proc/sys have -1
>  >> written to their i_uid/i_gid members if a containing userns does not
>  >> have entries for root in the uid/gid_map.
>  >
>  > Thanks for the description of how to run into the issue described but
>  > is there also a practical use case today where this is happening? I ask
>  > as it would be good to know the severity of the issue in the real world
>  > today.
>
>  People trying to run containers without a root user in the container.
>  It is atypical but doable.
>
>  >> This wouldn't normally matter, because these values are not used for
>  >> access checks. However, a later change (0bd23d09b874 - Don't modify
>  >> inodes with a uid or gid unknown to the vfs) changes the kernel to
>  >> prevent opens for write if the i_uid/i_gid field in the inode is -1,
>  >> even if the /proc/sys-specific access checks would otherwise pass.
>  >> 
>  >> This causes a problem: in a userns without root mapping, even the
>  >> namespace creator cannot write to e.g. /proc/sys/kernel/shmmax.
>  >> This change fixes the problem by overriding i_uid/i_gid back to
>  >> GLOBAL_ROOT_UID/GID.
>  >
>  > We really need Seth and Eric to provide guidance here as they were
>  > the ones devising this long ago, but to me your solution seems backward.
>  > Why allow any namespace to muck with /proc/sys/ settings?
>
>  There are many per namespace sysctls. Most of them are in the
>  networking stack.
>
>  > Let's recall that this case was a corner case, and writeback was the
>  > biggest concern, and for that it was decided that you'd simply not get
>  > write access, and so it's read only. It's not clear to me if things like
>  > proc were considered. For the regular file case the situation can be
>  > addressed with chown, however we can't chown proc files.
>  >
>  >> Tested: Used a repro program that creates a user namespace without any
>  >> mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
>  >> Before the change, it shows uid/gid of 65534,
>  >
>  > I thought you said it would be uid/gid -1 without your patch?
>
>  It is INVALID_UID/INVALID_GID. It is an oversimplification to call
>  them -1. As they are not a valid value and are never mapped in any
>  user namespace they are displayed as the overflow_uid or overflow_gid
>  which is 65534 by default.
>
>  >> with the change it's 0.
>  >
>  > Note that a good way to also test issues is with the lib/test_sysctl.c
>  > module and the tools/testing/selftests/sysctl/sysctl.sh script, so if
>  > you can devise a test there, once we decide what to do that would be
>  > appreciated.
>
>  We spoke about this at LPC. And this is the correct behavioral change.
>
>  The problem is there is a default value for i_uid and i_gid that is
>  correct in the general case. That default value is not correct for
>  sysctl, because proc is weird. As the sysctl permission checks in
>  test_perm are all against GLOBAL_ROOT_UID and GLOBAL_ROOT_GID we did not
>  notice that i_uid and i_gid were being set wrong.
>
>  So all this patch does is fix the default values of i_uid and i_gid.
>
>  The commit comment seems worth cleaning up, but the content of the
>  code looks fine.
>
>  I expect when I have a few moments I will pick this change up.
>
>  Reviewed-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>
>  Eric
>
> Thanks, Eric. Should I send a v2 patch with an updated description,
> or can you just modify the description when applying this one?

I am absolutely swamped and moving at the moment.  Can you please
send a v2 with an updated description?

Thank you,
Eric
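
The point quoted above about INVALID_UID being shown as 65534 can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the `model_*` names and the extent structure are hypothetical and are not kernel API; they loosely imitate how an id that has no mapping in a user namespace ends up displayed as the overflow id (65534 by default, as stated in the thread).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel API. */
#define MODEL_INVALID_UID  ((uint32_t)-1) /* stands in for INVALID_UID */
#define MODEL_OVERFLOW_UID 65534u         /* default overflow uid per the thread */

/* One mapping extent: ids [ns_first, ns_first + count) inside the namespace
 * correspond to [global_first, global_first + count) outside it. */
struct uid_extent {
	uint32_t ns_first;
	uint32_t global_first;
	uint32_t count;
};

struct model_userns {
	const struct uid_extent *extents;
	size_t n; /* n == 0 models a namespace with no uid_map entries */
};

/* Translate a global id into the namespace; MODEL_INVALID_UID if unmapped. */
uint32_t model_from_global(const struct model_userns *ns, uint32_t global)
{
	if (global == MODEL_INVALID_UID)
		return MODEL_INVALID_UID; /* the invalid id is never mapped */

	for (size_t i = 0; i < ns->n; i++) {
		const struct uid_extent *e = &ns->extents[i];

		if (global >= e->global_first &&
		    global - e->global_first < e->count)
			return e->ns_first + (global - e->global_first);
	}
	return MODEL_INVALID_UID;
}

/* What a stat()-style caller in the namespace would see in this model:
 * unmapped ids are replaced by the overflow id. */
uint32_t model_display_uid(const struct model_userns *ns, uint32_t global)
{
	uint32_t uid = model_from_global(ns, global);

	return uid == MODEL_INVALID_UID ? MODEL_OVERFLOW_UID : uid;
}
```

In this model, an inode whose owner is the invalid id is reported as 65534 in every namespace, which matches the "before the change" observation in the test description.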

>
>  >> Signed-off-by: Radoslaw Burny <rburny@xxxxxxxxxx>
>  >> ---
>  >> fs/proc/proc_sysctl.c | 4 ++++
>  >> 1 file changed, 4 insertions(+)
>  >> 
>  >> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
>  >> index c5cbbdff3c3d..67379a389658 100644
>  >> --- a/fs/proc/proc_sysctl.c
>  >> +++ b/fs/proc/proc_sysctl.c
>  >> @@ -499,6 +499,10 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
>  >> 
>  >> 	if (root->set_ownership)
>  >> 		root->set_ownership(head, table, &inode->i_uid, &inode->i_gid);
>  >> +	else {
>  >> +		inode->i_uid = GLOBAL_ROOT_UID;
>  >> +		inode->i_gid = GLOBAL_ROOT_GID;
>  >> +	}
>  >> 
>  >> out:
>  >> 	return inode;
>  >> -- 
>  >> 2.20.0.rc0.387.gc7a69e6b6c-goog
>  >> 
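
To make the effect of the quoted patch concrete, here is a minimal userspace sketch of the interaction described in the thread. It is a simplified model, not the kernel implementation: the `model_*` names are hypothetical, ids are plain integers, the `set_ownership` callback is assumed to be absent, and the "refuse write opens when the owner is invalid" rule is reduced to a single check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel API. */
#define MODEL_INVALID_ID     ((uint32_t)-1) /* stands in for INVALID_UID/GID */
#define MODEL_GLOBAL_ROOT_ID 0u             /* stands in for GLOBAL_ROOT_UID/GID */

struct model_inode {
	uint32_t i_uid;
	uint32_t i_gid;
};

/*
 * Model of inode setup for a /proc/sys entry with no set_ownership callback.
 * ns_root_global is the global id that id 0 of the owning user namespace maps
 * to, or MODEL_INVALID_ID when the namespace has no mapping for root.
 */
void model_make_inode(struct model_inode *inode, uint32_t ns_root_global,
		      bool patched)
{
	/* Default ownership: root of the owning namespace, possibly invalid. */
	inode->i_uid = ns_root_global;
	inode->i_gid = ns_root_global;

	/* The else branch added by the patch: fall back to global root. */
	if (patched) {
		inode->i_uid = MODEL_GLOBAL_ROOT_ID;
		inode->i_gid = MODEL_GLOBAL_ROOT_ID;
	}
}

/*
 * Model of the write-open decision: an inode with an invalid owner is refused
 * before the sysctl-specific permission result is even considered.
 */
bool model_may_open_for_write(const struct model_inode *inode,
			      bool sysctl_perm_ok)
{
	if (inode->i_uid == MODEL_INVALID_ID ||
	    inode->i_gid == MODEL_INVALID_ID)
		return false;
	return sysctl_perm_ok;
}
```

With `ns_root_global` set to `MODEL_INVALID_ID`, the unpatched model refuses the write open even when the sysctl permission check passes, while the patched model defers to that check, which is the behavior change the patch description argues for.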