On Fri, Nov 30, 2018 at 2:09 AM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>
> On Mon, Nov 26, 2018 at 11:29:40PM -0600, Eric W. Biederman wrote:
> > Luis Chamberlain <mcgrof@xxxxxxxxxx> writes:
> > > Thanks for the description of how to run into the issue described but
> > > is there also a practical use case today where this is happening? I ask
> > > as it would be good to know the severity of the issue in the real world
> > > today.
> >
> > People trying to run containers without a root user in the container.
> > It is atypical but something doable.
>
> My question was if there are generic tools / proprietary tools which are
> doing this widely *today*. Or is this just a custom setup some folks
> use?

We will soon start using this setup at Google to harden our usage of
CRIU. There are some more details in my LPC presentation:
https://linuxplumbersconf.org/event/2/contributions/210/

Although I don't know of specific tools using this setup, there was a
kernel patch in 2017 to support such a use case:
7c6d78148fa0 - prctl: Allow local CAP_SYS_ADMIN changing exe_file
So, perhaps Virtuozzo people use a similar setup too?

> > We spoke about this at LPC. And this is the correct behavioral change.
> >
> > The problem is there is a default value for i_uid and i_gid that is
> > correct in the general case. That default value is not correct for
> > sysctl, because proc is weird. As the sysctl permission checks in
> > test_perm are all against GLOBAL_ROOT_UID and GLOBAL_ROOT_GID we did not
> > notice that i_uid and i_gid were being set wrong.
> >
> > So all this patch does is fix the default values i_uid and i_gid.
> >
> > The commit comment seems worth cleaning up. But for the
> > content of the code.
>
> The logic seems sensible then, but are we implicating what a container
> does with its sysctl values onto the entire system? If so, sure, it
> seems you want this for networking purposes as there are a series of
> sysctl values a container may want to muck with, but are we sure we
> want the same for *all* sysctl entries?

The point is that these sysctls do not affect the whole system, just
the appropriate namespace. For example, IPC-related files (e.g. shmmax)
will always affect the writing process's IPC namespace, regardless of
which /proc mountpoint is used to access them:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/ipc/ipc_sysctl.c?h=v4.20-rc4#n24

I presume the net-related sysctls that Eric was referring to have
similar behavior.

>
>   Luis
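
To illustrate the namespace point about shmmax, here is a toy userspace
model in Python. It is not kernel code: IpcNamespace, Task, ProcMount and
write_shmmax are hypothetical names made up for this sketch, and the
default value is arbitrary. It only shows the idea that the handler
resolves the target through the writing task rather than through the
/proc mount used to reach the file.

```python
# Toy model of the behavior described above. Not kernel code: every
# class and function name here is hypothetical.
from dataclasses import dataclass


@dataclass
class IpcNamespace:
    shmmax: int = 1 << 20  # arbitrary illustrative default


@dataclass
class Task:
    ipc_ns: IpcNamespace  # the namespace this task lives in


@dataclass
class ProcMount:
    mounted_by: Task  # who mounted this /proc instance


def write_shmmax(writer: Task, mount: ProcMount, value: int) -> None:
    # The mount argument is deliberately ignored: the target namespace
    # is looked up through the writer, never through the mountpoint.
    writer.ipc_ns.shmmax = value


host_ns = IpcNamespace()
container_ns = IpcNamespace()
host_task = Task(host_ns)
container_task = Task(container_ns)
host_proc = ProcMount(mounted_by=host_task)

# A containerized task writes through the host's /proc mount.
write_shmmax(container_task, host_proc, 4096)
```

In this model only container_ns.shmmax changes; host_ns keeps its
default even though the write went through the host's mount.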