On Tue, Jan 9, 2018 at 2:28 PM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
> Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@xxxxxxxxxx):
>> On Mon, Jan 8, 2018 at 10:36 AM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
>> > Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@xxxxxxxxxx):
>> >> On Mon, Jan 8, 2018 at 10:11 AM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
>> >> > Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@xxxxxxxxxx):
>> >> >> On Mon, Jan 8, 2018 at 7:47 AM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
>> >> >> > Quoting James Morris (james.l.morris@xxxxxxxxxx):
>> >> >> >> On Mon, 8 Jan 2018, Serge E. Hallyn wrote:
>> >> >> >> I meant in terms of "marking" a user ns as "controlled" type -- it's
>> >> >> >> unnecessary jargon from an end user point of view.
>> >> >> >
>> >> >> > Ah, yes, that was my point in
>> >> >> >
>> >> >> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/01845.html
>> >> >> > and
>> >> >> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/02276.html
>> >> >> >
>> >> >> >> This may happen internally but don't make it a special case with a
>> >> >> >> different name and don't bother users with internal concepts: simply
>> >> >> >> implement capability whitelists with the default having equivalent
>> >> >
>> >> > So the challenge is to have unprivileged users be contained, while
>> >> > allowing trusted workloads in containers created by a root user to
>> >> > bypass the restriction.
>> >> >
>> >> > Now, the current proposal actually doesn't support a root user starting
>> >> > an application that it doesn't quite trust in such a way that it *is*
>> >> > subject to the whitelist.
>> >>
>> >> Well, this is not hard since root process can spawn another process
>> >> and loose privileges before creating user-ns to be controlled by the
>> >> whitelist.
>> >
>> > It would have to drop cap_sys_admin for the container to be marked as
>> > "controlled", which may prevent the container runtime from properly starting
>> > the container.
>> >
>> Yes, but that's a conflict of trusted operations (that requires
>> SYS_ADMIN) and untrusted processes it may spawn.
>
> Not sure I understand what you're saying, but
>
> I guess that in any case the task which is doing unshare(CLONE_NEWNS)
> can drop cap_sys_admin first. Though that is harder if using clone,
> and it is awkward because it's not the container manager, but the user,
> who will judge whether the container workload should be restricted.
> So the container driver will add a flag like "run-controlled", and
> the driver will convert that to dropping a capability; which again
> is weird. It would seem nicer to introduce a userns flag, 'caps-controlled'
> For an unprivileged userns, it is always set to 1, and root cannot
> change it. For a root-created userns, it stays 0, but root can set it
> to 1 (using /proc file?). In this way a either container runtime or just an
> admin script can say "no wait I want this container to still be controlled".
>
> Or we could instead add a second sysctl to decide whether all or only
> 'controlled' user namespaces should be controlled. That's not pretty though.
>
Yes, I like your idea of a flag to clone() that forces the user-ns to
be controlled. It would only have an effect for the root user; for any
other user, specifying it is a NOP, since those namespaces are
controlled with or without the flag.

But this is still an enhancement to the current patch-set, and I don't
mind doing it as a follow-up after this patch-series. At the moment,
James has asked for Eric's input, which I believe hasn't been recorded.
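
To make the flag semantics discussed above concrete, here is a rough
userspace model in Python. It is only a sketch of the rules as described
in this thread, not kernel code and not part of the patch-set. The names
UserNS, create_userns(), force_controlled and set_controlled() are made
up for illustration, and treating the flag as one-way (it can be set but
never cleared) is an assumption, since the thread only says root cannot
change it for an unprivileged userns.

```python
from dataclasses import dataclass


@dataclass
class UserNS:
    """Toy model of a user namespace carrying the proposed flag (hypothetical)."""
    created_by_root: bool
    caps_controlled: bool = False


def create_userns(creator_is_root: bool, force_controlled: bool = False) -> UserNS:
    """Model namespace creation with an optional 'force controlled' request.

    An unprivileged creator always gets a controlled namespace, so the
    request is a no-op for it. Root gets an uncontrolled namespace unless
    it explicitly asks for a controlled one.
    """
    controlled = True if not creator_is_root else force_controlled
    return UserNS(created_by_root=creator_is_root, caps_controlled=controlled)


def set_controlled(ns: UserNS, caller_is_root: bool) -> None:
    """Model root flipping the flag to 1 later (e.g. via some /proc file).

    Assumption: the transition is one-way, so no "clear" operation exists
    in this model at all.
    """
    if not caller_is_root:
        raise PermissionError("only root may mark a user namespace as controlled")
    ns.caps_controlled = True
```

With this model, create_userns(False) and create_userns(False, True)
behave identically, which is the NOP case for non-root users, while
create_userns(True, True) is the "root wants this container to still be
controlled" case.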