On Wed, Dec 01, 2021 at 02:29:09PM -0500, James Bottomley wrote: > On Wed, 2021-12-01 at 12:35 -0500, Stefan Berger wrote: > > On 12/1/21 11:58, James Bottomley wrote: > > > On Tue, 2021-11-30 at 11:06 -0500, Stefan Berger wrote: > > > > From: Denis Semakin <denis.semakin@xxxxxxxxxx> > > > > > > > > Use integrity_admin_ns_capable() to check corresponding > > > > capability to allow read/write IMA policy without CAP_SYS_ADMIN > > > > but with CAP_INTEGRITY_ADMIN. > > > > > > > > Signed-off-by: Denis Semakin <denis.semakin@xxxxxxxxxx> > > > > --- > > > > security/integrity/ima/ima_fs.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/security/integrity/ima/ima_fs.c > > > > b/security/integrity/ima/ima_fs.c > > > > index fd2798f2d224..6766bb8262f2 100644 > > > > --- a/security/integrity/ima/ima_fs.c > > > > +++ b/security/integrity/ima/ima_fs.c > > > > @@ -393,7 +393,7 @@ static int ima_open_policy(struct inode > > > > *inode, > > > > struct file *filp) > > > > #else > > > > if ((filp->f_flags & O_ACCMODE) != O_RDONLY) > > > > return -EACCES; > > > > - if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN)) > > > > + if (!integrity_admin_ns_capable(ns->user_ns)) > > > so this one is basically replacing what you did in RFC 16/20, which > > > seems a little redundant. > > > > > > The question I'd like to ask is: is there still a reason for > > > needing CAP_INTEGRITY_ADMIN? My thinking is that now IMA is pretty > > > much tied to requiring a user (and a mount, because of > > > securityfs_ns) namespace, there might not be a pressing need for an > > > admin capability separated from CAP_SYS_ADMIN because the owner of > > > the user namespace passes the ns_capable(..., CAP_SYS_ADMIN) > > > check. The rationale in > > > > Casey suggested using CAP_MAC_ADMIN, which I think would also work. > > > > CAP_MAC_ADMIN (since Linux 2.6.25) > > Allow MAC configuration or state changes. Implemented > > for > > the Smack Linux Security Module (LSM). > > > > > > Down the road I think we should cover setting file extended > > attributes with the same capability as well for when a user signs > > files or installs packages with file signatures. A container runtime > > could hold CAP_SYS_ADMIN while setting up a container and mounting > > filesystems and drop it for the first process started there. Since we > > are using the user namespace to spawn an IMA namespace, we would then > > require CAP_SYSTEM_ADMIN to be left available so that the user can do > > IMA related stuff in the container (set or append to the policy, > > write file signatures). I am not sure whether that should be the case > > or rather give the user something finer grained, such as > > CAP_MAC_ADMIN. So, it's about granularity... > > It's possible ... any orchestration system that doesn't enter a user > namespace has to strictly regulate capabilities. I'm probably biased > because I always use a user_ns so I never really had to mess with > capabilities. > > > > https://kernsec.org/wiki/index.php/IMA_Namespacing_design_considerations > > > > > > Is effectively "because CAP_SYS_ADMIN is too powerful" but that's > > > no longer true of the user namespace owner. It only passes the > > > ns_capable() check not the capable() one, so while it does get > > > CAP_SYS_ADMIN, it can only use it in a few situations which > > > represent quite a power reduction already. > > > > At least docker containers drop CAP_SYS_ADMIN. > > Well docker doesn't use the user_ns. But even given that, > CAP_SYS_ADMIN is always dropped for most container systems. What > happens when you enter a user namespace is the ns_capable( ..., > CAP_SYS_ADMIN) check returns true if you're the owner of the user_ns, > in the same way it would for root. So effectively entering a user > namespace without CAP_SYS_ADMIN but mapping the owner id to 0 (what > unshare -r --user does) gives you back a form of CAP_SYS_ADMIN that > responds only in the places in the kernel that have a ns_capable() > check instead of a capable() one (most of the places you list below). > This is the principle of how unprivileged containers actually work ... > and the source of some of our security problems if you get back an > ability to do something you shouldn't be allowed to do as an > unprivileged user. > > > I am not sure what the decision was based on but probably they don't > > want to give the user what is not absolutely necessary, but usage of > > user namespaces (with IMA namespaces) would kind of force it to be > > available then to do IMA-related stuff ... > > > > Following this man page here > > https://man7.org/linux/man-pages/man7/user_namespaces.7.html > > > > CAP_SYS_ADMIN in a user namespace is about > > > > - bind-mounting filesystems > > > > - mounting /proc filesystems > > > > - creating nested user namespaces > > > > - configuring UTS namespace > > > > - configuring whether setgroups() can be used > > > > - usage of setns() > > > > > > Do we want to add '- only way of *setting up* IMA related stuff' to > > this list? > > I don't see why not, but other container people should weigh in > because, as I said, I mostly use the user namespace and unprivileged > containers and don't bother with capabilities. There are very few scenarios where dropping capabilities in an unprivileged container makes sense. In a lot of other scenarios it is just a misunderstanding of the meaning of capabilities and their relationship to user namespaces. Usually, granting a full set of capabilities to the payload of an unprivigileged container is the right thing to do. All things that are properly namespaced will check capabilities in the relevant user namespace. Those that aren't will check them against the initial user namespaces. But I do think the question of whether or not ima should go into cap_sys_admin is more a question of capability semantics then it is in how exactly ima is namespaced. We do have agreed before that overloading cap_sys_admin further isn't ideal. Often we end up rectifying that mistake later. For example, how we moved stuff like criu, bpf, and perf to their own capability. Now we're left with stuff like: static inline bool perfmon_capable(void) { return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN); } static inline bool bpf_capable(void) { return capable(CAP_BPF) || capable(CAP_SYS_ADMIN); } static inline bool checkpoint_restore_ns_capable(struct user_namespace *ns) { return ns_capable(ns, CAP_CHECKPOINT_RESTORE) || ns_capable(ns, CAP_SYS_ADMIN); } for the sake of adhering to legacy behavior. I think we can skip over that mistake and introduce cap_sys_integrity. Christian