On Thu, Dec 7, 2023 at 4:43 PM Seth Forshee (DigitalOcean) <sforshee@xxxxxxxxxx> wrote:
>
> [Adding Mimi for insights on EVM questions]
>
> On Fri, Dec 01, 2023 at 12:18:00PM -0600, Seth Forshee (DigitalOcean) wrote:
> > On Fri, Dec 01, 2023 at 06:39:18PM +0100, Christian Brauner wrote:
> > > > +/**
> > > > + * vfs_set_fscaps - set filesystem capabilities
> > > > + * @idmap: idmap of the mount the inode was found from
> > > > + * @dentry: the dentry on which to set filesystem capabilities
> > > > + * @caps: the filesystem capabilities to be written
> > > > + * @flags: setxattr flags to use when writing the capabilities xattr
> > > > + *
> > > > + * This function writes the supplied filesystem capabilities to the dentry.
> > > > + *
> > > > + * Return: 0 on success, a negative errno on error.
> > > > + */
> > > > +int vfs_set_fscaps(struct mnt_idmap *idmap, struct dentry *dentry,
> > > > +		   const struct vfs_caps *caps, int flags)
> > > > +{
> > > > +	struct inode *inode = d_inode(dentry);
> > > > +	struct inode *delegated_inode = NULL;
> > > > +	struct vfs_ns_cap_data nscaps;
> > > > +	int size, error;
> > > > +
> > > > +	/*
> > > > +	 * Unfortunately EVM wants to have the raw xattr value to compare to
> > > > +	 * the on-disk version, so we need to pass the raw xattr to the
> > > > +	 * security hooks. But we also want to do security checks before
> > > > +	 * breaking leases, so that means a conversion to the raw xattr here
> > > > +	 * which will usually be redundant with the conversion we do for
> > > > +	 * writing the xattr to disk.
> > > > +	 */
> > > > +	size = vfs_caps_to_xattr(idmap, i_user_ns(inode), caps, &nscaps,
> > > > +				 sizeof(nscaps));
> > > > +	if (size < 0)
> > > > +		return size;
> > >
> > > Oh right, I remember that. Slight eyeroll. See below though...
> > >
> > > > +
> > > > +retry_deleg:
> > > > +	inode_lock(inode);
> > > > +
> > > > +	error = xattr_permission(idmap, inode, XATTR_NAME_CAPS, MAY_WRITE);
> > > > +	if (error)
> > > > +		goto out_inode_unlock;
> > > > +	error = security_inode_setxattr(idmap, dentry, XATTR_NAME_CAPS, &nscaps,
> > > > +					size, flags);
> > > > +	if (error)
> > > > +		goto out_inode_unlock;
> > >
> > > For posix acls I added dedicated security hooks that take the struct
> > > posix_acl stuff and then plumb that down into the security modules. You
> > > could do the same thing here and then just force EVM and others to do
> > > their own conversion from in-kernel to xattr format, instead of forcing
> > > the VFS to do this.
> > >
> > > Because right now we make everyone pay the price all the time when
> > > really EVM should pay that price and this whole unpleasantness.
> >
> > Good point, I'll do that.
>
> I've been reconsidering various approaches here. One thing I noticed is
> that for the non-generic case (iow overlayfs) I missed calling
> security_inode_post_setxattr(), where EVM also wants the raw xattr, so
> that would require another conversion. That got me wondering whether the
> setxattr security hooks really matter when writing fscaps to overlayfs.
> And it seems like they might not: the LSMs only look for their own
> xattrs, and IMA doesn't do anything with fscaps xattrs. EVM does, but
> what it does for a xattr write to an overlayfs inode seems at least
> partially if not completely redundant with what it will do when the
> xattr is written to the upper filesystem.
>
> So could we push these security calls down to the generic fscaps
> implementations just before/after writing the raw xattr data and just
> skip them for overlayfs? If so we can get away with doing the vfs_caps
> to xattr conversion only once.
>
> The trade-offs are that filesystems which implement fscaps inode
> operations become responsible for calling the security hooks if needed,
> and if something changes such that we need to call those security hooks
> for fscaps on overlayfs this solution would no longer work.

Hi Seth,

I was trying to understand the alternative proposals, but TBH,
I cannot wrap my head around overlayfs+IMA/EVM and I do not fully
understand the use case.

Specifically, I do not understand why the IMA/EVM attestation on
the upper and lower fs isn't enough to make overlayfs tamper-proof.
I never got an explanation of the threat model for overlayfs+IMA/EVM.

I know that for SELinux and overlayfs a lot of work was done by Vivek.
I was not involved in this work, but AFAIK, it did not involve any
conversion of SELinux xattrs.

Thanks,
Amir.
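
For reference, here is how I read the push-down proposal quoted above, as a
rough userspace sketch and not kernel code. Every name in it (caps_model,
raw_model, caps_to_raw, pre_hook, post_hook, generic_set_fscaps,
overlay_set_fscaps) is a hypothetical stand-in, not an existing kernel API.
It only models the call ordering: the generic implementation converts once
and calls the hooks right around the raw write, while the overlay path does
neither and defers to the upper layer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct caps_model { uint32_t permitted; uint32_t rootid; };
struct raw_model  { unsigned char data[8]; int size; };

static int conversions;      /* caps -> raw conversions performed */
static int pre_hook_calls;   /* models a security_inode_setxattr()-style hook */
static int post_hook_calls;  /* models a security_inode_post_setxattr()-style hook */

/* Models the in-kernel -> raw xattr conversion (assumed layout, 8 bytes). */
static int caps_to_raw(const struct caps_model *caps, struct raw_model *raw)
{
	conversions++;
	memcpy(raw->data, &caps->permitted, sizeof(caps->permitted));
	memcpy(raw->data + 4, &caps->rootid, sizeof(caps->rootid));
	raw->size = (int)sizeof(raw->data);
	return raw->size;
}

static int pre_hook(const struct raw_model *raw)
{
	pre_hook_calls++;
	return raw->size > 0 ? 0 : -1;
}

static void post_hook(const struct raw_model *raw)
{
	(void)raw;
	post_hook_calls++;
}

/* Generic path: one conversion, hooks immediately around the raw write. */
static int generic_set_fscaps(const struct caps_model *caps,
			      struct raw_model *disk)
{
	struct raw_model raw;
	int size, error;

	size = caps_to_raw(caps, &raw);
	if (size < 0)
		return size;
	assert(size == (int)sizeof(raw.data));

	error = pre_hook(&raw);
	if (error)
		return error;
	*disk = raw;		/* stands in for writing the xattr */
	post_hook(&raw);
	return 0;
}

/* Overlay path: no conversion and no hooks here; defer to the upper layer. */
static int overlay_set_fscaps(const struct caps_model *caps,
			      struct raw_model *upper_disk)
{
	return generic_set_fscaps(caps, upper_disk);
}
```

In this model a single overlay_set_fscaps() call leaves each of the three
counters at 1, which is the "convert only once" property. It says nothing
about whether skipping the hooks at the overlayfs level is actually safe
for EVM.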