On Fri, Sep 23, 2022 at 04:59:42PM +0200, Miklos Szeredi wrote:
> On Thu, 22 Sept 2022 at 17:18, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > The current way of setting and getting posix acls through the generic
> > xattr interface is error prone and type unsafe. The vfs needs to
> > interpret and fix up posix acls before storing them or reporting them
> > to userspace. Various hacks exist to make this work. The code is hard
> > to understand and difficult to maintain in its current form. Instead of
> > making this work by hacking posix acls through xattr handlers, we are
> > building a dedicated posix acl api around the get and set inode
> > operations. This removes a lot of hackiness and makes the codepaths
> > easier to maintain. A lot of background can be found in [1].
> >
> > In order to build a type safe posix acl api around get and set acl, we
> > need all filesystems to implement get and set acl.
> >
> > Now that we have added get and set acl inode operations that allow easy
> > access to the dentry, we give overlayfs its own get and set acl inode
> > operations.
> >
> > Since overlayfs is a stacking filesystem, it will use the newly added
> > posix acl api when retrieving posix acls from the relevant layer.
> >
> > Overlayfs can also be mounted on top of idmapped layers. If idmapped
> > layers are used, overlayfs must take the layer's idmapping into account
> > after it has retrieved the posix acls from the relevant layer.
> >
> > Note, until the vfs has been switched to the new posix acl api, this
> > patch is a non-functional change.
> >
> > Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@xxxxxxxxxx [1]
> > Signed-off-by: Christian Brauner (Microsoft) <brauner@xxxxxxxxxx>
> > ---
> >  fs/overlayfs/dir.c       |  3 +-
> >  fs/overlayfs/inode.c     | 63 ++++++++++++++++++++++++++++++++++++----
> >  fs/overlayfs/overlayfs.h | 10 +++++--
> >  3 files changed, 67 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> > index 7bece7010c00..eb49d5d7b56f 100644
> > --- a/fs/overlayfs/dir.c
> > +++ b/fs/overlayfs/dir.c
> > @@ -1311,7 +1311,8 @@ const struct inode_operations ovl_dir_inode_operations = {
> >  	.permission = ovl_permission,
> >  	.getattr = ovl_getattr,
> >  	.listxattr = ovl_listxattr,
> > -	.get_inode_acl = ovl_get_acl,
> > +	.get_inode_acl = ovl_get_inode_acl,
> > +	.get_acl = ovl_get_acl,
> >  	.update_time = ovl_update_time,
> >  	.fileattr_get = ovl_fileattr_get,
> >  	.fileattr_set = ovl_fileattr_set,
> > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> > index ecb51c249466..dd11e13cd288 100644
> > --- a/fs/overlayfs/inode.c
> > +++ b/fs/overlayfs/inode.c
> > @@ -14,6 +14,8 @@
> >  #include <linux/fileattr.h>
> >  #include <linux/security.h>
> >  #include <linux/namei.h>
> > +#include <linux/posix_acl.h>
> > +#include <linux/posix_acl_xattr.h>
> >  #include "overlayfs.h"
> >
> >
> > @@ -460,9 +462,9 @@ ssize_t ovl_listxattr(struct dentry *dentry, char *list, size_t size)
> >   * of the POSIX ACLs retrieved from the lower layer to this function to not
> >   * alter the POSIX ACLs for the underlying filesystem.
> >   */
> > -static void ovl_idmap_posix_acl(struct inode *realinode,
> > -				struct user_namespace *mnt_userns,
> > -				struct posix_acl *acl)
> > +void ovl_idmap_posix_acl(struct inode *realinode,
> > +			 struct user_namespace *mnt_userns,
> > +			 struct posix_acl *acl)
> >  {
> >  	struct user_namespace *fs_userns = i_user_ns(realinode);
> >
> > @@ -495,7 +497,7 @@ static void ovl_idmap_posix_acl(struct inode *realinode,
> >   *
> >   * This is obviously only relevant when idmapped layers are used.
> >   */
> > -struct posix_acl *ovl_get_acl(struct inode *inode, int type, bool rcu)
> > +struct posix_acl *ovl_get_inode_acl(struct inode *inode, int type, bool rcu)
> >  {
> >  	struct inode *realinode = ovl_inode_real(inode);
> >  	struct posix_acl *acl, *clone;
> > @@ -547,6 +549,53 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type, bool rcu)
> >  	posix_acl_release(acl);
> >  	return clone;
> >  }
> > +
> > +static struct posix_acl *ovl_get_acl_path(const struct path *path,
> > +					  const char *acl_name)
> > +{
> > +	struct posix_acl *real_acl, *clone;
> > +	struct user_namespace *mnt_userns;
> > +
> > +	mnt_userns = mnt_user_ns(path->mnt);
> > +
> > +	real_acl = vfs_get_acl(mnt_userns, path->dentry, acl_name);
> > +	if (IS_ERR(real_acl))
> > +		return real_acl;
> > +	if (!real_acl)
> > +		return NULL;
>
> 	if (IS_ERR_OR_NULL(real_acl))
> 		return real_acl;

Thanks.

>
> > +
> > +	if (!is_idmapped_mnt(path->mnt))
> > +		return real_acl;
> > +
> > +	/*
> > +	 * We cannot alter the ACLs returned from the relevant layer as that
> > +	 * would alter the cached values filesystem wide for the lower
> > +	 * filesystem. Instead we can clone the ACLs and then apply the
> > +	 * relevant idmapping of the layer.
> > +	 */
>
> Can't vfs_get_acl() return 'const posix_acl *' to enforce that?

The problem is that struct posix_acl is reference counted and often has to be passed to functions such as posix_acl_release() or posix_acl_dup(), which take a non-const pointer because they modify the reference count embedded in the ACL.
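This is a minimal userspace sketch, not kernel code, that illustrates two points from the exchange above. First, the lookup helper can collapse its error and NULL checks into a single IS_ERR_OR_NULL() test. Second, a refcounted ACL cannot sensibly be handed out as const, because dropping the reference has to write to the object. It also shows the clone-then-remap step applied when the layer is idmapped.

All toy_* names, the ERR_PTR() imitation, and the fixed-offset "idmapping" are hypothetical stand-ins. The kernel uses refcount_t and a real idmapping, whereas this sketch uses a plain int and is single-threaded.

```c
/* Userspace sketch only: every toy_* name below is hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Minimal imitation of the kernel's ERR_PTR() convention. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}
static inline bool IS_ERR_OR_NULL(const void *p) { return !p || IS_ERR(p); }

struct toy_acl {
	int refcount;        /* plain int: single-threaded sketch */
	size_t count;
	unsigned int ids[8]; /* uids/gids of the ACL entries */
};

static struct toy_acl *toy_acl_alloc(size_t count)
{
	struct toy_acl *acl;

	assert(count <= 8);
	acl = calloc(1, sizeof(*acl));
	if (!acl)
		return ERR_PTR(-ENOMEM);
	acl->refcount = 1;
	acl->count = count;
	return acl;
}

/* Modeled on posix_acl_dup(): taking a reference writes to *acl. */
static struct toy_acl *toy_acl_dup(struct toy_acl *acl)
{
	if (acl)
		acl->refcount++;
	return acl;
}

/* Modeled on posix_acl_release(): drop a reference, free on the last one. */
static void toy_acl_release(struct toy_acl *acl)
{
	if (!acl)
		return;
	assert(acl->refcount > 0);
	if (--acl->refcount == 0)
		free(acl);
}

/* The "lower layer" caches one ACL and hands out references to it. */
static struct toy_acl *lower_cache;

static void lower_cache_init(void)
{
	lower_cache = toy_acl_alloc(2);
	assert(!IS_ERR(lower_cache));
	lower_cache->ids[0] = 1000;
	lower_cache->ids[1] = 1001;
}

static struct toy_acl *toy_get_acl(bool has_acl)
{
	if (!has_acl)
		return NULL; /* no ACL set: not an error */
	return toy_acl_dup(lower_cache);
}

/* Shaped like ovl_get_acl_path(); "offset" is a toy idmapping. */
static struct toy_acl *toy_get_acl_path(bool has_acl, bool idmapped,
					unsigned int offset)
{
	struct toy_acl *real_acl, *clone;
	size_t i;

	real_acl = toy_get_acl(has_acl);
	if (IS_ERR_OR_NULL(real_acl)) /* one check instead of two */
		return real_acl;
	if (!idmapped)
		return real_acl; /* caller now owns this reference */

	/* Never edit the shared cached object; remap a private copy. */
	clone = toy_acl_alloc(real_acl->count);
	if (IS_ERR(clone)) {
		toy_acl_release(real_acl);
		return clone;
	}
	memcpy(clone->ids, real_acl->ids, sizeof(clone->ids));
	toy_acl_release(real_acl); /* needs a non-const pointer */
	for (i = 0; i < clone->count; i++)
		clone->ids[i] += offset;
	return clone;
}
```

Suppose toy_get_acl() returned `const struct toy_acl *`. Every caller that later drops its reference would then have to cast the qualifier away before calling toy_acl_release(). The const would document intent but would not actually enforce "do not modify", which is why cloning before remapping is the safer pattern.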