Re: [RFC PATCH] vfs: fs{set,get}attr iops

Miklos Szeredi <miklos@xxxxxxxxxx> · Tue, 24 Nov 2020 09:28:07 +0100

On Mon, Nov 23, 2020 at 10:06 PM Andreas Dilger <adilger@xxxxxxxxx> wrote:
>
> On Nov 23, 2020, at 7:12 AM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >
> > Dmitry found an issue with overlayfs's FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR
> > implementation:
> >
> >  https://lore.kernel.org/linux-unionfs/CACT4Y+bUfavwMVv2SEMve5pabE_AwsDO0YsRBGZtYqX59a77vA@xxxxxxxxxxxxxx/
> >
> > I think the only proper soltuion is to move these into inode operations, which
> > should be a good cleanup as well.
> >
> > This is a first cut, the FS_IOC_SETFLAGS conversion is not complete, and only
> > lightly tested on ext4 and xfs.
> >
> > There are minor changes in behavior, like the exact errno value in case of
> > multiple error conditions.
> >
> > 34 files changed, 668 insertions(+), 1170 deletions(-)
>
>
> Hi Miklos,
> this looks like a good reduction in code duplication (-500 LOC is nice).
>
> One issues I have with this patch is that it spreads the use of "fsxattr"
> asthe name for these inode flags further into the VFS.  That was inherited
> from XFS because of the ioctl name, but is very confusing with "real"
> extended attributes, also using get/setxattr names but totally differently.
>
> It would be better to use function/variable names with "xflags" and "iflags"
> that are already used in several filesystems to separate this from xattrs.

Makes sense.

>
> It wouldn't be terrible to also rename the ioctl to FS_IOC_FSSETXFLAGS and
> keep a #define for FS_IOC_FSSETXATTR for compatibility, but that is less
> critical if that is now only used in one place in the code.
>
> Some more comments inline...
>
> > --- a/fs/ioctl.c
> > +++ b/fs/ioctl.c
> > +/*
> > + * Generic function to check FS_IOC_FSSETXATTR/FS_IOC_SETFLAGS values and reject
> > + * any invalid configurations.
> > + *
> > + * Note: must be called with inode lock held.
> > + */
> > +static int fssetxattr_prepare(struct inode *inode,
> > +                           const struct kfsxattr *old_fa,
> > +                           struct kfsxattr *fa)
>
> > +{
> > +       /*
> > +        * The IMMUTABLE and APPEND_ONLY flags can only be changed by
> > +        * the relevant capability.
> > +     */
> > +     if ((fa->fsx_flags ^ old_fa->fsx_flags) & (FS_APPEND_FL | FS_IMMUTABLE_FL) &&
> > +         !capable(CAP_LINUX_IMMUTABLE))
> > +             return -EPERM;
> > +
> > +     if (fa->flags_valid)
> > +             return fscrypt_prepare_setflags(inode, old_fa->fsx_flags, fa->fsx_flags);
>
> This doesn't seem right?  It means if iflags are set, the rest of the checks are
> skipped *even if* there are no problems with the fscrypt flags?  That bypasses
> the DAX and PROJINHERIT checks later in this function, but it is also possible to
> set/clear those flags via FS_IOC_SETFLAGS, and is not very obvious for the code
> flow.  I'd think this should be something more like:

>
>         if (IS_ENCRYPTED(inode) && fa->flags_valid) {
>                 rc = fscrypt_prepare_setflags(...);
>                 if (rc)
>                         return rc;
>         }
>
> and continue on to the rest of the checks, and maybe skip the xflags-specific
> checks (EXTSIZE, EXTSZINHERIT, COWEXTSIZE) if xflags_valid is not set, though
> they would just be no-ops in that case since the iflags interface will not
> set those flags.
>
> Alternately, move the DAX and PROJINHERIT checks above "flags_valid", but add
> a comment that the remaining checks are only for xflags-specific values.

Good point.   I wasn't actually looking at the code, just converting
the old one to the new, and apparently the DAX and PROJINHERIT checks
were missing for the FS_IOC_SETFLAGS case (see
vfs_ioc_setflags_prepare).

> > +int vfs_fssetxattr(struct dentry *dentry, struct kfsxattr *fa)
> > +{
> > +     struct inode *inode = d_inode(dentry);
> > +     struct kfsxattr old_fa;
> > +     int err;
> > +
> > +     if (d_is_special(dentry))
> > +             return -ENOTTY;
> > +
> > +     if (!inode->i_op->fssetxattr)
> > +             return -ENOIOCTLCMD;
> > +
> > +     if (!inode_owner_or_capable(inode))
> > +             return -EPERM;
> > +
> > +     inode_lock(inode);
> > +     err = vfs_fsgetxattr(dentry, &old_fa);
> > +     if (!err) {
> > +             /* initialize missing bits from old_fa */
> > +             if (fa->flags_valid) {
> > +                     fa->fsx_xflags |= old_fa.fsx_xflags & ~FS_XFLAG_COMMON;
> > +                     fa->fsx_extsize = old_fa.fsx_extsize;
> > +                     fa->fsx_nextents = old_fa.fsx_nextents;
> > +                     fa->fsx_projid = old_fa.fsx_projid;
> > +                     fa->fsx_cowextsize = old_fa.fsx_cowextsize;
> > +             } else {
> > +                     fa->fsx_flags |= old_fa.fsx_flags & ~FS_COMMON_FL;
> > +             }
>
> This extra call to vfs_fsgetxattr() is adding pure overhead for the case of
> FS_IOC_GETFLAGS and is totally unnecessary.  If iflags_valid is set, then
> none of these other fields should be accessed in the ->fssetxattr() method,
> and they can check for iflags_valid vs. xflags_valid themselves to see which
> ioctl is being called and only access fields which are valid.

The extra call is needed for fssetxattr_prepare() for the the checks
anyway.   Filling in the other fields is a bonus that makes fs code
simpler in some cases (e.g. xfs which mainly just looks at xflags).

Also ->fsgetxattr() is cheap for all filesystems (and this syscall
shouldn't be performance sensitive anyway) so I don't see why we'd
need to optimize this at all.

Thanks,
Miklos