Re: [RFC] HACK: overlayfs: Optimize overlay/restore creds

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Dec 16, 2023 at 12:16 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Fri, Dec 15, 2023 at 10:00 PM Vinicius Costa Gomes
> <vinicius.gomes@xxxxxxxxx> wrote:
> >
> > Hi Amir,
> >
> > Amir Goldstein <amir73il@xxxxxxxxx> writes:
> >
> > > +fsdevel because this may be relevant to any subsystem that
> > > keeps a long live cred copy (e.g. nfsd, ksmbd, cachefiles).
> > >
> > > +linus who wrote
> > > d7852fbd0f04 access: avoid the RCU grace period for the temporary
> > > subjective credentials
> > >
> > >
> > > On Fri, Dec 15, 2023 at 12:02 AM Vinicius Costa Gomes
> > > <vinicius.gomes@xxxxxxxxx> wrote:
> > >>
> > >> Permission checks in overlayfs also check against the credentials
> > >> associated with the superblock, which are assigned at mount() time, so
> > >> pretty long lived. So, we can omit the reference counting for this
> > >> case.
> > >
> > > You forgot to mention WHY you are proposing this and to link to the
> > > original report with the first optimization attempt:
> > >
> > > https://lore.kernel.org/linux-unionfs/20231018074553.41333-1-hu1.chen@xxxxxxxxx/
> > >
> >
> > I thought that the "in-reply-to" would do that, I should have been more
> > explicit on the context. Sorry.
> >
> > >>
> > >> This (very early) proof of concept does two things:
> > >>
> > >> Add a flag "immutable" (TODO: find a better name) to struct cred to
> > >> indicate that it is long lived, and that refcount can be omitted.
> > >>
> > >
> > > This reminds me of the many discussions about Rust abstractions
> > > that are going on right now.
> > > I think an abstraction like this one is called a "borrowed reference".
> > >
> >
> > Yeah, very similar to a borrow in rust.
> >
> > >> Add "guard" helpers, so we can use automatic cleanup to be sure
> > >> override/restore are always paired. (I didn't like that I have
> > >> 'ovl_cred' to be a container for the credentials, but couldn't think
> > >> of other solutions)
> > >>
> > >
> > > I like the guard but see comments below...
> > >
> > >> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@xxxxxxxxx>
> > >> ---
> > >> Hi Amir,
> > >>
> > >> Just to know if I am more or less on right track.
> > >>
> > >> This is a different attempt, instead of the local copy idea, I am
> > >> using the fact that the credentials associated with the mount() will
> > >> be alive for a long time. I think the result is almost the same. But I
> > >> could be missing something.
> > >>
> > >> TODO:
> > >>  - Add asserts.
> > >>  - Replace ovl_override_creds()/revert_Creds() by
> > >>    ovl_creator_cred()/guard() everywhere.
> > >>  - Probably more.
> > >>
> > >>
> > >>  fs/overlayfs/inode.c     |  7 ++++---
> > >>  fs/overlayfs/overlayfs.h | 18 ++++++++++++++++++
> > >>  fs/overlayfs/params.c    |  4 +++-
> > >>  fs/overlayfs/super.c     | 10 +++++++---
> > >>  fs/overlayfs/util.c      | 10 ++++++++++
> > >>  include/linux/cred.h     | 12 ++++++++++--
> > >>  6 files changed, 52 insertions(+), 9 deletions(-)
> > >>
> > >> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> > >> index c63b31a460be..2c016a3bbe2d 100644
> > >> --- a/fs/overlayfs/inode.c
> > >> +++ b/fs/overlayfs/inode.c
> > >> @@ -290,9 +290,9 @@ int ovl_permission(struct mnt_idmap *idmap,
> > >>                    struct inode *inode, int mask)
> > >>  {
> > >>         struct inode *upperinode = ovl_inode_upper(inode);
> > >> +       struct ovl_cred ovl_cred;
> > >>         struct inode *realinode;
> > >>         struct path realpath;
> > >> -       const struct cred *old_cred;
> > >
> > > Nit: please don't reorder the variable definitions.
> > >
> >
> > Sorry about that. "bad" habits from the networking side :-)
> >
>
> I fully support all forms and shapes of OCD ;-)
>
> > >>         int err;
> > >>
> > >>         /* Careful in RCU walk mode */
> > >> @@ -310,7 +310,9 @@ int ovl_permission(struct mnt_idmap *idmap,
> > >>         if (err)
> > >>                 return err;
> > >>
> > >> -       old_cred = ovl_override_creds(inode->i_sb);
> > >> +       ovl_cred = ovl_creator_cred(inode->i_sb);
> > >> +       guard(ovl_creds)(&ovl_cred);
> > >> +
> > >>         if (!upperinode &&
> > >>             !special_file(realinode->i_mode) && mask & MAY_WRITE) {
> > >>                 mask &= ~(MAY_WRITE | MAY_APPEND);
> > >> @@ -318,7 +320,6 @@ int ovl_permission(struct mnt_idmap *idmap,
> > >>                 mask |= MAY_READ;
> > >>         }
> > >>         err = inode_permission(mnt_idmap(realpath.mnt), realinode, mask);
> > >> -       revert_creds(old_cred);
> > >>
> > >>         return err;
> > >>  }
> > >> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > >> index 05c3dd597fa8..22ea3066376e 100644
> > >> --- a/fs/overlayfs/overlayfs.h
> > >> +++ b/fs/overlayfs/overlayfs.h
> > >> @@ -416,6 +416,24 @@ static inline int ovl_do_getattr(const struct path *path, struct kstat *stat,
> > >>         return vfs_getattr(path, stat, request_mask, flags);
> > >>  }
> > >>
> > >> +struct ovl_cred {
> > >> +       const struct cred *cred;
> > >> +};
> > >> +
> > >> +static inline struct ovl_cred ovl_creator_cred(struct super_block *sb)
> > >> +{
> > >> +       struct ovl_fs *ofs = OVL_FS(sb);
> > >> +
> > >> +       return (struct ovl_cred) { .cred = ofs->creator_cred };
> > >> +}
> > >> +
> > >> +void ovl_override_creds_new(struct ovl_cred *creator_cred);
> > >> +void ovl_revert_creds_new(struct ovl_cred *creator_cred);
> > >> +
> > >> +DEFINE_GUARD(ovl_creds, struct ovl_cred *,
> > >> +            ovl_override_creds_new(_T),
> > >> +            ovl_revert_creds_new(_T));
> > >> +
> > >
> > > This pattern is not unique to overlayfs.
> > > It is probably better to define a common container type struct override_cred
> > > in cred.h/cred.c that other code could also use.
> > >
> >
> > Good idea.
> >
> > >>  /* util.c */
> > >>  int ovl_get_write_access(struct dentry *dentry);
> > >>  void ovl_put_write_access(struct dentry *dentry);
> > >> diff --git a/fs/overlayfs/params.c b/fs/overlayfs/params.c
> > >> index 3fe2dde1598f..008377b9241a 100644
> > >> --- a/fs/overlayfs/params.c
> > >> +++ b/fs/overlayfs/params.c
> > >> @@ -770,8 +770,10 @@ void ovl_free_fs(struct ovl_fs *ofs)
> > >>         kfree(ofs->config.lowerdirs);
> > >>         kfree(ofs->config.upperdir);
> > >>         kfree(ofs->config.workdir);
> > >> -       if (ofs->creator_cred)
> > >> +       if (ofs->creator_cred) {
> > >> +               cred_set_immutable(ofs->creator_cred, false);
> > >>                 put_cred(ofs->creator_cred);
> > >
> > > Not happy about this API.
> > >
> > > Two solutions I can think of:
> > > 1. (my preference) keep two copies of creator_cred, one refcounted copy
> > >     and one non-refcounted that is used for override_creds()
> > > 2. put_cred_ref() which explicitly opts-in to dropping refcount on
> > >     a borrowed reference, same as you do above but hidden behind
> > >     a properly documented helper
> > >
> >
> > Probably because I already have option (2) more or less understood, but
> > I think that having a single creator_cred marked as
> > "non-refcounted/long-lived" is simpler than having two copies, even the
> > the extra copy only exists for the duration of the override.
> >
> > But it could be that I still can't imagine what you have in mind about
> > (1).
> >
>
> (1) is actually quite similar to your first proposal of copying creator_cred
> to a local stack variable.
> My only concern about this proposal was that the core code was not aware
> of the fact that it is working with an object that must not be freed.
> And this is where non_refcount comes in to help.
>
> With (2) prepare_creds_ref()/put_cred_ref() can be used to manage the
> lifetime of a long-lived cred object from the cred_jar memory pool.
>
> With (1) the caller is responsible to manage the lifetime of the non-refcounted
> copy, for example, if the object lives on the stack.
>
> For overlayfs this could mean, either create a non-refcounted copy on the
> stack in every operator using ovl_creator_cred() helper as your first proposal
> did, or keep another non-refcounted copy in ofs->override_cred, which gets
> allocated/freed by overlayfs (i.e. kmalloc/kfree).
>
> > >> +       }
> > >>         kfree(ofs);
> > >>  }
> > >>
> > >> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > >> index a0967bb25003..1ffb4f0f8186 100644
> > >> --- a/fs/overlayfs/super.c
> > >> +++ b/fs/overlayfs/super.c
> > >> @@ -1304,6 +1304,13 @@ int ovl_fill_super(struct super_block *sb, struct fs_context *fc)
> > >>         if (!cred)
> > >>                 goto out_err;
> > >>
> > >> +       /* Never override disk quota limits or use reserved space */
> > >> +       cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
> > >> +       /* The cred that is going to be associated with the super
> > >> +        * block will not change.
> > >> +        */
> > >> +       cred_set_immutable(cred, true);
> > >> +
> > >
> > > Likewise, either:
> > > 1. Create a non-refcounted copy of creator_cred
> >
> > Ah! I think now I see what you meant. Not sure I like, I think it's a
> > bit too error prone, it's hard to enforce that the copies would be kept
> > in sync in general. (even if in practice the only thing that would need
> > to be kept in sync is the destruction of both, at least for now).
> >
>
> I agree that it makes more sense to create the non-refcounted copy
> on the stack.
>
> As a matter of fact, maybe it makes sense to embed a non-refcounted
> copy in the struct used for the guard:
>
> struct override_cred {
>        struct cred copy;
>        const struct cred *cred;
> };
>
> Then override_cred_save() can create the non-refcounted copy:
>
> void override_cred_save(struct override_cred *override)
> {
>        override->copy = *override;
>        override->copy.non_refcount = 1;
>        override->cred = override_creds(&override->copy);
>        override->cred = override_creds(override->cred);

This second line is leftover of course...

> }
>
> > > or
> > > 2. Use a documented helper prepare_creds_ref() to hide
> > >     this implementation detail
> >
> > This I like more, having the properties documented in the constructor.
> > And much better than my _set_immutable().
> >
>
> Yes, the important thing is that an object cannot change
> its non_refcount property during its lifetime -

... which means that put_creds_ref() should assert that
there is only a single refcount - the one handed out by
prepare_creds_ref() before removing non_refcount or
directly freeing the cred object.

I must say that the semantics of making a non-refcounted copy
to an object whose lifetime is managed by the caller sounds a lot
less confusing to me.

Thanks,
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux