Re: [RFC PATCH 3/3] overlay: Add the ability to remount volatile directories when safe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 16, 2020 at 11:31:20AM +0200, Amir Goldstein wrote:
> On Mon, Nov 16, 2020 at 6:58 AM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> >
> > Overlayfs added the ability to setup mounts where all syncs could be
> > short-circuted in (2a99ddacee43: ovl: provide a mount option "volatile").
> >
> > A user might want to remount this fs, but we do not let the user because
> > of the "incompat" detection feature. In the case of volatile, it is safe
> > to do something like[1]:
> >
> > $ sync -f /root/upperdir
> > $ rm -rf /root/workdir/incompat/volatile
> >
> > There are two ways to go about this. You can call sync on the underlying
> > filesystem, check the error code, and delete the dirty file if everything
> > is clean. If you're running lots of containers on the same filesystem, or
> > you want to avoid all unnecessary I/O, this may be suboptimal.
> >
> > Alternatively, you can blindly delete the dirty file, and "hope for the
> > best".
> >
> > This patch introduces transparent functionality to check if it is
> > (relatively) safe to reuse the upperdir. It ensures that the filesystem
> > hasn't been remounted, the system hasn't been rebooted, nor has the
> > overlayfs code changed. It also checks the errseq on the superblock
> > indicating if there have been any writeback errors since the previous
> > mount. Currently, this information is not directly exposed to userspace, so
> > the user cannot make decisions based on this.
> 
> This is the main reason IMO that this patch is needed, but it's buried inside
> this paragraph. It wasn't obvious to me at first why userspace solution
> was not possible. Maybe try to give it more focus.
> 
> 
> > Instead we checkpoint
> > this information to disk, and upon remount we see if any of it has
> > changed. Since the structure is explicitly not meant to be used
> > between different versions of the code, its stability does not
> > matter so much.
> >
> > [1]: https://lore.kernel.org/linux-unionfs/CAOQ4uxhKr+j5jFyEC2gJX8E8M19mQ3CqdTYaPZOvDQ9c0qLEzw@xxxxxxxxxxxxxx/T/#m6abe713e4318202ad57f301bf28a414e1d824f9c
> >
> > Signed-off-by: Sargun Dhillon <sargun@xxxxxxxxx>
> > Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> > Cc: linux-unionfs@xxxxxxxxxxxxxxx
> > Cc: Miklos Szeredi <miklos@xxxxxxxxxx>
> > Cc: Amir Goldstein <amir73il@xxxxxxxxx>
> > ---
> >  Documentation/filesystems/overlayfs.rst |  5 +-
> >  fs/overlayfs/overlayfs.h                | 34 ++++++++++
> >  fs/overlayfs/readdir.c                  | 86 +++++++++++++++++++++++--
> >  fs/overlayfs/super.c                    | 22 ++++++-
> >  4 files changed, 139 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst
> > index 580ab9a0fe31..fa3faeeab727 100644
> > --- a/Documentation/filesystems/overlayfs.rst
> > +++ b/Documentation/filesystems/overlayfs.rst
> > @@ -581,7 +581,10 @@ checks for this directory and refuses to mount if present. This is a strong
> >  indicator that user should throw away upper and work directories and create
> >  fresh one. In very limited cases where the user knows that the system has
> >  not crashed and contents of upperdir are intact, The "volatile" directory
> > -can be removed.
> > +can be removed.  In certain cases it the filesystem can detect that the
> 
> typo: it the filesystem
> 
> > +upperdir can be reused safely, and it will not require the user to
> > +manually delete the volatile directory.
> > +
> >
> >  Testsuite
> >  ---------
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index 9eb911f243e1..980d2c930f7a 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -30,6 +30,11 @@ enum ovl_path_type {
> >  #define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink"
> >  #define OVL_XATTR_UPPER OVL_XATTR_PREFIX "upper"
> >  #define OVL_XATTR_METACOPY OVL_XATTR_PREFIX "metacopy"
> > +#define OVL_XATTR_VOLATILE OVL_XATTR_PREFIX "volatile"
> > +
> > +#define OVL_INCOMPATDIR_NAME "incompat"
> > +#define OVL_VOLATILEDIR_NAME "volatile"
> > +#define OVL_VOLATILE_DIRTY_NAME "dirty"
> >
> >  enum ovl_inode_flag {
> >         /* Pure upper dir that may contain non pure upper entries */
> > @@ -54,6 +59,32 @@ enum {
> >         OVL_XINO_ON,
> >  };
> >
> > +/*
> > + * This is copied into the volatile xattr, and the user does not interact with
> > + * it. There is no stability requirement, as a reboot explicitly invalidates
> > + * a volatile workdir. It is explicitly meant not to be a stable api.
> > + *
> > + * Although this structure isn't meant to be stable it is exposed to potentially
> > + * unprivileged users. We don't do any kind of cryptographic operations with
> > + * the structure, so it could be tampered with, or inspected. Don't put
> > + * kernel memory pointers in it, or anything else that could cause problems,
> > + * or information disclosure.
> > + */
> > +struct overlay_volatile_info {
> 
> ovl_volatile_info please
> 
> > +       /*
> > +        * This uniquely identifies a boot, and is reset if overlayfs itself
> > +        * is reloaded. Therefore we check our current / known boot_id
> > +        * against this before looking at any other fields to validate:
> > +        * 1. Is this datastructure laid out in the way we expect? (Overlayfs
> > +        *    module, reboot, etc...)
> > +        * 2. Could something have changed (like the s_instance_id counter
> > +        *    resetting)
> > +        */
> > +       uuid_t          overlay_boot_id;
> 
> ovl_boot_id
> 
> > +       u64             s_instance_id;
> > +       errseq_t        errseq; /* Just a u32 */
> > +} __packed;
> > +
> >  /*
> >   * The tuple (fh,uuid) is a universal unique identifier for a copy up origin,
> >   * where:
> > @@ -501,3 +532,6 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
> >
> >  /* export.c */
> >  extern const struct export_operations ovl_export_operations;
> > +
> > +/* super.c */
> > +extern uuid_t overlay_boot_id;
> > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > index f8cc15533afa..ee0d2b88a19c 100644
> > --- a/fs/overlayfs/readdir.c
> > +++ b/fs/overlayfs/readdir.c
> > @@ -1054,7 +1054,84 @@ int ovl_check_d_type_supported(struct path *realpath)
> >         return rdd.d_type_supported;
> >  }
> >
> > -#define OVL_INCOMPATDIR_NAME "incompat"
> > +static int ovl_check_incompat_volatile(struct ovl_cache_entry *p,
> > +                                      struct path *path)
> > +{
> > +       int err, ret = -EINVAL;
> > +       struct overlay_volatile_info info;
> > +       struct dentry *d_volatile, *d_dirty;
> > +
> > +       d_volatile = lookup_one_len(p->name, path->dentry, p->len);
> > +       if (IS_ERR(d_volatile))
> > +               return PTR_ERR(d_volatile);
> > +
> > +       inode_lock_nested(d_volatile->d_inode, I_MUTEX_PARENT);
> 
> You can't do this. I_MUTEX_PARENT level is already taken on parent
> and you also don't need to perform lookup in this helper. I will explain below.
> 
> > +       d_dirty = lookup_one_len(OVL_VOLATILE_DIRTY_NAME, d_volatile,
> > +                                strlen(OVL_VOLATILE_DIRTY_NAME));
> > +       if (IS_ERR(d_dirty)) {
> > +               err = PTR_ERR(d_dirty);
> > +               if (err != -ENOENT)
> > +                       ret = err;
> > +               goto out_putvolatile;
> > +       }
> > +
> > +       if (!d_dirty->d_inode)
> > +               goto out_putdirty;
> > +
> > +       inode_lock_nested(d_dirty->d_inode, I_MUTEX_XATTR);
> 
> What's this lock for?
> 
I need to take a lock on this inode to prevent modifications to it, right, or is
getting the xattr safe?

> > +       err = ovl_do_getxattr(d_dirty, OVL_XATTR_VOLATILE, &info, sizeof(info));
> > +       inode_unlock(d_dirty->d_inode);
> > +       if (err != sizeof(info))
> > +               goto out_putdirty;
> > +
> > +       if (!uuid_equal(&overlay_boot_id, &info.overlay_boot_id)) {
> > +               pr_debug("boot id has changed (reboot or module reloaded)\n");
> > +               goto out_putdirty;
> > +       }
> > +
> > +       if (d_dirty->d_inode->i_sb->s_instance_id != info.s_instance_id) {
> > +               pr_debug("workdir has been unmounted and remounted\n");
> > +               goto out_putdirty;
> > +       }
> > +
> > +       err = errseq_check(&d_dirty->d_inode->i_sb->s_wb_err, info.errseq);
> > +       if (err) {
> > +               pr_debug("workdir dir has experienced errors: %d\n", err);
> > +               goto out_putdirty;
> > +       }
> 
> Please put all the above including getxattr in helper ovl_verify_volatile_info()
> 
Is it okay if the helper stays in super.c?


> > +
> > +       /* Dirty file is okay, delete it. */
> > +       ret = ovl_do_unlink(d_volatile->d_inode, d_dirty);
> 
> That's a problem. By doing this, you have now approved a regular overlay
> re-mount, but you need only approve a volatile overlay re-mount.
> Need to pass ofs to ovl_workdir_cleanup{,_recurse}.
> 
I can add a check to make sure this behaviour is only allowed on remounts back 
into volatile. There's technically a race condition here, where if there
is an error between this check, and the mounting being finished, the FS
could be dirty, but that already exists with the impl today.

> > +
> > +out_putdirty:
> > +       dput(d_dirty);
> > +out_putvolatile:
> > +       inode_unlock(d_volatile->d_inode);
> > +       dput(d_volatile);
> > +       return ret;
> > +}
> > +
> > +/*
> > + * check_incompat checks this specific incompat entry for incompatibility.
> > + * If it is found to be incompatible -EINVAL will be returned.
> > + *
> > + * Any other -errno indicates an unknown error, and filesystem mounting
> > + * should be aborted.
> > + */
> > +static int ovl_check_incompat(struct ovl_cache_entry *p, struct path *path)
> > +{
> > +       int err = -EINVAL;
> > +
> > +       if (!strcmp(p->name, OVL_VOLATILEDIR_NAME))
> > +               err = ovl_check_incompat_volatile(p, path);
> > +
> > +       if (err == -EINVAL)
> > +               pr_err("incompat feature '%s' cannot be mounted\n", p->name);
> > +       else
> > +               pr_debug("incompat '%s' handled: %d\n", p->name, err);
> > +
> > +       return err;
> > +}
> >
> >  static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> >  {
> > @@ -1098,10 +1175,9 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level)
> >                         if (p->len == 2 && p->name[1] == '.')
> >                                 continue;
> >                 } else if (incompat) {
> > -                       pr_err("overlay with incompat feature '%s' cannot be mounted\n",
> > -                               p->name);
> > -                       err = -EINVAL;
> > -                       break;
> > +                       err = ovl_check_incompat(p, path);
> > +                       if (err)
> > +                               break;
> 
> The call to ovl_check_incompat here is too soon and it makes
> you need to lookup both the volatile dir and dirty file.
> What you need to do and let cleanup recurse into the next level
> while letting it know that we are now traversing the "incompat"
> subtree.
> 
Maybe a dumb question but why is it incompat/volatile/dirty, rather than just 
incompat/volatile, where volatile is a file? Are there any caveats with putting
the xattr on the directory, or alternatively are there any reasons not to make
the structure incompat/volatile/dirty?

> You can see a more generic implementation I once made here:
> https://github.com/amir73il/linux/blob/ovl-features/fs/overlayfs/readdir.c#L1051
> but it should be simpler with just incompat/volatile.
> Perhaps something like this:
> 
>                 dentry = lookup_one_len(p->name, path->dentry, p->len);
>                 if (IS_ERR(dentry))
>                         continue;
> -               if (dentry->d_inode)
> +               if (dentry->d_inode && d_is_dir(dentry) && incompat)
> +                       err = ovl_incompatdir_cleanup(dir, path->mnt, dentry);
> +               else if (dentry->d_inode)
>                         err = ovl_workdir_cleanup(dir, path->mnt,
> dentry, level);
>                 dput(dentry);
> 
> Then inside ovl_incompatdir_cleanup() you can call ovl_check_incompat()
> with dentry argument.
> 
> Now you have a few options. A simple option would be to put the volatile
> xattr on the volatile dir instead of on the dirty file.
> If you do that, you can call ovl_verify_volatile_info() on the volatile dentry
> without any lookups (only on a volatile re-mount) and if the volatile dir is
> approved for reuse, you don't even need to remove the dirty file, because
> it's just going to be re-created anyway.
> 
> 
> >                 }
> >                 dentry = lookup_one_len(p->name, path->dentry, p->len);
> >                 if (IS_ERR(dentry))
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 2ee0ba16cc7b..94980898009f 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/seq_file.h>
> >  #include <linux/posix_acl_xattr.h>
> >  #include <linux/exportfs.h>
> > +#include <linux/uuid.h>
> >  #include "overlayfs.h"
> >
> >  MODULE_AUTHOR("Miklos Szeredi <miklos@xxxxxxxxxx>");
> > @@ -23,6 +24,7 @@ MODULE_LICENSE("GPL");
> >
> >
> >  struct ovl_dir_cache;
> > +uuid_t overlay_boot_id;
> 
> ovl_boot_id please.
> 
> >
> >  #define OVL_MAX_STACK 500
> >
> > @@ -1246,20 +1248,35 @@ static struct dentry *ovl_lookup_or_create(struct dentry *parent,
> >   */
> >  static int ovl_create_volatile_dirty(struct ovl_fs *ofs)
> >  {
> > +       int err;
> >         unsigned int ctr;
> >         struct dentry *d = dget(ofs->workbasedir);
> >         static const char *const volatile_path[] = {
> > -               OVL_WORKDIR_NAME, "incompat", "volatile", "dirty"
> > +               OVL_WORKDIR_NAME,
> > +               OVL_INCOMPATDIR_NAME,
> > +               OVL_VOLATILEDIR_NAME,
> > +               OVL_VOLATILE_DIRTY_NAME,
> >         };
> >         const char *const *name = volatile_path;
> > +       struct overlay_volatile_info info = {};
> >
> >         for (ctr = ARRAY_SIZE(volatile_path); ctr; ctr--, name++) {
> >                 d = ovl_lookup_or_create(d, *name, ctr > 1 ? S_IFDIR : S_IFREG);
> >                 if (IS_ERR(d))
> >                         return PTR_ERR(d);
> >         }
> > +
> > +       uuid_copy(&info.overlay_boot_id, &overlay_boot_id);
> > +       info.s_instance_id = d->d_inode->i_sb->s_instance_id;
> > +       info.errseq = errseq_sample(&d->d_inode->i_sb->s_wb_err);
> > +
> > +
> > +       err = ovl_do_setxattr(d, OVL_XATTR_VOLATILE, &info, sizeof(info), 0);
> > +       if (err == -EOPNOTSUPP)
> > +               err = 0;
> > +
> 
> Please put all the above in helper ovl_set_volatile_info()
> 
> Thanks,
> Amir.
Thank you for the fast review.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux