On Mon, Nov 16, 2020 at 11:31:20AM +0200, Amir Goldstein wrote: > On Mon, Nov 16, 2020 at 6:58 AM Sargun Dhillon <sargun@xxxxxxxxx> wrote: > > > > Overlayfs added the ability to setup mounts where all syncs could be > > short-circuted in (2a99ddacee43: ovl: provide a mount option "volatile"). > > > > A user might want to remount this fs, but we do not let the user because > > of the "incompat" detection feature. In the case of volatile, it is safe > > to do something like[1]: > > > > $ sync -f /root/upperdir > > $ rm -rf /root/workdir/incompat/volatile > > > > There are two ways to go about this. You can call sync on the underlying > > filesystem, check the error code, and delete the dirty file if everything > > is clean. If you're running lots of containers on the same filesystem, or > > you want to avoid all unnecessary I/O, this may be suboptimal. > > > > Alternatively, you can blindly delete the dirty file, and "hope for the > > best". > > > > This patch introduces transparent functionality to check if it is > > (relatively) safe to reuse the upperdir. It ensures that the filesystem > > hasn't been remounted, the system hasn't been rebooted, nor has the > > overlayfs code changed. It also checks the errseq on the superblock > > indicating if there have been any writeback errors since the previous > > mount. Currently, this information is not directly exposed to userspace, so > > the user cannot make decisions based on this. > > This is the main reason IMO that this patch is needed, but it's buried inside > this paragraph. It wasn't obvious to me at first why userspace solution > was not possible. Maybe try to give it more focus. > > > > Instead we checkpoint > > this information to disk, and upon remount we see if any of it has > > changed. Since the structure is explicitly not meant to be used > > between different versions of the code, its stability does not > > matter so much. > > > > [1]: https://lore.kernel.org/linux-unionfs/CAOQ4uxhKr+j5jFyEC2gJX8E8M19mQ3CqdTYaPZOvDQ9c0qLEzw@xxxxxxxxxxxxxx/T/#m6abe713e4318202ad57f301bf28a414e1d824f9c > > > > Signed-off-by: Sargun Dhillon <sargun@xxxxxxxxx> > > Cc: linux-fsdevel@xxxxxxxxxxxxxxx > > Cc: linux-unionfs@xxxxxxxxxxxxxxx > > Cc: Miklos Szeredi <miklos@xxxxxxxxxx> > > Cc: Amir Goldstein <amir73il@xxxxxxxxx> > > --- > > Documentation/filesystems/overlayfs.rst | 5 +- > > fs/overlayfs/overlayfs.h | 34 ++++++++++ > > fs/overlayfs/readdir.c | 86 +++++++++++++++++++++++-- > > fs/overlayfs/super.c | 22 ++++++- > > 4 files changed, 139 insertions(+), 8 deletions(-) > > > > diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst > > index 580ab9a0fe31..fa3faeeab727 100644 > > --- a/Documentation/filesystems/overlayfs.rst > > +++ b/Documentation/filesystems/overlayfs.rst > > @@ -581,7 +581,10 @@ checks for this directory and refuses to mount if present. This is a strong > > indicator that user should throw away upper and work directories and create > > fresh one. In very limited cases where the user knows that the system has > > not crashed and contents of upperdir are intact, The "volatile" directory > > -can be removed. > > +can be removed. In certain cases it the filesystem can detect that the > > typo: it the filesystem > > > +upperdir can be reused safely, and it will not require the user to > > +manually delete the volatile directory. > > + > > > > Testsuite > > --------- > > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h > > index 9eb911f243e1..980d2c930f7a 100644 > > --- a/fs/overlayfs/overlayfs.h > > +++ b/fs/overlayfs/overlayfs.h > > @@ -30,6 +30,11 @@ enum ovl_path_type { > > #define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink" > > #define OVL_XATTR_UPPER OVL_XATTR_PREFIX "upper" > > #define OVL_XATTR_METACOPY OVL_XATTR_PREFIX "metacopy" > > +#define OVL_XATTR_VOLATILE OVL_XATTR_PREFIX "volatile" > > + > > +#define OVL_INCOMPATDIR_NAME "incompat" > > +#define OVL_VOLATILEDIR_NAME "volatile" > > +#define OVL_VOLATILE_DIRTY_NAME "dirty" > > > > enum ovl_inode_flag { > > /* Pure upper dir that may contain non pure upper entries */ > > @@ -54,6 +59,32 @@ enum { > > OVL_XINO_ON, > > }; > > > > +/* > > + * This is copied into the volatile xattr, and the user does not interact with > > + * it. There is no stability requirement, as a reboot explicitly invalidates > > + * a volatile workdir. It is explicitly meant not to be a stable api. > > + * > > + * Although this structure isn't meant to be stable it is exposed to potentially > > + * unprivileged users. We don't do any kind of cryptographic operations with > > + * the structure, so it could be tampered with, or inspected. Don't put > > + * kernel memory pointers in it, or anything else that could cause problems, > > + * or information disclosure. > > + */ > > +struct overlay_volatile_info { > > ovl_volatile_info please > > > + /* > > + * This uniquely identifies a boot, and is reset if overlayfs itself > > + * is reloaded. Therefore we check our current / known boot_id > > + * against this before looking at any other fields to validate: > > + * 1. Is this datastructure laid out in the way we expect? (Overlayfs > > + * module, reboot, etc...) > > + * 2. Could something have changed (like the s_instance_id counter > > + * resetting) > > + */ > > + uuid_t overlay_boot_id; > > ovl_boot_id > > > + u64 s_instance_id; > > + errseq_t errseq; /* Just a u32 */ > > +} __packed; > > + > > /* > > * The tuple (fh,uuid) is a universal unique identifier for a copy up origin, > > * where: > > @@ -501,3 +532,6 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower, > > > > /* export.c */ > > extern const struct export_operations ovl_export_operations; > > + > > +/* super.c */ > > +extern uuid_t overlay_boot_id; > > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c > > index f8cc15533afa..ee0d2b88a19c 100644 > > --- a/fs/overlayfs/readdir.c > > +++ b/fs/overlayfs/readdir.c > > @@ -1054,7 +1054,84 @@ int ovl_check_d_type_supported(struct path *realpath) > > return rdd.d_type_supported; > > } > > > > -#define OVL_INCOMPATDIR_NAME "incompat" > > +static int ovl_check_incompat_volatile(struct ovl_cache_entry *p, > > + struct path *path) > > +{ > > + int err, ret = -EINVAL; > > + struct overlay_volatile_info info; > > + struct dentry *d_volatile, *d_dirty; > > + > > + d_volatile = lookup_one_len(p->name, path->dentry, p->len); > > + if (IS_ERR(d_volatile)) > > + return PTR_ERR(d_volatile); > > + > > + inode_lock_nested(d_volatile->d_inode, I_MUTEX_PARENT); > > You can't do this. I_MUTEX_PARENT level is already taken on parent > and you also don't need to perform lookup in this helper. I will explain below. > > > + d_dirty = lookup_one_len(OVL_VOLATILE_DIRTY_NAME, d_volatile, > > + strlen(OVL_VOLATILE_DIRTY_NAME)); > > + if (IS_ERR(d_dirty)) { > > + err = PTR_ERR(d_dirty); > > + if (err != -ENOENT) > > + ret = err; > > + goto out_putvolatile; > > + } > > + > > + if (!d_dirty->d_inode) > > + goto out_putdirty; > > + > > + inode_lock_nested(d_dirty->d_inode, I_MUTEX_XATTR); > > What's this lock for? > I need to take a lock on this inode to prevent modifications to it, right, or is getting the xattr safe? > > + err = ovl_do_getxattr(d_dirty, OVL_XATTR_VOLATILE, &info, sizeof(info)); > > + inode_unlock(d_dirty->d_inode); > > + if (err != sizeof(info)) > > + goto out_putdirty; > > + > > + if (!uuid_equal(&overlay_boot_id, &info.overlay_boot_id)) { > > + pr_debug("boot id has changed (reboot or module reloaded)\n"); > > + goto out_putdirty; > > + } > > + > > + if (d_dirty->d_inode->i_sb->s_instance_id != info.s_instance_id) { > > + pr_debug("workdir has been unmounted and remounted\n"); > > + goto out_putdirty; > > + } > > + > > + err = errseq_check(&d_dirty->d_inode->i_sb->s_wb_err, info.errseq); > > + if (err) { > > + pr_debug("workdir dir has experienced errors: %d\n", err); > > + goto out_putdirty; > > + } > > Please put all the above including getxattr in helper ovl_verify_volatile_info() > Is it okay if the helper stays in super.c? > > + > > + /* Dirty file is okay, delete it. */ > > + ret = ovl_do_unlink(d_volatile->d_inode, d_dirty); > > That's a problem. By doing this, you have now approved a regular overlay > re-mount, but you need only approve a volatile overlay re-mount. > Need to pass ofs to ovl_workdir_cleanup{,_recurse}. > I can add a check to make sure this behaviour is only allowed on remounts back into volatile. There's technically a race condition here, where if there is an error between this check, and the mounting being finished, the FS could be dirty, but that already exists with the impl today. > > + > > +out_putdirty: > > + dput(d_dirty); > > +out_putvolatile: > > + inode_unlock(d_volatile->d_inode); > > + dput(d_volatile); > > + return ret; > > +} > > + > > +/* > > + * check_incompat checks this specific incompat entry for incompatibility. > > + * If it is found to be incompatible -EINVAL will be returned. > > + * > > + * Any other -errno indicates an unknown error, and filesystem mounting > > + * should be aborted. > > + */ > > +static int ovl_check_incompat(struct ovl_cache_entry *p, struct path *path) > > +{ > > + int err = -EINVAL; > > + > > + if (!strcmp(p->name, OVL_VOLATILEDIR_NAME)) > > + err = ovl_check_incompat_volatile(p, path); > > + > > + if (err == -EINVAL) > > + pr_err("incompat feature '%s' cannot be mounted\n", p->name); > > + else > > + pr_debug("incompat '%s' handled: %d\n", p->name, err); > > + > > + return err; > > +} > > > > static int ovl_workdir_cleanup_recurse(struct path *path, int level) > > { > > @@ -1098,10 +1175,9 @@ static int ovl_workdir_cleanup_recurse(struct path *path, int level) > > if (p->len == 2 && p->name[1] == '.') > > continue; > > } else if (incompat) { > > - pr_err("overlay with incompat feature '%s' cannot be mounted\n", > > - p->name); > > - err = -EINVAL; > > - break; > > + err = ovl_check_incompat(p, path); > > + if (err) > > + break; > > The call to ovl_check_incompat here is too soon and it makes > you need to lookup both the volatile dir and dirty file. > What you need to do and let cleanup recurse into the next level > while letting it know that we are now traversing the "incompat" > subtree. > Maybe a dumb question but why is it incompat/volatile/dirty, rather than just incompat/volatile, where volatile is a file? Are there any caveats with putting the xattr on the directory, or alternatively are there any reasons not to make the structure incompat/volatile/dirty? > You can see a more generic implementation I once made here: > https://github.com/amir73il/linux/blob/ovl-features/fs/overlayfs/readdir.c#L1051 > but it should be simpler with just incompat/volatile. > Perhaps something like this: > > dentry = lookup_one_len(p->name, path->dentry, p->len); > if (IS_ERR(dentry)) > continue; > - if (dentry->d_inode) > + if (dentry->d_inode && d_is_dir(dentry) && incompat) > + err = ovl_incompatdir_cleanup(dir, path->mnt, dentry); > + else if (dentry->d_inode) > err = ovl_workdir_cleanup(dir, path->mnt, > dentry, level); > dput(dentry); > > Then inside ovl_incompatdir_cleanup() you can call ovl_check_incompat() > with dentry argument. > > Now you have a few options. A simple option would be to put the volatile > xattr on the volatile dir instead of on the dirty file. > If you do that, you can call ovl_verify_volatile_info() on the volatile dentry > without any lookups (only on a volatile re-mount) and if the volatile dir is > approved for reuse, you don't even need to remove the dirty file, because > it's just going to be re-created anyway. > > > > } > > dentry = lookup_one_len(p->name, path->dentry, p->len); > > if (IS_ERR(dentry)) > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > > index 2ee0ba16cc7b..94980898009f 100644 > > --- a/fs/overlayfs/super.c > > +++ b/fs/overlayfs/super.c > > @@ -15,6 +15,7 @@ > > #include <linux/seq_file.h> > > #include <linux/posix_acl_xattr.h> > > #include <linux/exportfs.h> > > +#include <linux/uuid.h> > > #include "overlayfs.h" > > > > MODULE_AUTHOR("Miklos Szeredi <miklos@xxxxxxxxxx>"); > > @@ -23,6 +24,7 @@ MODULE_LICENSE("GPL"); > > > > > > struct ovl_dir_cache; > > +uuid_t overlay_boot_id; > > ovl_boot_id please. > > > > > #define OVL_MAX_STACK 500 > > > > @@ -1246,20 +1248,35 @@ static struct dentry *ovl_lookup_or_create(struct dentry *parent, > > */ > > static int ovl_create_volatile_dirty(struct ovl_fs *ofs) > > { > > + int err; > > unsigned int ctr; > > struct dentry *d = dget(ofs->workbasedir); > > static const char *const volatile_path[] = { > > - OVL_WORKDIR_NAME, "incompat", "volatile", "dirty" > > + OVL_WORKDIR_NAME, > > + OVL_INCOMPATDIR_NAME, > > + OVL_VOLATILEDIR_NAME, > > + OVL_VOLATILE_DIRTY_NAME, > > }; > > const char *const *name = volatile_path; > > + struct overlay_volatile_info info = {}; > > > > for (ctr = ARRAY_SIZE(volatile_path); ctr; ctr--, name++) { > > d = ovl_lookup_or_create(d, *name, ctr > 1 ? S_IFDIR : S_IFREG); > > if (IS_ERR(d)) > > return PTR_ERR(d); > > } > > + > > + uuid_copy(&info.overlay_boot_id, &overlay_boot_id); > > + info.s_instance_id = d->d_inode->i_sb->s_instance_id; > > + info.errseq = errseq_sample(&d->d_inode->i_sb->s_wb_err); > > + > > + > > + err = ovl_do_setxattr(d, OVL_XATTR_VOLATILE, &info, sizeof(info), 0); > > + if (err == -EOPNOTSUPP) > > + err = 0; > > + > > Please put all the above in helper ovl_set_volatile_info() > > Thanks, > Amir. Thank you for the fast review.