Re: [PATCH v3 3/3] ovl: change layer mount option handling

On Tue, Jun 13, 2023 at 5:49 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> We ran into issues where mount(8) passed multiple lower layers as one
> big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
> option is limited to 256 bytes in strndup_user(). While this would be
> fixable by extending the fsconfig() buffer I'd rather encourage users to
> append layers via multiple fsconfig() calls as the interface allows
> nicely for this. This has also been requested as a feature before.
>
> With this port to the new mount api the following will be possible:
>
>         fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);
>
>         /* set upper layer */
>         fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);
>
>         /* append "/lower2", "/lower3", and "/lower4" */
>         fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);
>
>         /* turn index feature on */
>         fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);
>
>         /* append "/lower5" */
>         fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);
>
> A leading ':' would previously have been rejected, so this isn't a regression. And
> we can't simply use "lowerdir=/lower" to append on top of existing
> layers as "lowerdir=/lower,lowerdir=/other-lower" would make
> "/other-lower" the only lower layer so we'd break uapi if we changed
> this. So the ':' prefix seems a good compromise.
>
> Users can choose to specify multiple layers at once or individual
> layers. A layer is appended if it starts with ":". This requires that
> the user has already added at least one layer before. If lowerdir is
> specified again without a leading ":" then all previous layers are
> dropped and replaced with the new layers. If lowerdir is specified and
> empty then all layers are simply dropped.
>
> An additional change is that overlayfs will now parse and resolve layers
> right when they are specified in fsconfig() instead of deferring until
> super block creation. This allows users to receive early errors.
>
> It also allows users to actually use up to 500 layers, something which
> was theoretically possible but ended up not working due to the mount
> option string passed via mount(2) being too large.
>
> This also allows a more privileged process to set config options for a
> lesser privileged process as the creds for fsconfig() and the creds for
> fsopen() can differ. We could enforce that the creds of fsopen() and
> fsconfig() match, but I don't see why that needs to be the case, and
> letting them differ allows for a good delegation mechanism.
>
> Plus, in the future it means we're able to extend overlayfs mount
> options and allow users to specify layers via file descriptors instead
> of paths:
>
>         fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);
>
>         /* append */
>         fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);
>
>         /* append */
>         fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);
>
>         /* clear all layers specified until now */
>         fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);
>
> This would be especially nice if users create an overlayfs mount on top
> of idmapped layers or just in general private mounts created via
> open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
> anywhere in the filesystem. But for now just do the minimal thing.
>
> We should probably aim to move more validation into ovl_fs_parse_param()
> so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
> be done in additional patches later.
>
> This is now also rebased on top of the lazy lowerdata lookup which
> allows the specification of data-only layers using the new "::" syntax.
>
> The rules are simple. A data-only layer cannot be followed by any
> regular layers, and data-only layers must be preceded by at least one
> regular layer.
>
> Parsing the lowerdir mount option must change because of this. The
> original patchset used the old lowerdir parsing function to split a
> lowerdir mount option string such as:
>
>         lowerdir=/lower1:/lower2::/lower3::/lower4
>
> simply replacing each non-escaped ":" by "\0". So sequences of
> non-escaped ":" were counted as layers. For example, the previous
> lowerdir mount option above would've counted 6 layers instead of 4 and a
> lowerdir mount option such as:
>
>         lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"
>
> would be counted as 33 layers. Other than being ugly this didn't matter
> much because kern_path() would reject the first "\0" layer. However,
> this overcounting of layers becomes problematic when we base allocations
> on it where we very much only want to allocate space for 4 layers
> instead of 33.
>
> So the new parsing function rejects non-escaped sequences of colons
> other than ":" and "::" immediately instead of relying on kern_path().
>
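
For illustration only, the colon rules described above can be tried out in
user space. The sketch below follows the logic of
ovl_parse_param_split_lowerdirs() from this patch. The name
split_lowerdirs(), the use of long instead of ssize_t, the added assert(),
and the dropped pr_err() calls are assumptions made so that it builds
outside the kernel; it is not the kernel code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space sketch (hypothetical name) of the lowerdir splitting rules.
 * Splits str in place on unescaped ':' and returns the number of layers,
 * or -EINVAL for ":::" (or longer) runs and for trailing colons.
 */
static long split_lowerdirs(char *str)
{
	long nr_layers = 1, nr_colons = 0;
	char *s, *d;

	assert(str != NULL);

	for (s = d = str;; s++, d++) {
		if (*s == '\\') {
			/* Escaped character: skip the backslash, copy what follows. */
			s++;
		} else if (*s == ':') {
			bool next_colon = (*(s + 1) == ':');

			nr_colons++;
			/* A third consecutive colon is never valid. */
			if (nr_colons == 2 && next_colon)
				return -EINVAL;
			/* Count layers, not colons: "::" adds one layer. */
			if (!next_colon)
				nr_layers++;

			*d = '\0';
			continue;
		}

		*d = *s;
		if (!*s) {
			/* The string ended directly after a colon. */
			if (nr_colons)
				return -EINVAL;
			break;
		}
		nr_colons = 0;
	}

	return nr_layers;
}
```

With this, "/lower1:/lower2::/lower3::/lower4" counts as 4 layers, while a
run of three or more colons or a trailing colon is rejected up front
instead of being left for kern_path() to catch.
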
> Link: https://github.com/util-linux/util-linux/issues/2287
> Link: https://github.com/util-linux/util-linux/issues/1992
> Link: https://bugs.archlinux.org/task/78702
> Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
> Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> ---
>  fs/overlayfs/Makefile    |   2 +-
>  fs/overlayfs/overlayfs.h |  23 +++
>  fs/overlayfs/ovl_entry.h |   3 +-
>  fs/overlayfs/params.c    | 388 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/overlayfs/super.c     | 376 +++++++++++++++------------------------------
>  5 files changed, 534 insertions(+), 258 deletions(-)
>
> diff --git a/fs/overlayfs/Makefile b/fs/overlayfs/Makefile
> index 9164c585eb2f..4e173d56b11f 100644
> --- a/fs/overlayfs/Makefile
> +++ b/fs/overlayfs/Makefile
> @@ -6,4 +6,4 @@
>  obj-$(CONFIG_OVERLAY_FS) += overlay.o
>
>  overlay-objs := super.o namei.o util.o inode.o file.o dir.o readdir.o \
> -               copy_up.o export.o
> +               copy_up.o export.o params.o
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index fcac4e2c56ab..7659ea6e02cb 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -119,6 +119,29 @@ struct ovl_fh {
>  #define OVL_FH_FID_OFFSET      (OVL_FH_WIRE_OFFSET + \
>                                  offsetof(struct ovl_fb, fid))
>
> +/* params.c */
> +#define OVL_MAX_STACK 500
> +
> +struct ovl_fs_context_layer {
> +       char *name;
> +       struct path path;
> +};
> +
> +struct ovl_fs_context {
> +       struct path upper;
> +       struct path work;
> +       size_t capacity;
> +       size_t nr; /* includes nr_data */
> +       size_t nr_data;
> +       u8 set;
> +       struct ovl_fs_context_layer *lower;
> +};
> +
> +int ovl_parse_param_upperdir(const char *name, struct fs_context *fc,
> +                            bool workdir);
> +int ovl_parse_param_lowerdir(const char *name, struct fs_context *fc);
> +void ovl_parse_param_drop_lowerdir(struct ovl_fs_context *ctx);
> +
>  extern const char *const ovl_xattr_table[][2];
>  static inline const char *ovl_xattr(struct ovl_fs *ofs, enum ovl_xattr ox)
>  {
> diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> index c72433c06006..7888ab33730b 100644
> --- a/fs/overlayfs/ovl_entry.h
> +++ b/fs/overlayfs/ovl_entry.h
> @@ -6,7 +6,6 @@
>   */
>
>  struct ovl_config {
> -       char *lowerdir;
>         char *upperdir;
>         char *workdir;
>         bool default_permissions;
> @@ -41,6 +40,7 @@ struct ovl_layer {
>         int idx;
>         /* One fsid per unique underlying sb (upper fsid == 0) */
>         int fsid;
> +       char *name;
>  };
>
>  /*
> @@ -101,7 +101,6 @@ struct ovl_fs {
>         errseq_t errseq;
>  };
>
> -
>  /* Number of lower layers, not including data-only layers */
>  static inline unsigned int ovl_numlowerlayer(struct ovl_fs *ofs)
>  {
> diff --git a/fs/overlayfs/params.c b/fs/overlayfs/params.c
> new file mode 100644
> index 000000000000..a1606af1613f
> --- /dev/null
> +++ b/fs/overlayfs/params.c
> @@ -0,0 +1,388 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/fs.h>
> +#include <linux/namei.h>
> +#include <linux/fs_context.h>
> +#include <linux/fs_parser.h>
> +#include <linux/posix_acl_xattr.h>
> +#include <linux/xattr.h>
> +#include "overlayfs.h"
> +
> +static ssize_t ovl_parse_param_split_lowerdirs(char *str)
> +{
> +       ssize_t nr_layers = 1, nr_colons = 0;
> +       char *s, *d;
> +
> +       for (s = d = str;; s++, d++) {
> +               if (*s == '\\') {
> +                       s++;
> +               } else if (*s == ':') {
> +                       bool next_colon = (*(s + 1) == ':');
> +
> +                       nr_colons++;
> +                       if (nr_colons == 2 && next_colon) {
> +                               pr_err("only single ':' or double '::' sequences of unescaped colons in lowerdir mount option allowed.\n");
> +                               return -EINVAL;
> +                       }
> +                       /* count layers, not colons */
> +                       if (!next_colon)
> +                               nr_layers++;
> +
> +                       *d = '\0';
> +                       continue;
> +               }
> +
> +               *d = *s;
> +               if (!*s) {
> +                       /* trailing colons */
> +                       if (nr_colons) {
> +                               pr_err("unescaped trailing colons in lowerdir mount option.\n");
> +                               return -EINVAL;
> +                       }
> +                       break;
> +               }
> +               nr_colons = 0;
> +       }
> +
> +       return nr_layers;
> +}
> +
> +static int ovl_mount_dir_noesc(const char *name, struct path *path)
> +{
> +       int err = -EINVAL;
> +
> +       if (!*name) {
> +               pr_err("empty lowerdir\n");
> +               goto out;
> +       }
> +       err = kern_path(name, LOOKUP_FOLLOW, path);
> +       if (err) {
> +               pr_err("failed to resolve '%s': %i\n", name, err);
> +               goto out;
> +       }
> +       err = -EINVAL;
> +       if (ovl_dentry_weird(path->dentry)) {
> +               pr_err("filesystem on '%s' not supported\n", name);
> +               goto out_put;
> +       }
> +       if (!d_is_dir(path->dentry)) {
> +               pr_err("'%s' not a directory\n", name);
> +               goto out_put;
> +       }
> +       return 0;
> +
> +out_put:
> +       path_put_init(path);
> +out:
> +       return err;
> +}
> +
> +static void ovl_unescape(char *s)
> +{
> +       char *d = s;
> +
> +       for (;; s++, d++) {
> +               if (*s == '\\')
> +                       s++;
> +               *d = *s;
> +               if (!*s)
> +                       break;
> +       }
> +}
> +
> +static int ovl_mount_dir(const char *name, struct path *path)
> +{
> +       int err = -ENOMEM;
> +       char *tmp = kstrdup(name, GFP_KERNEL);
> +
> +       if (tmp) {
> +               ovl_unescape(tmp);
> +               err = ovl_mount_dir_noesc(tmp, path);
> +
> +               if (!err && path->dentry->d_flags & DCACHE_OP_REAL) {
> +                       pr_err("filesystem on '%s' not supported as upperdir\n",
> +                              tmp);
> +                       path_put_init(path);
> +                       err = -EINVAL;
> +               }
> +               kfree(tmp);
> +       }
> +       return err;
> +}
> +
> +int ovl_parse_param_upperdir(const char *name, struct fs_context *fc,
> +                            bool workdir)
> +{
> +       int err;
> +       struct ovl_fs *ofs = fc->s_fs_info;
> +       struct ovl_config *config = &ofs->config;
> +       struct ovl_fs_context *ctx = fc->fs_private;
> +       struct path path;
> +       char *dup;
> +
> +       err = ovl_mount_dir(name, &path);
> +       if (err)
> +               return err;
> +
> +       /*
> +        * Check whether upper path is read-only here to report failures
> +        * early. Don't forget to recheck when the superblock is created
> +        * as the mount attributes could change.
> +        */
> +       if (__mnt_is_readonly(path.mnt)) {
> +               path_put(&path);
> +               return -EINVAL;
> +       }
> +
> +       dup = kstrdup(name, GFP_KERNEL);
> +       if (!dup) {
> +               path_put(&path);
> +               return -ENOMEM;
> +       }
> +
> +       if (workdir) {
> +               kfree(config->workdir);
> +               config->workdir = dup;
> +               path_put(&ctx->work);
> +               ctx->work = path;
> +       } else {
> +               kfree(config->upperdir);
> +               config->upperdir = dup;
> +               path_put(&ctx->upper);
> +               ctx->upper = path;
> +       }
> +       return 0;
> +}
> +
> +void ovl_parse_param_drop_lowerdir(struct ovl_fs_context *ctx)
> +{
> +       for (size_t nr = 0; nr < ctx->nr; nr++) {
> +               path_put(&ctx->lower[nr].path);
> +               kfree(ctx->lower[nr].name);
> +               ctx->lower[nr].name = NULL;
> +       }
> +       ctx->nr = 0;
> +       ctx->nr_data = 0;
> +}
> +
> +/*
> + * Parse lowerdir= mount option:
> + *
> + * (1) lowerdir=/lower1:/lower2:/lower3::/data1::/data2
> + *     Set "/lower1", "/lower2", and "/lower3" as lower layers and
> + *     "/data1" and "/data2" as data lower layers. Any existing lower
> + *     layers are replaced.
> + * (2) lowerdir=:/lower4
> + *     Append "/lower4" to current stack of lower layers. This requires
> + *     that there already is at least one lower layer configured.
> + * (3) lowerdir=::/lower5
> + *     Append data "/lower5" as data lower layer. This requires that
> + *     there's at least one regular lower layer present.
> + */
> +int ovl_parse_param_lowerdir(const char *name, struct fs_context *fc)
> +{
> +       int err;
> +       struct ovl_fs_context *ctx = fc->fs_private;
> +       struct ovl_fs_context_layer *l;
> +       char *dup = NULL, *dup_iter;
> +       ssize_t nr_lower = 0, nr = 0, nr_data = 0;
> +       bool append = false, data_layer = false;
> +
> +       /*
> +        * Ensure we're backwards compatible with mount(2)
> +        * by allowing relative paths.
> +        */
> +
> +       /* drop all existing lower layers */
> +       if (!*name) {
> +               ovl_parse_param_drop_lowerdir(ctx);
> +               return 0;
> +       }
> +
> +       if (strncmp(name, "::", 2) == 0) {
> +               /*
> +                * This is a data layer.
> +                * There must be at least one regular lower layer
> +                * specified.
> +                */
> +               if (ctx->nr == 0) {
> +                       pr_err("data lower layers without regular lower layers not allowed");
> +                       return -EINVAL;
> +               }
> +
> +               /* Skip the leading "::". */
> +               name += 2;
> +               data_layer = true;
> +               /*
> +                * A data layer is automatically an append as there
> +                * must've been at least one regular lower layer.
> +                */
> +               append = true;
> +       } else if (*name == ':') {
> +               /*
> +                * This is a regular lower layer.
> +                * If users want to append a layer enforce that they
> +                * have already specified a first layer before. It's
> +                * better to be strict.
> +                */
> +               if (ctx->nr == 0) {
> +                       pr_err("cannot append layer if no previous layer has been specified");
> +                       return -EINVAL;
> +               }
> +
> +               /*
> +                * Once a sequence of data layers has started regular
> +                * lower layers are forbidden.
> +                */
> +               if (ctx->nr_data > 0) {
> +                       pr_err("regular lower layers cannot follow data lower layers");
> +                       return -EINVAL;
> +               }
> +
> +               /* Skip the leading ":". */
> +               name++;
> +               append = true;
> +       }
> +
> +       dup = kstrdup(name, GFP_KERNEL);
> +       if (!dup)
> +               return -ENOMEM;
> +
> +       err = -EINVAL;
> +       nr_lower = ovl_parse_param_split_lowerdirs(dup);
> +       if (nr_lower < 0)
> +               goto out_err;
> +
> +       if ((nr_lower > OVL_MAX_STACK) ||
> +           (append && (size_add(ctx->nr, nr_lower) > OVL_MAX_STACK))) {
> +               pr_err("too many lower directories, limit is %d\n", OVL_MAX_STACK);
> +               goto out_err;
> +       }
> +
> +       if (!append)
> +               ovl_parse_param_drop_lowerdir(ctx);
> +
> +       /*
> +        * (1) append
> +        *
> +        * We want nr <= nr_lower <= capacity We know nr > 0 and nr <=
> +        * capacity. If nr == 0 this wouldn't be append. If nr +
> +        * nr_lower is <= capacity then nr <= nr_lower <= capacity
> +        * already holds. If nr + nr_lower exceeds capacity, we realloc.
> +        *
> +        * (2) replace
> +        *
> +        * Ensure we're backwards compatible with mount(2) which allows
> +        * "lowerdir=/a:/b:/c,lowerdir=/d:/e:/f" causing the last
> +        * specified lowerdir mount option to win.
> +        *
> +        * We want nr <= nr_lower <= capacity We know either (i) nr == 0
> +        * or (ii) nr > 0. We also know nr_lower > 0. The capacity
> +        * could've been changed multiple times already so we only know
> +        * nr <= capacity. If nr + nr_lower > capacity we realloc,
> +        * otherwise nr <= nr_lower <= capacity holds already.
> +        */
> +       nr_lower += ctx->nr;
> +       if (nr_lower > ctx->capacity) {
> +               err = -ENOMEM;
> +               l = krealloc_array(ctx->lower, nr_lower, sizeof(*ctx->lower),
> +                                  GFP_KERNEL_ACCOUNT);
> +               if (!l)
> +                       goto out_err;
> +
> +               ctx->lower = l;
> +               ctx->capacity = nr_lower;
> +       }
> +
> +       /*
> +        *   (3) By (1) and (2) we know nr <= nr_lower <= capacity.
> +        *   (4) If ctx->nr == 0 => replace
> +        *       We have verified above that the lowerdir mount option
> +        *       isn't an append, i.e., the lowerdir mount option
> +        *       doesn't start with ":" or "::".
> +        * (4.1) The lowerdir mount options only contains regular lower
> +        *       layers ":".
> +        *       => Nothing to verify.
> +        * (4.2) The lowerdir mount options contains regular ":" and
> +        *       data "::" layers.
> +        *       => We need to verify that data lower layers "::" aren't
> +        *          followed by regular ":" lower layers
> +        *   (5) If ctx->nr > 0 => append
> +        *       We know that there's at least one regular layer
> +        *       otherwise we would've failed when parsing the previous
> +        *       lowerdir mount option.
> +        * (5.1) The lowerdir mount option is a regular layer ":" append
> +        *       => We need to verify that no data layers have been
> +        *          specified before.
> +        * (5.2) The lowerdir mount option is a data layer "::" append
> +        *       We know that there's at least one regular layer or
> +        *       other data layers. => There's nothing to verify.
> +        */
> +       dup_iter = dup;
> +       for (nr = ctx->nr; nr < nr_lower; nr++) {
> +               l = &ctx->lower[nr];

A memset() is missing here:
                   memset(l, 0, sizeof(*l));

Otherwise, when trying to mount ovl over an unsupported fs (vfat),
ovl_mount_dir_noesc() fails before l->name has been assigned and...

> +
> +               err = ovl_mount_dir_noesc(dup_iter, &l->path);
> +               if (err)
> +                       goto out_put;
> +
> +               err = -ENOMEM;
> +               l->name = kstrdup(dup_iter, GFP_KERNEL_ACCOUNT);
> +               if (!l->name)
> +                       goto out_put;
> +
> +               if (data_layer)
> +                       nr_data++;
> +
> +               /* Calling strchr() again would overrun. */
> +               if ((nr + 1) == nr_lower)
> +                       break;
> +
> +               err = -EINVAL;
> +               dup_iter = strchr(dup_iter, '\0') + 1;
> +               if (*dup_iter) {
> +                       /*
> +                        * This is a regular layer so we require that
> +                        * there are no data layers.
> +                        */
> +                       if ((ctx->nr_data + nr_data) > 0) {
> +                               pr_err("regular lower layers cannot follow data lower layers");
> +                               goto out_put;
> +                       }
> +
> +                       data_layer = false;
> +                       continue;
> +               }
> +
> +               /* This is a data lower layer. */
> +               data_layer = true;
> +               dup_iter++;
> +       }
> +       ctx->nr = nr_lower;
> +       ctx->nr_data += nr_data;
> +       kfree(dup);
> +       return 0;
> +
> +out_put:
> +       /*
> +        * We know nr >= ctx->nr < nr_lower. If we failed somewhere
> +        * we want to undo until nr == ctx->nr. This is correct for
> +        * both ctx->nr == 0 and ctx->nr > 0.
> +        */
> +       for (; nr >= ctx->nr; nr--) {
> +               l = &ctx->lower[nr];
> +               kfree(l->name);
> +               l->name = NULL;
> +               path_put(&l->path);
> +

...this is kfreeing a garbage pointer, because the slot returned by
krealloc_array() was never zeroed.

I will fold that into my overlayfs-next branch.
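
For anyone who wants to see the pattern in isolation, here is a minimal
user-space sketch, not the kernel code. struct layer, struct layer_ctx,
dup_string() and layer_ctx_append() are hypothetical stand-ins, realloc()
and free() stand in for krealloc_array() and kfree(), and a NULL entry in
the input simulates a layer that fails to resolve. It assumes the unwind
path frees the name of every slot touched in the failed call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ovl_fs_context_layer. */
struct layer {
	char *name;
};

/* Hypothetical stand-in for the lower-layer part of struct ovl_fs_context. */
struct layer_ctx {
	struct layer *lower;
	size_t nr;
	size_t capacity;
};

/* Portable replacement for strdup(); returns NULL on allocation failure. */
static char *dup_string(const char *src)
{
	size_t len = strlen(src) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, src, len);
	return copy;
}

/*
 * Append n names to ctx. A NULL entry simulates a failed path lookup.
 * Returns 0 on success, or -1 on failure with ctx->nr left unchanged.
 */
static int layer_ctx_append(struct layer_ctx *ctx,
			    const char *const *names, size_t n)
{
	size_t total, i;

	assert(ctx != NULL && n > 0);
	total = ctx->nr + n;

	if (total > ctx->capacity) {
		struct layer *l = realloc(ctx->lower, total * sizeof(*l));

		if (!l)
			return -1;
		ctx->lower = l;
		ctx->capacity = total;
	}

	for (i = ctx->nr; i < total; i++) {
		struct layer *l = &ctx->lower[i];

		/* The fix: slots added by realloc() hold indeterminate bytes. */
		memset(l, 0, sizeof(*l));

		if (!names[i - ctx->nr])
			goto out_undo;	/* lookup failed, l->name is still NULL */

		l->name = dup_string(names[i - ctx->nr]);
		if (!l->name)
			goto out_undo;
	}
	ctx->nr = total;
	return 0;

out_undo:
	/* Walk back from slot i to slot ctx->nr; free(NULL) is a no-op. */
	for (;;) {
		free(ctx->lower[i].name);
		ctx->lower[i].name = NULL;
		if (i == ctx->nr)
			break;
		i--;
	}
	return -1;
}
```

Without the memset(), the first free() in the unwind loop would be handed
whatever bytes happened to be in the new slot.
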

Thanks,
Amir.



