On Thu, Jun 08, 2023 at 09:29:39PM +0300, Amir Goldstein wrote: > On Thu, Jun 8, 2023 at 7:07 PM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > We recently ported util-linux to the new mount api. Now the mount(8) > > tool will by default use the new mount api. While trying hard to fall > > back to the old mount api gracefully there are still cases where we run > > into issues that are difficult to handle nicely. > > > > Now with mount(8) and libmount supporting the new mount api I expect an > > increase in the number of bug reports and issues we're going to see with > > filesystems that don't yet support the new mount api. So it's time we > > rectify this. > > > > For overlayfs specifically we ran into issues where mount(8) passed > > multiple lower layers as one big string through fsconfig(). But the > > fsconfig() FSCONFIG_SET_STRING option is limited to 256 bytes in > > strndup_user(). While this would be fixable by extending the fsconfig() > > buffer I'd rather encourage users to append layers via multiple > > fsconfig() calls as the interface allows nicely for this. This has also > > been requested as a feature before. > > > > With this port to the new mount api the following will be possible: > > > > fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0); > > > > /* set upper layer */ > > fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0); > > > > /* append "/lower2", "/lower3", and "/lower4" */ > > fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0); > > > > /* turn index feature on */ > > fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0); > > > > /* append "/lower5" */ > > fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0); > > > > Specifying ':' would have been rejected so this isn't a regression. And > > we can't simply use "lowerdir=/lower" to append on top of existing > > layers as "lowerdir=/lower,lowerdir=/other-lower" would make > > "/other-lower" the only lower layer so we'd break uapi if we changed > > this. So the ':' prefix seems a good compromise. > > > > Users can choose to specify multiple layers at once or individual > > layers. A layer is appended if it starts with ":". This requires that > > the user has already added at least one layer before. If lowerdir is > > specified again without a leading ":" then all previous layers are > > dropped and replaced with the new layers. If lowerdir is specified and > > empty than all layers are simply dropped. > > > > An additional change is that overlayfs will now parse and resolve layers > > right when they are specified in fsconfig() instead of deferring until > > super block creation. This allows users to receive early errors. > > > > It also allows users to actually use up to 500 layers something which > > was theoretically possible but ended up not working due to the mount > > option string passed via mount(2) being too large. > > > > This also allows a more privileged process to set config options for a > > lesser privileged process as the creds for fsconfig() and the creds for > > fsopen() can differ. We could restrict that they match by enforcing that > > the creds of fsopen() and fsconfig() match but I don't see why that > > needs to be the case and allows for a good delegation mechanism. > > > > Plus, in the future it means we're able to extend overlayfs mount > > options and allow users to specify layers via file descriptors instead > > of paths: > > > > fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd); > > > > /* append */ > > fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd); > > > > /* append */ > > fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd); > > > > /* clear all layers specified until now */ > > fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0); > > > > This would be especially nice if users create an overlayfs mount on top > > of idmapped layers or just in general private mounts created via > > open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear > > anywhere in the filesystem. But for now just do the minimal thing. > > > > We should probably aim to move more validation into ovl_fs_parse_param() > > so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can > > be done in additional patches later. > > > > Link: https://github.com/util-linux/util-linux/issues/2287 [1] > > Link: https://github.com/util-linux/util-linux/issues/1992 [2] > > Link: https://bugs.archlinux.org/task/78702 [3] > > Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner [4] > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> > > --- > > > > --- > > > > I'm starting to get the feeling that I stared enough at this and I would > > need a fresh set of eyes to review it for any bugs. Plus, Amir seems to > > have conflicting series and I would have to rebase anyway so no point in > > delaying this any further. > > --- > > fs/overlayfs/super.c | 896 ++++++++++++++++++++++++++++++++------------------- > > 1 file changed, 568 insertions(+), 328 deletions(-) > > > > Very big patch - Not so easy to review. > It feels like a large refactoring mixed with the api change. > Can it easily be split to just the refactoring patch > and the api change patch? or any other split that will be > easier to review. I don't really think so because you can't do a piecemeal conversion unfortunately. But if you have concrete ideas I'm happy to hear them. > > All looking quite sane, I only noticed one glitch below... > > Thanks, > Amir. > > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > > index f97ad8b40dbb..db22cbde1487 100644 > > --- a/fs/overlayfs/super.c > > +++ b/fs/overlayfs/super.c > > @@ -16,6 +16,8 @@ > > #include <linux/posix_acl_xattr.h> > > #include <linux/exportfs.h> > > #include <linux/file.h> > > +#include <linux/fs_context.h> > > +#include <linux/fs_parser.h> > > #include "overlayfs.h" > > > > MODULE_AUTHOR("Miklos Szeredi <miklos@xxxxxxxxxx>"); > > @@ -67,6 +69,76 @@ module_param_named(metacopy, ovl_metacopy_def, bool, 0644); > > MODULE_PARM_DESC(metacopy, > > "Default to on or off for the metadata only copy up feature"); > > > > +enum { > > + Opt_lowerdir, > > + Opt_upperdir, > > + Opt_workdir, > > + Opt_default_permissions, > > + Opt_redirect_dir, > > + Opt_index, > > + Opt_uuid, > > + Opt_nfs_export, > > + Opt_userxattr, > > + Opt_xino, > > + Opt_metacopy, > > + Opt_volatile, > > +}; > > + > > +static const struct constant_table ovl_parameter_bool[] = { > > + { "on", true }, > > + { "off", false }, > > + {} > > +}; > > + > > +static const struct constant_table ovl_parameter_xino[] = { > > + { "on", OVL_XINO_ON }, > > + { "off", OVL_XINO_OFF }, > > + { "auto", OVL_XINO_AUTO }, > > + {} > > +}; > > + > > +#define fsparam_string_empty(NAME, OPT) \ > > + __fsparam(fs_param_is_string, NAME, OPT, fs_param_can_be_empty, NULL) > > + > > +static const struct fs_parameter_spec ovl_parameter_spec[] = { > > + fsparam_string_empty("lowerdir", Opt_lowerdir), > > + fsparam_string_empty("upperdir", Opt_upperdir), > > + fsparam_string_empty("workdir", Opt_workdir), > > + fsparam_flag("default_permissions", Opt_default_permissions), > > + fsparam_enum("redirect_dir", Opt_redirect_dir, ovl_parameter_xino), > > ovl_parameter_xino out of place. That's rather odd. Thanks for spotting this...