On Thu, Sep 24, 2020 at 11:40 AM Pavel Tikhomirov <ptikhomirov@xxxxxxxxxxxxx> wrote: > > > > On 9/23/20 7:09 PM, Amir Goldstein wrote: > > On Wed, Sep 23, 2020 at 6:23 PM Pavel Tikhomirov > > <ptikhomirov@xxxxxxxxxxxxx> wrote: > >> > >> This relaxes uuid checks for overlay index feature. It is only possible > >> in case there is only one filesystem for all the work/upper/lower > >> directories and bare file handles from this backing filesystem are uniq. > >> In case we have multiple filesystems here just fall back to normal > >> "index=on". > >> > >> This is needed when overlayfs is/was mounted in a container with > >> index enabled (e.g.: to be able to resolve inotify watch file handles on > >> it to paths in CRIU), and this container is copied and started alongside > >> with the original one. This way the "copy" container can't have the same > >> uuid on the superblock and mounting the overlayfs from it later would > >> fail. > >> > >> That is an example of the problem on top of loop+ext4: > >> > >> dd if=/dev/zero of=loopbackfile.img bs=100M count=10 > >> losetup -fP loopbackfile.img > >> losetup -a > >> #/dev/loop0: [64768]:35 (/loop-test/loopbackfile.img) > >> mkfs.ext4 loopbackfile.img > >> mkdir loop-mp > >> mount -o loop /dev/loop0 loop-mp > >> mkdir loop-mp/{lower,upper,work,merged} > >> mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\ > >> upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged > >> umount loop-mp/merged > >> umount loop-mp > >> e2fsck -f /dev/loop0 > >> tune2fs -U random /dev/loop0 > >> > >> mount -o loop /dev/loop0 loop-mp > >> mount -t overlay overlay -oindex=on,lowerdir=loop-mp/lower,\ > >> upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged > >> #mount: /loop-test/loop-mp/merged: > >> #mount(2) system call failed: Stale file handle. > >> > >> mount -t overlay overlay -oindex=nouuid,lowerdir=loop-mp/lower,\ > >> upperdir=loop-mp/upper,workdir=loop-mp/work loop-mp/merged > >> > >> If you just change the uuid of the backing filesystem, overlay is not > >> mounting any more. In Virtuozzo we copy container disks (ploops) when > >> crate the copy of container and we require fs uuid to be uniq for a new > >> container. > >> > >> v2: in v1 I missed actual uuid check skip - add it > >> > >> CC: Amir Goldstein <amir73il@xxxxxxxxx> > >> CC: Vivek Goyal <vgoyal@xxxxxxxxxx> > >> CC: Miklos Szeredi <miklos@xxxxxxxxxx> > >> CC: linux-unionfs@xxxxxxxxxxxxxxx > >> CC: linux-kernel@xxxxxxxxxxxxxxx > >> Signed-off-by: Pavel Tikhomirov <ptikhomirov@xxxxxxxxxxxxx> > >> --- > > > > Look reasonable, but you need to rebase over > > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next > > > > Which makes a lot of your work unneeded. > > ofs is propagated to most of the relevant helpers > > and you should propagate it down to ovl_decode_real_fh(). > > Thanks! Will do. > > > > > Some minor comments below... > > > >> fs/overlayfs/Kconfig | 16 +++++++++++ > >> fs/overlayfs/export.c | 6 ++-- > >> fs/overlayfs/namei.c | 35 +++++++++++++++-------- > >> fs/overlayfs/overlayfs.h | 23 +++++++++++---- > >> fs/overlayfs/ovl_entry.h | 2 +- > >> fs/overlayfs/super.c | 61 +++++++++++++++++++++++++++++----------- > >> 6 files changed, 106 insertions(+), 37 deletions(-) > >> > >> diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig > >> index dd188c7996b3..b00fd44006f9 100644 > >> --- a/fs/overlayfs/Kconfig > >> +++ b/fs/overlayfs/Kconfig > >> @@ -61,6 +61,22 @@ config OVERLAY_FS_INDEX > >> > >> If unsure, say N. > >> > >> +config OVERLAY_FS_INDEX_NOUUID > >> + bool "Overlayfs: relax uuid checks of inodes index feature" > >> + depends on OVERLAY_FS > >> + depends on OVERLAY_FS_INDEX > >> + help > >> + If this config option is enabled then overlay will skip uuid checks > >> + for index lower to upper inode map, this only can be done if all > >> + upper and lower directories are on the same filesystem where basic > >> + fhandles are uniq. > >> + > >> + It is needed to overcome possible change of uuid on superblock of the > >> + backing filesystem, e.g. when you copied the virtual disk and mount > >> + both the copy of the disk and the original one at the same time. > >> + > >> + If unsure, say N. > >> + > > > > Please do not add a config option for this. > > Isn't a mount option sufficient for your needs? > > Users inside Virtuozzo container can mount overlayfs inside the CT (we > assume that they do "regular" mounts without any "index=" option as > docker does) so we wan't to setup the default in kernel config, so that > all "regular" mounts of the user become "index=nouuid" automatically, > and thus we would be able to both migrate (CRIU inotify resolution by > fhandle on dump) and copy (copy disk with uuid change) the container > without problem. > That is a problem. So far, mount options always override module and kernel config options. So if mount option says inodex=on explicitly, it should be index=on. The only way I see for you to tackle this is if nouuid option is independent of index option, e.g. index=on,anon_uuid (or uuid=off). But if kernel config has a way to turn this on, then module param and mount option should have a way to turn it off. [...] > >> @@ -1889,9 +1911,14 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) > >> if (err) > >> goto out_free_oe; > >> > >> + if (ofs->config.index == OVL_INDEX_NOUUID && ofs->numfs > 1) { > >> + pr_warn("The index=nouuid requires a single fs for lower and upper, falling back to index=on.\n"); > >> + ofs->config.index = OVL_INDEX_ON; > >> + } > >> + > > > > It's too late for this fallback now, you already did relaxed ovl_verify_origin() > > and now we will continue as if all is ok. > > Please fail the mount in this case. > > > > I don't think that users that specifically requested index=nouuid would care to > > fallback to index=on. > > No, it's we who will force users to switch to index=nouuid in our > Virtuozzo case through kernel config default, so probably having a > fallback is a good thing, as users will be able to use their overlay at > least until we "copy" their container. > > Maybe I can just move my check before ovl_get_indexdir (as far as I can > see it is the only place from ovl_fill_super where we reach > ovl_verify_fh or ovl_decode_real_fh) and after ovl_get_lowerstack where > numfs is calculated. - Will do. Yes, that would be fine techicaly. I am not sure if the fallback is helpful though. Anyway, you would need to get approval from Miklos on the option itself and how it behaves. Thanks, Amir.