Re: [PATCH v2 4/5] ovl: Support creation of whiteout files on overlayfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 25, 2023 at 2:30 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
>
> On Fri, Aug 25, 2023 at 10:41 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > On Thu, Aug 24, 2023 at 5:23 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Aug 24, 2023 at 1:43 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Aug 24, 2023 at 12:56 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Aug 23, 2023 at 5:50 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Aug 23, 2023 at 5:52 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, 23 Aug 2023 at 16:43, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > If we do this, then both overlay.whiteout and overlay.xattr_whiteouts
> > > > > > > > xattrs will be xattrs that the overlayfs driver never sets.
> > > > > > > > It's a precedent, but as long as it is properly documented and encoded
> > > > > > > > in fstests, I will be fine with it. Not sure about Miklos.
> > > > > > >
> > > > > > > Firstly I need to properly understand the proposal.  At this point I'm
> > > > > > > not sure what overlay.whiteout is supposed to mean.   Does it mean the
> > > > > > > same as a whiteout (chrdev(0,0))?  Or does it mean that overlayfs
> > > > > > > should not treat it as a whiteout, but instead transform that into a
> > > > > > > chrdev(0,0) for the top overlay to interpret as a whiteout?  Or
> > > > > > > something else?
> > > > > > >
> > > > > >
> > > > > > My proposal does not involve any transformation.
> > > > > > It is "just" to support another format for a whiteout.
> > > > > > Transforming a REG or FIFO real object to CHR ovl object
> > > > > > will be a pain IMO and I don't see why it is needed.
> > > > > >
> > > > > > Let me try again from the top:
> > > > > > 1. ovl_path_is_whiteout() checks if either ovl_is_whiteout() (chardev(0,0))
> > > > > >     or regular file with "overlay.whiteout" xattr, so ovl_lookup()
> > > > > > will result in
> > > > > >     a negative dentry if either whiteout format is found at topmost layer
> > > > > > 2. To optimize unneeded getxattr, "overlay.whiteout" xattr could be checked
> > > > > >     only in case the parent dir has xattr "overlay.xattr_whiteouts"
> > > > > > 3. mkfs.composefs is responsible of creating the non-chardev whiteouts
> > > > > >     as well as the marking the dirs that contains them with
> > > > > >     "overlay.xattr_whiteouts" - overlayfs itself never does that
> > > > > > 4. ovl_calc_d_ino() (from readdir on a merge dir) returns true if the
> > > > > >     the iterated dir has "overlay.xattr_whiteouts" xattr
> > > > > > 5. That will cause ovl_cache_update_ino() to lookup the
> > > > > >     *overlay dentry* that will be negative (as per rule 1 above)
> > > > > >     if either whiteout format is found at topmost layer and that
> > > > > >     will cause the readdir dirent to be marked is_whiteout and
> > > > > >     filtered out of readdir results
> > > > > >
> > > > > > * The trick for readdir is that the the per dirent DT_CHR optimization
> > > > > >   is traded off for a per parent dir optimization, but even the worst case
> > > > > >   where all directories have xattr_whiteouts, readdir is not more
> > > > > >   expensive than readdir with xino enabled, which is the default for
> > > > > >   several Linux distros
> > > > > >
> > > > > > Hope this is more clear?
> > > > >
> > > > > Ok, so I implemented this, both using the transforming-to-whiteout and
> > > > > the alternative-whiteout approach.
> > > > >
> > > > > Here is the transform-to-whiteout approach:
> > > > >   https://github.com/alexlarsson/linux/tree/ovl-nesting-transform
> > > > >
> > > > > In it, if you have a lower dir with these files/xattrs:
> > > > >  * lowerdir/foo - directory
> > > > >      trusted.overlay.whiteouts
> > > > >  * lowerdir/foo/hide_file - regular file
> > > > >      trusted.overlay.whiteout
> > > > >
> > > > > Then you will get an overlay no-userxattr mount like this:
> > > > >  * lowerdir/foo - directory
> > > > >  * lowerdir/foo/hide_file - chardev(0,0) file
> > > > >
> > > > > This can be used as a lower in any overlayfs mount you want, userxattr or no.
> > > > >
> > > > > Here is the alternative-whiteout approach:
> > > > >  https://github.com/alexlarsson/linux/tree/ovl-nesting-alternative
> > > > >
> > > > > In it, if you have a lower dir with these files/xattrs:
> > > > >  * lowerdir/foo - directory
> > > > >      trusted.overlay.overlay.whiteouts
> > > > >      user.overlay.whiteouts
> > > > >   * lowerdir/foo/hide_file - regular file
> > > > >      trusted.overlay.overlay.whiteout
> > > > >      user.overlay.whiteout
> > > > >
> > > > > Then you will get an overlay no-userxattr mount like this:
> > > > >  * lowerdir/foo - directory
> > > > >      trusted.overlay.whiteouts
> > > > >      user.overlay.whiteouts
> > > > >   * lowerdir/foo/hide_file - regular file
> > > > >      trusted.overlay.whiteout
> > > > >      user.overlay.whiteout
> > > > >
> > > > > This can be used as a lower in any overlayfs mount you want, userxattr or no.
> > > > >
> > > > > I prefer the transform-to-whiteout approach for a two reasons:
> > > > >
> > > > > Given an existing image (say an OCI image) we can construct an
> > > > > overlayfs mount that is not just functionally identical, but also
> > > > > indistinguible from the expected one. I.e. if the original OCI image
> > > > > had a chardev(0,0) we will still have one in the mount.
> > > > >
> > > >
> > > > I thought that OCI image format is using a different without format
> > > > which is converted to ovl whiteout format anyway:
> > > > https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts
> > >
> > > Yeah, but those whiteouts are not the target of this work. An OCI
> > > image is a set of tarballs, and one tarball can contain magically
> > > marked ".wh." prefixed files for whiteouts. These are converted to
> > > real whiteouts by docker. However, suppose the image itself contains
> > > an overlayfs lower directory, which the app in the container wants to
> > > use. This would look like a chardev(0,0) (not a .wh.*) in the tarball.
> > > If this is naively extracted to disk (without escaping) then those
> > > whiteouts will not be visible to the container app when it runs in the
> > > container, because the outer overlayfs ate them.
> > >
> > > Such whiteouts in OCI containers don't currently work, so you won't
> > > find any such OCI containers in the wild. But with escapes it seems
> > > useful. Something similar that you *will* find in the wild however, is
> > > ostree images with whiteouts in them, and we want to keep supporting
> > > these. See more below.
> > >
> > > > Also, IIUC, you want this feature for mkfs.composefs, which doesn't
> > > > have backward compat requirements and doesn't need to create
> > > > ovl images identical to existing ones. Please correct me if I am wrong.
> > >
> > > One of the goals is to use composefs as a backend for ostree (instead
> > > of hardlinked trees). And ostree images with whiteouts in them are
> > > pretty common (typically from os images with preinstalled container
> > > images).
> > >
> > > If mkfs.composefs converted these to xattr whiteouts it may work when
> > > you run the containers in the image. But its not ideal if the content
> > > of the rootfs depends on the ostree backend, and its quite possible
> > > that some existing tooling will be confused by the new whiteouts.
> > >
> >
> > I think I understand. I will try to remember how all those pieces
> > work together...
> >
> > > > > When creating multiple lower dirs (e.g. from a multi-layered OCI
> > > > > image) you have to carry forward xattrs on directories from the lower
> > > > > layers to the upper, otherwise a merged directory from a higher layer
> > > > > will overwrite the "overlay.whiteouts" xattr. This makes an otherwise
> > > > > local operation (just escape the files in this layer) to a global one
> > > > > that depends on all layers.
> > > >
> > > > I don't understand this claim. In you implementation:
> > > > rdd->in_xwhiteouts_dir =
> > > >    ovl_path_check_xwhiteouts_xattr(OVL_FS(rdd->dentry->d_sb), realpath);
> > > > checks the "overlay.whiteouts" xattr on every layer.
> > > >
> > > > Checking if an entry is a whiteout only matters in the uppermost layer
> > > > that this named entry is found.
> > >
> > > Yeah, that is true for the whiteouts, but not the escaped whiteouts.
> > > It's a bit confusing, so let me give an example:
> > >
> > > Suppose I have this file structure with a 2 layer overlayfs with an xwhiteout.
> > >
> > > ├── layer1
> > > │   └── dir
> > > │       └── file
> > > └── layer2
> > >     └── dir - overlay.whiteouts
> > >         └── file - overlay.whiteout
> > >
> > > (At this point, it is true what you say. If a layer3/dir existed
> > > without "overlay.whiteouts", things would still work.)
> > >
> > > Now I want to store this structure as the first layer inside another
> > > overlayfs by escaping the xattrs, it would look like this:
> > >
> > > layerA
> > > ├── layer1
> > > │   └── dir
> > > │       └── file
> > > └── layer2
> > >     └── dir - overlay.overlay.whiteouts
> > >         └── file - overlay.overlay.whiteout
> > > layerB
> > > └── layer2
> > >     └── dir
> > >         └── new-file
> > >
> > > You'd use it like this:
> > >  mount -t overlay -o lowerdir=layerB:layerA overlay mntAB
> > >  mount -t overlay -o lowerdir=mntAB/layer2:mntAB/layer1 overlay mnt12
> >
> > Painful example.
> > Next time please draw the top most layers on top and
> > not on bottom. I think it will be clearer this way.
> >
> > >
> > > However, if you try this, you'll notice that It doesn't work, due to a
> > > missing "overlay.whiteouts" xattr on mntAB/layer2/dir.
> > > This is caused by the file that got added to layer2/dir in layerB.
> > > Since we got a new merged directory (layerB/layer2/dir) it overrides
> > > the escaped xattrs from layerA/layer2/dir.
> > >
> > > This is not an absolute showstopper, but it would be a nice property
> > > if escaping a layer was an isolated operation independent of the other
> > > layers.
> > >
> >
> > I see your point.
>
> I updated the ovl-nesting-alternative branch with an addition that
> makes reading escaped whiteout dir markers take into account all
> layers. With that I got the example above to work.
>
> Those that look sane to you?
>

I can live with that slightly odd code as it is harmless for
anything other than this corner use case, just please add it
in a separate commit with some drawing like above to
explain the problem that it is solving.

Let's see what Miklos has to say.

Thanks,
Amir.




[Index of Archives]     [Linux Filesystems Devel]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux