Re: [PATCH v2 4/5] ovl: Support creation of whiteout files on overlayfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 24, 2023 at 5:23 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
>
> On Thu, Aug 24, 2023 at 1:43 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > On Thu, Aug 24, 2023 at 12:56 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Aug 23, 2023 at 5:50 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, Aug 23, 2023 at 5:52 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, 23 Aug 2023 at 16:43, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > > >
> > > > > > If we do this, then both overlay.whiteout and overlay.xattr_whiteouts
> > > > > > xattrs will be xattrs that the overlayfs driver never sets.
> > > > > > It's a precedent, but as long as it is properly documented and encoded
> > > > > > in fstests, I will be fine with it. Not sure about Miklos.
> > > > >
> > > > > Firstly I need to properly understand the proposal.  At this point I'm
> > > > > not sure what overlay.whiteout is supposed to mean.   Does it mean the
> > > > > same as a whiteout (chrdev(0,0))?  Or does it mean that overlayfs
> > > > > should not treat it as a whiteout, but instead transform that into a
> > > > > chrdev(0,0) for the top overlay to interpret as a whiteout?  Or
> > > > > something else?
> > > > >
> > > >
> > > > My proposal does not involve any transformation.
> > > > It is "just" to support another format for a whiteout.
> > > > Transforming a REG or FIFO real object to CHR ovl object
> > > > will be a pain IMO and I don't see why it is needed.
> > > >
> > > > Let me try again from the top:
> > > > 1. ovl_path_is_whiteout() checks if either ovl_is_whiteout() (chardev(0,0))
> > > >     or regular file with "overlay.whiteout" xattr, so ovl_lookup()
> > > > will result in
> > > >     a negative dentry if either whiteout format is found at topmost layer
> > > > 2. To optimize unneeded getxattr, "overlay.whiteout" xattr could be checked
> > > >     only in case the parent dir has xattr "overlay.xattr_whiteouts"
> > > > 3. mkfs.composefs is responsible of creating the non-chardev whiteouts
> > > >     as well as the marking the dirs that contains them with
> > > >     "overlay.xattr_whiteouts" - overlayfs itself never does that
> > > > 4. ovl_calc_d_ino() (from readdir on a merge dir) returns true if the
> > > >     the iterated dir has "overlay.xattr_whiteouts" xattr
> > > > 5. That will cause ovl_cache_update_ino() to lookup the
> > > >     *overlay dentry* that will be negative (as per rule 1 above)
> > > >     if either whiteout format is found at topmost layer and that
> > > >     will cause the readdir dirent to be marked is_whiteout and
> > > >     filtered out of readdir results
> > > >
> > > > * The trick for readdir is that the the per dirent DT_CHR optimization
> > > >   is traded off for a per parent dir optimization, but even the worst case
> > > >   where all directories have xattr_whiteouts, readdir is not more
> > > >   expensive than readdir with xino enabled, which is the default for
> > > >   several Linux distros
> > > >
> > > > Hope this is more clear?
> > >
> > > Ok, so I implemented this, both using the transforming-to-whiteout and
> > > the alternative-whiteout approach.
> > >
> > > Here is the transform-to-whiteout approach:
> > >   https://github.com/alexlarsson/linux/tree/ovl-nesting-transform
> > >
> > > In it, if you have a lower dir with these files/xattrs:
> > >  * lowerdir/foo - directory
> > >      trusted.overlay.whiteouts
> > >  * lowerdir/foo/hide_file - regular file
> > >      trusted.overlay.whiteout
> > >
> > > Then you will get an overlay no-userxattr mount like this:
> > >  * lowerdir/foo - directory
> > >  * lowerdir/foo/hide_file - chardev(0,0) file
> > >
> > > This can be used as a lower in any overlayfs mount you want, userxattr or no.
> > >
> > > Here is the alternative-whiteout approach:
> > >  https://github.com/alexlarsson/linux/tree/ovl-nesting-alternative
> > >
> > > In it, if you have a lower dir with these files/xattrs:
> > >  * lowerdir/foo - directory
> > >      trusted.overlay.overlay.whiteouts
> > >      user.overlay.whiteouts
> > >   * lowerdir/foo/hide_file - regular file
> > >      trusted.overlay.overlay.whiteout
> > >      user.overlay.whiteout
> > >
> > > Then you will get an overlay no-userxattr mount like this:
> > >  * lowerdir/foo - directory
> > >      trusted.overlay.whiteouts
> > >      user.overlay.whiteouts
> > >   * lowerdir/foo/hide_file - regular file
> > >      trusted.overlay.whiteout
> > >      user.overlay.whiteout
> > >
> > > This can be used as a lower in any overlayfs mount you want, userxattr or no.
> > >
> > > I prefer the transform-to-whiteout approach for a two reasons:
> > >
> > > Given an existing image (say an OCI image) we can construct an
> > > overlayfs mount that is not just functionally identical, but also
> > > indistinguible from the expected one. I.e. if the original OCI image
> > > had a chardev(0,0) we will still have one in the mount.
> > >
> >
> > I thought that OCI image format is using a different without format
> > which is converted to ovl whiteout format anyway:
> > https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts
>
> Yeah, but those whiteouts are not the target of this work. An OCI
> image is a set of tarballs, and one tarball can contain magically
> marked ".wh." prefixed files for whiteouts. These are converted to
> real whiteouts by docker. However, suppose the image itself contains
> an overlayfs lower directory, which the app in the container wants to
> use. This would look like a chardev(0,0) (not a .wh.*) in the tarball.
> If this is naively extracted to disk (without escaping) then those
> whiteouts will not be visible to the container app when it runs in the
> container, because the outer overlayfs ate them.
>
> Such whiteouts in OCI containers don't currently work, so you won't
> find any such OCI containers in the wild. But with escapes it seems
> useful. Something similar that you *will* find in the wild however, is
> ostree images with whiteouts in them, and we want to keep supporting
> these. See more below.
>
> > Also, IIUC, you want this feature for mkfs.composefs, which doesn't
> > have backward compat requirements and doesn't need to create
> > ovl images identical to existing ones. Please correct me if I am wrong.
>
> One of the goals is to use composefs as a backend for ostree (instead
> of hardlinked trees). And ostree images with whiteouts in them are
> pretty common (typically from os images with preinstalled container
> images).
>
> If mkfs.composefs converted these to xattr whiteouts it may work when
> you run the containers in the image. But its not ideal if the content
> of the rootfs depends on the ostree backend, and its quite possible
> that some existing tooling will be confused by the new whiteouts.
>

I think I understand. I will try to remember how all those pieces
work together...

> > > When creating multiple lower dirs (e.g. from a multi-layered OCI
> > > image) you have to carry forward xattrs on directories from the lower
> > > layers to the upper, otherwise a merged directory from a higher layer
> > > will overwrite the "overlay.whiteouts" xattr. This makes an otherwise
> > > local operation (just escape the files in this layer) to a global one
> > > that depends on all layers.
> >
> > I don't understand this claim. In you implementation:
> > rdd->in_xwhiteouts_dir =
> >    ovl_path_check_xwhiteouts_xattr(OVL_FS(rdd->dentry->d_sb), realpath);
> > checks the "overlay.whiteouts" xattr on every layer.
> >
> > Checking if an entry is a whiteout only matters in the uppermost layer
> > that this named entry is found.
>
> Yeah, that is true for the whiteouts, but not the escaped whiteouts.
> It's a bit confusing, so let me give an example:
>
> Suppose I have this file structure with a 2 layer overlayfs with an xwhiteout.
>
> ├── layer1
> │   └── dir
> │       └── file
> └── layer2
>     └── dir - overlay.whiteouts
>         └── file - overlay.whiteout
>
> (At this point, it is true what you say. If a layer3/dir existed
> without "overlay.whiteouts", things would still work.)
>
> Now I want to store this structure as the first layer inside another
> overlayfs by escaping the xattrs, it would look like this:
>
> layerA
> ├── layer1
> │   └── dir
> │       └── file
> └── layer2
>     └── dir - overlay.overlay.whiteouts
>         └── file - overlay.overlay.whiteout
> layerB
> └── layer2
>     └── dir
>         └── new-file
>
> You'd use it like this:
>  mount -t overlay -o lowerdir=layerB:layerA overlay mntAB
>  mount -t overlay -o lowerdir=mntAB/layer2:mntAB/layer1 overlay mnt12

Painful example.
Next time please draw the top most layers on top and
not on bottom. I think it will be clearer this way.

>
> However, if you try this, you'll notice that It doesn't work, due to a
> missing "overlay.whiteouts" xattr on mntAB/layer2/dir.
> This is caused by the file that got added to layer2/dir in layerB.
> Since we got a new merged directory (layerB/layer2/dir) it overrides
> the escaped xattrs from layerA/layer2/dir.
>
> This is not an absolute showstopper, but it would be a nice property
> if escaping a layer was an isolated operation independent of the other
> layers.
>

I see your point.

> > My understanding is that is mkfs.composefs creates a xwhiteout file,
> > it ONLY needs to locally set the overlay.whiteouts xattr on its parent
> > dir. I don't understand how multi layers matter.
> >
> > >
> > > In terms of implementation complexity I think they are very similar.
> > >
> >
> > Sorry. I disagree. The two implementations may be on par wrt
> > lines of code, but I perceive the transform-to-whiteout patch as
> > harder to maintain.
> >
> > The fact that an xattr changes the file type seems terrifying to me.
> > Imagine that upper layer has a non-empty regular file with
> > overlay.whiteout xattr - this is exposed as a regular file and then
> > user truncates this regular file - BOOM (turn to whiteout?).
>
> Nod.
>
> > I need to understand your concerns about my proposal because
> > I did not get them.
>
> Hopefully I explained it above, so we can come up with a solution.
>

I think that the first thing to do would be to minimize the requirements.
I have lost track of the use cases and possible combinations of ostree
containers and unprivileged users.

Perhaps narrowing the requirement can simplify the design?

One option that we should consider - it may simplify the design a bit -
use an opt-in mount option to support interpreting escaping xattrs.
It may be wise to opt-in to this feature anyway - less chance for
regressions we did not think about.

This means you can use xwhiteouts with no need of marking the
containing directories. ovl_calc_d_ino() would just blindly check for
ofs->config.<something>.

Would the opt-in mount option be a problem to your use cases?

Thanks,
Amir.




[Index of Archives]     [Linux Filesystems Devel]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux