On Thu, Aug 24, 2023 at 1:43 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > On Thu, Aug 24, 2023 at 12:56 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > On Wed, Aug 23, 2023 at 5:50 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > On Wed, Aug 23, 2023 at 5:52 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > > > On Wed, 23 Aug 2023 at 16:43, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > > > If we do this, then both overlay.whiteout and overlay.xattr_whiteouts > > > > > xattrs will be xattrs that the overlayfs driver never sets. > > > > > It's a precedent, but as long as it is properly documented and encoded > > > > > in fstests, I will be fine with it. Not sure about Miklos. > > > > > > > > Firstly I need to properly understand the proposal. At this point I'm > > > > not sure what overlay.whiteout is supposed to mean. Does it mean the > > > > same as a whiteout (chrdev(0,0))? Or does it mean that overlayfs > > > > should not treat it as a whiteout, but instead transform that into a > > > > chrdev(0,0) for the top overlay to interpret as a whiteout? Or > > > > something else? > > > > > > > > > > My proposal does not involve any transformation. > > > It is "just" to support another format for a whiteout. > > > Transforming a REG or FIFO real object to CHR ovl object > > > will be a pain IMO and I don't see why it is needed. > > > > > > Let me try again from the top: > > > 1. ovl_path_is_whiteout() checks if either ovl_is_whiteout() (chardev(0,0)) > > > or regular file with "overlay.whiteout" xattr, so ovl_lookup() > > > will result in > > > a negative dentry if either whiteout format is found at topmost layer > > > 2. To optimize unneeded getxattr, "overlay.whiteout" xattr could be checked > > > only in case the parent dir has xattr "overlay.xattr_whiteouts" > > > 3. mkfs.composefs is responsible of creating the non-chardev whiteouts > > > as well as the marking the dirs that contains them with > > > "overlay.xattr_whiteouts" - overlayfs itself never does that > > > 4. ovl_calc_d_ino() (from readdir on a merge dir) returns true if the > > > the iterated dir has "overlay.xattr_whiteouts" xattr > > > 5. That will cause ovl_cache_update_ino() to lookup the > > > *overlay dentry* that will be negative (as per rule 1 above) > > > if either whiteout format is found at topmost layer and that > > > will cause the readdir dirent to be marked is_whiteout and > > > filtered out of readdir results > > > > > > * The trick for readdir is that the the per dirent DT_CHR optimization > > > is traded off for a per parent dir optimization, but even the worst case > > > where all directories have xattr_whiteouts, readdir is not more > > > expensive than readdir with xino enabled, which is the default for > > > several Linux distros > > > > > > Hope this is more clear? > > > > Ok, so I implemented this, both using the transforming-to-whiteout and > > the alternative-whiteout approach. > > > > Here is the transform-to-whiteout approach: > > https://github.com/alexlarsson/linux/tree/ovl-nesting-transform > > > > In it, if you have a lower dir with these files/xattrs: > > * lowerdir/foo - directory > > trusted.overlay.whiteouts > > * lowerdir/foo/hide_file - regular file > > trusted.overlay.whiteout > > > > Then you will get an overlay no-userxattr mount like this: > > * lowerdir/foo - directory > > * lowerdir/foo/hide_file - chardev(0,0) file > > > > This can be used as a lower in any overlayfs mount you want, userxattr or no. > > > > Here is the alternative-whiteout approach: > > https://github.com/alexlarsson/linux/tree/ovl-nesting-alternative > > > > In it, if you have a lower dir with these files/xattrs: > > * lowerdir/foo - directory > > trusted.overlay.overlay.whiteouts > > user.overlay.whiteouts > > * lowerdir/foo/hide_file - regular file > > trusted.overlay.overlay.whiteout > > user.overlay.whiteout > > > > Then you will get an overlay no-userxattr mount like this: > > * lowerdir/foo - directory > > trusted.overlay.whiteouts > > user.overlay.whiteouts > > * lowerdir/foo/hide_file - regular file > > trusted.overlay.whiteout > > user.overlay.whiteout > > > > This can be used as a lower in any overlayfs mount you want, userxattr or no. > > > > I prefer the transform-to-whiteout approach for a two reasons: > > > > Given an existing image (say an OCI image) we can construct an > > overlayfs mount that is not just functionally identical, but also > > indistinguible from the expected one. I.e. if the original OCI image > > had a chardev(0,0) we will still have one in the mount. > > > > I thought that OCI image format is using a different without format > which is converted to ovl whiteout format anyway: > https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts Yeah, but those whiteouts are not the target of this work. An OCI image is a set of tarballs, and one tarball can contain magically marked ".wh." prefixed files for whiteouts. These are converted to real whiteouts by docker. However, suppose the image itself contains an overlayfs lower directory, which the app in the container wants to use. This would look like a chardev(0,0) (not a .wh.*) in the tarball. If this is naively extracted to disk (without escaping) then those whiteouts will not be visible to the container app when it runs in the container, because the outer overlayfs ate them. Such whiteouts in OCI containers don't currently work, so you won't find any such OCI containers in the wild. But with escapes it seems useful. Something similar that you *will* find in the wild however, is ostree images with whiteouts in them, and we want to keep supporting these. See more below. > Also, IIUC, you want this feature for mkfs.composefs, which doesn't > have backward compat requirements and doesn't need to create > ovl images identical to existing ones. Please correct me if I am wrong. One of the goals is to use composefs as a backend for ostree (instead of hardlinked trees). And ostree images with whiteouts in them are pretty common (typically from os images with preinstalled container images). If mkfs.composefs converted these to xattr whiteouts it may work when you run the containers in the image. But its not ideal if the content of the rootfs depends on the ostree backend, and its quite possible that some existing tooling will be confused by the new whiteouts. > > When creating multiple lower dirs (e.g. from a multi-layered OCI > > image) you have to carry forward xattrs on directories from the lower > > layers to the upper, otherwise a merged directory from a higher layer > > will overwrite the "overlay.whiteouts" xattr. This makes an otherwise > > local operation (just escape the files in this layer) to a global one > > that depends on all layers. > > I don't understand this claim. In you implementation: > rdd->in_xwhiteouts_dir = > ovl_path_check_xwhiteouts_xattr(OVL_FS(rdd->dentry->d_sb), realpath); > checks the "overlay.whiteouts" xattr on every layer. > > Checking if an entry is a whiteout only matters in the uppermost layer > that this named entry is found. Yeah, that is true for the whiteouts, but not the escaped whiteouts. It's a bit confusing, so let me give an example: Suppose I have this file structure with a 2 layer overlayfs with an xwhiteout. ├── layer1 │ └── dir │ └── file └── layer2 └── dir - overlay.whiteouts └── file - overlay.whiteout (At this point, it is true what you say. If a layer3/dir existed without "overlay.whiteouts", things would still work.) Now I want to store this structure as the first layer inside another overlayfs by escaping the xattrs, it would look like this: layerA ├── layer1 │ └── dir │ └── file └── layer2 └── dir - overlay.overlay.whiteouts └── file - overlay.overlay.whiteout layerB └── layer2 └── dir └── new-file You'd use it like this: mount -t overlay -o lowerdir=layerB:layerA overlay mntAB mount -t overlay -o lowerdir=mntAB/layer2:mntAB/layer1 overlay mnt12 However, if you try this, you'll notice that It doesn't work, due to a missing "overlay.whiteouts" xattr on mntAB/layer2/dir. This is caused by the file that got added to layer2/dir in layerB. Since we got a new merged directory (layerB/layer2/dir) it overrides the escaped xattrs from layerA/layer2/dir. This is not an absolute showstopper, but it would be a nice property if escaping a layer was an isolated operation independent of the other layers. > My understanding is that is mkfs.composefs creates a xwhiteout file, > it ONLY needs to locally set the overlay.whiteouts xattr on its parent > dir. I don't understand how multi layers matter. > > > > > In terms of implementation complexity I think they are very similar. > > > > Sorry. I disagree. The two implementations may be on par wrt > lines of code, but I perceive the transform-to-whiteout patch as > harder to maintain. > > The fact that an xattr changes the file type seems terrifying to me. > Imagine that upper layer has a non-empty regular file with > overlay.whiteout xattr - this is exposed as a regular file and then > user truncates this regular file - BOOM (turn to whiteout?). Nod. > I need to understand your concerns about my proposal because > I did not get them. Hopefully I explained it above, so we can come up with a solution. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Alexander Larsson Red Hat, Inc alexl@xxxxxxxxxx alexander.larsson@xxxxxxxxx