On Fri, Aug 25, 2023 at 2:30 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > On Fri, Aug 25, 2023 at 10:41 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > On Thu, Aug 24, 2023 at 5:23 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > On Thu, Aug 24, 2023 at 1:43 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > > On Thu, Aug 24, 2023 at 12:56 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > > > > > On Wed, Aug 23, 2023 at 5:50 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > > > > > > On Wed, Aug 23, 2023 at 5:52 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > On Wed, 23 Aug 2023 at 16:43, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > > > > > > > > > If we do this, then both overlay.whiteout and overlay.xattr_whiteouts > > > > > > > > xattrs will be xattrs that the overlayfs driver never sets. > > > > > > > > It's a precedent, but as long as it is properly documented and encoded > > > > > > > > in fstests, I will be fine with it. Not sure about Miklos. > > > > > > > > > > > > > > Firstly I need to properly understand the proposal. At this point I'm > > > > > > > not sure what overlay.whiteout is supposed to mean. Does it mean the > > > > > > > same as a whiteout (chrdev(0,0))? Or does it mean that overlayfs > > > > > > > should not treat it as a whiteout, but instead transform that into a > > > > > > > chrdev(0,0) for the top overlay to interpret as a whiteout? Or > > > > > > > something else? > > > > > > > > > > > > > > > > > > > My proposal does not involve any transformation. > > > > > > It is "just" to support another format for a whiteout. > > > > > > Transforming a REG or FIFO real object to CHR ovl object > > > > > > will be a pain IMO and I don't see why it is needed. > > > > > > > > > > > > Let me try again from the top: > > > > > > 1. ovl_path_is_whiteout() checks if either ovl_is_whiteout() (chardev(0,0)) > > > > > > or regular file with "overlay.whiteout" xattr, so ovl_lookup() > > > > > > will result in > > > > > > a negative dentry if either whiteout format is found at topmost layer > > > > > > 2. To optimize unneeded getxattr, "overlay.whiteout" xattr could be checked > > > > > > only in case the parent dir has xattr "overlay.xattr_whiteouts" > > > > > > 3. mkfs.composefs is responsible of creating the non-chardev whiteouts > > > > > > as well as the marking the dirs that contains them with > > > > > > "overlay.xattr_whiteouts" - overlayfs itself never does that > > > > > > 4. ovl_calc_d_ino() (from readdir on a merge dir) returns true if the > > > > > > the iterated dir has "overlay.xattr_whiteouts" xattr > > > > > > 5. That will cause ovl_cache_update_ino() to lookup the > > > > > > *overlay dentry* that will be negative (as per rule 1 above) > > > > > > if either whiteout format is found at topmost layer and that > > > > > > will cause the readdir dirent to be marked is_whiteout and > > > > > > filtered out of readdir results > > > > > > > > > > > > * The trick for readdir is that the the per dirent DT_CHR optimization > > > > > > is traded off for a per parent dir optimization, but even the worst case > > > > > > where all directories have xattr_whiteouts, readdir is not more > > > > > > expensive than readdir with xino enabled, which is the default for > > > > > > several Linux distros > > > > > > > > > > > > Hope this is more clear? > > > > > > > > > > Ok, so I implemented this, both using the transforming-to-whiteout and > > > > > the alternative-whiteout approach. > > > > > > > > > > Here is the transform-to-whiteout approach: > > > > > https://github.com/alexlarsson/linux/tree/ovl-nesting-transform > > > > > > > > > > In it, if you have a lower dir with these files/xattrs: > > > > > * lowerdir/foo - directory > > > > > trusted.overlay.whiteouts > > > > > * lowerdir/foo/hide_file - regular file > > > > > trusted.overlay.whiteout > > > > > > > > > > Then you will get an overlay no-userxattr mount like this: > > > > > * lowerdir/foo - directory > > > > > * lowerdir/foo/hide_file - chardev(0,0) file > > > > > > > > > > This can be used as a lower in any overlayfs mount you want, userxattr or no. > > > > > > > > > > Here is the alternative-whiteout approach: > > > > > https://github.com/alexlarsson/linux/tree/ovl-nesting-alternative > > > > > > > > > > In it, if you have a lower dir with these files/xattrs: > > > > > * lowerdir/foo - directory > > > > > trusted.overlay.overlay.whiteouts > > > > > user.overlay.whiteouts > > > > > * lowerdir/foo/hide_file - regular file > > > > > trusted.overlay.overlay.whiteout > > > > > user.overlay.whiteout > > > > > > > > > > Then you will get an overlay no-userxattr mount like this: > > > > > * lowerdir/foo - directory > > > > > trusted.overlay.whiteouts > > > > > user.overlay.whiteouts > > > > > * lowerdir/foo/hide_file - regular file > > > > > trusted.overlay.whiteout > > > > > user.overlay.whiteout > > > > > > > > > > This can be used as a lower in any overlayfs mount you want, userxattr or no. > > > > > > > > > > I prefer the transform-to-whiteout approach for a two reasons: > > > > > > > > > > Given an existing image (say an OCI image) we can construct an > > > > > overlayfs mount that is not just functionally identical, but also > > > > > indistinguible from the expected one. I.e. if the original OCI image > > > > > had a chardev(0,0) we will still have one in the mount. > > > > > > > > > > > > > I thought that OCI image format is using a different without format > > > > which is converted to ovl whiteout format anyway: > > > > https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts > > > > > > Yeah, but those whiteouts are not the target of this work. An OCI > > > image is a set of tarballs, and one tarball can contain magically > > > marked ".wh." prefixed files for whiteouts. These are converted to > > > real whiteouts by docker. However, suppose the image itself contains > > > an overlayfs lower directory, which the app in the container wants to > > > use. This would look like a chardev(0,0) (not a .wh.*) in the tarball. > > > If this is naively extracted to disk (without escaping) then those > > > whiteouts will not be visible to the container app when it runs in the > > > container, because the outer overlayfs ate them. > > > > > > Such whiteouts in OCI containers don't currently work, so you won't > > > find any such OCI containers in the wild. But with escapes it seems > > > useful. Something similar that you *will* find in the wild however, is > > > ostree images with whiteouts in them, and we want to keep supporting > > > these. See more below. > > > > > > > Also, IIUC, you want this feature for mkfs.composefs, which doesn't > > > > have backward compat requirements and doesn't need to create > > > > ovl images identical to existing ones. Please correct me if I am wrong. > > > > > > One of the goals is to use composefs as a backend for ostree (instead > > > of hardlinked trees). And ostree images with whiteouts in them are > > > pretty common (typically from os images with preinstalled container > > > images). > > > > > > If mkfs.composefs converted these to xattr whiteouts it may work when > > > you run the containers in the image. But its not ideal if the content > > > of the rootfs depends on the ostree backend, and its quite possible > > > that some existing tooling will be confused by the new whiteouts. > > > > > > > I think I understand. I will try to remember how all those pieces > > work together... > > > > > > > When creating multiple lower dirs (e.g. from a multi-layered OCI > > > > > image) you have to carry forward xattrs on directories from the lower > > > > > layers to the upper, otherwise a merged directory from a higher layer > > > > > will overwrite the "overlay.whiteouts" xattr. This makes an otherwise > > > > > local operation (just escape the files in this layer) to a global one > > > > > that depends on all layers. > > > > > > > > I don't understand this claim. In you implementation: > > > > rdd->in_xwhiteouts_dir = > > > > ovl_path_check_xwhiteouts_xattr(OVL_FS(rdd->dentry->d_sb), realpath); > > > > checks the "overlay.whiteouts" xattr on every layer. > > > > > > > > Checking if an entry is a whiteout only matters in the uppermost layer > > > > that this named entry is found. > > > > > > Yeah, that is true for the whiteouts, but not the escaped whiteouts. > > > It's a bit confusing, so let me give an example: > > > > > > Suppose I have this file structure with a 2 layer overlayfs with an xwhiteout. > > > > > > ├── layer1 > > > │ └── dir > > > │ └── file > > > └── layer2 > > > └── dir - overlay.whiteouts > > > └── file - overlay.whiteout > > > > > > (At this point, it is true what you say. If a layer3/dir existed > > > without "overlay.whiteouts", things would still work.) > > > > > > Now I want to store this structure as the first layer inside another > > > overlayfs by escaping the xattrs, it would look like this: > > > > > > layerA > > > ├── layer1 > > > │ └── dir > > > │ └── file > > > └── layer2 > > > └── dir - overlay.overlay.whiteouts > > > └── file - overlay.overlay.whiteout > > > layerB > > > └── layer2 > > > └── dir > > > └── new-file > > > > > > You'd use it like this: > > > mount -t overlay -o lowerdir=layerB:layerA overlay mntAB > > > mount -t overlay -o lowerdir=mntAB/layer2:mntAB/layer1 overlay mnt12 > > > > Painful example. > > Next time please draw the top most layers on top and > > not on bottom. I think it will be clearer this way. > > > > > > > > However, if you try this, you'll notice that It doesn't work, due to a > > > missing "overlay.whiteouts" xattr on mntAB/layer2/dir. > > > This is caused by the file that got added to layer2/dir in layerB. > > > Since we got a new merged directory (layerB/layer2/dir) it overrides > > > the escaped xattrs from layerA/layer2/dir. > > > > > > This is not an absolute showstopper, but it would be a nice property > > > if escaping a layer was an isolated operation independent of the other > > > layers. > > > > > > > I see your point. > > I updated the ovl-nesting-alternative branch with an addition that > makes reading escaped whiteout dir markers take into account all > layers. With that I got the example above to work. > > Those that look sane to you? > I can live with that slightly odd code as it is harmless for anything other than this corner use case, just please add it in a separate commit with some drawing like above to explain the problem that it is solving. Let's see what Miklos has to say. Thanks, Amir.