On Fri, Aug 25, 2023 at 10:41 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > On Thu, Aug 24, 2023 at 5:23 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > On Thu, Aug 24, 2023 at 1:43 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > On Thu, Aug 24, 2023 at 12:56 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > > > On Wed, Aug 23, 2023 at 5:50 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > > > > On Wed, Aug 23, 2023 at 5:52 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > > > > > > > On Wed, 23 Aug 2023 at 16:43, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > > > > > > > If we do this, then both overlay.whiteout and overlay.xattr_whiteouts > > > > > > > xattrs will be xattrs that the overlayfs driver never sets. > > > > > > > It's a precedent, but as long as it is properly documented and encoded > > > > > > > in fstests, I will be fine with it. Not sure about Miklos. > > > > > > > > > > > > Firstly I need to properly understand the proposal. At this point I'm > > > > > > not sure what overlay.whiteout is supposed to mean. Does it mean the > > > > > > same as a whiteout (chrdev(0,0))? Or does it mean that overlayfs > > > > > > should not treat it as a whiteout, but instead transform that into a > > > > > > chrdev(0,0) for the top overlay to interpret as a whiteout? Or > > > > > > something else? > > > > > > > > > > > > > > > > My proposal does not involve any transformation. > > > > > It is "just" to support another format for a whiteout. > > > > > Transforming a REG or FIFO real object to CHR ovl object > > > > > will be a pain IMO and I don't see why it is needed. > > > > > > > > > > Let me try again from the top: > > > > > 1. ovl_path_is_whiteout() checks if either ovl_is_whiteout() (chardev(0,0)) > > > > > or regular file with "overlay.whiteout" xattr, so ovl_lookup() > > > > > will result in > > > > > a negative dentry if either whiteout format is found at topmost layer > > > > > 2. To optimize unneeded getxattr, "overlay.whiteout" xattr could be checked > > > > > only in case the parent dir has xattr "overlay.xattr_whiteouts" > > > > > 3. mkfs.composefs is responsible of creating the non-chardev whiteouts > > > > > as well as the marking the dirs that contains them with > > > > > "overlay.xattr_whiteouts" - overlayfs itself never does that > > > > > 4. ovl_calc_d_ino() (from readdir on a merge dir) returns true if the > > > > > the iterated dir has "overlay.xattr_whiteouts" xattr > > > > > 5. That will cause ovl_cache_update_ino() to lookup the > > > > > *overlay dentry* that will be negative (as per rule 1 above) > > > > > if either whiteout format is found at topmost layer and that > > > > > will cause the readdir dirent to be marked is_whiteout and > > > > > filtered out of readdir results > > > > > > > > > > * The trick for readdir is that the the per dirent DT_CHR optimization > > > > > is traded off for a per parent dir optimization, but even the worst case > > > > > where all directories have xattr_whiteouts, readdir is not more > > > > > expensive than readdir with xino enabled, which is the default for > > > > > several Linux distros > > > > > > > > > > Hope this is more clear? > > > > > > > > Ok, so I implemented this, both using the transforming-to-whiteout and > > > > the alternative-whiteout approach. > > > > > > > > Here is the transform-to-whiteout approach: > > > > https://github.com/alexlarsson/linux/tree/ovl-nesting-transform > > > > > > > > In it, if you have a lower dir with these files/xattrs: > > > > * lowerdir/foo - directory > > > > trusted.overlay.whiteouts > > > > * lowerdir/foo/hide_file - regular file > > > > trusted.overlay.whiteout > > > > > > > > Then you will get an overlay no-userxattr mount like this: > > > > * lowerdir/foo - directory > > > > * lowerdir/foo/hide_file - chardev(0,0) file > > > > > > > > This can be used as a lower in any overlayfs mount you want, userxattr or no. > > > > > > > > Here is the alternative-whiteout approach: > > > > https://github.com/alexlarsson/linux/tree/ovl-nesting-alternative > > > > > > > > In it, if you have a lower dir with these files/xattrs: > > > > * lowerdir/foo - directory > > > > trusted.overlay.overlay.whiteouts > > > > user.overlay.whiteouts > > > > * lowerdir/foo/hide_file - regular file > > > > trusted.overlay.overlay.whiteout > > > > user.overlay.whiteout > > > > > > > > Then you will get an overlay no-userxattr mount like this: > > > > * lowerdir/foo - directory > > > > trusted.overlay.whiteouts > > > > user.overlay.whiteouts > > > > * lowerdir/foo/hide_file - regular file > > > > trusted.overlay.whiteout > > > > user.overlay.whiteout > > > > > > > > This can be used as a lower in any overlayfs mount you want, userxattr or no. > > > > > > > > I prefer the transform-to-whiteout approach for a two reasons: > > > > > > > > Given an existing image (say an OCI image) we can construct an > > > > overlayfs mount that is not just functionally identical, but also > > > > indistinguible from the expected one. I.e. if the original OCI image > > > > had a chardev(0,0) we will still have one in the mount. > > > > > > > > > > I thought that OCI image format is using a different without format > > > which is converted to ovl whiteout format anyway: > > > https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts > > > > Yeah, but those whiteouts are not the target of this work. An OCI > > image is a set of tarballs, and one tarball can contain magically > > marked ".wh." prefixed files for whiteouts. These are converted to > > real whiteouts by docker. However, suppose the image itself contains > > an overlayfs lower directory, which the app in the container wants to > > use. This would look like a chardev(0,0) (not a .wh.*) in the tarball. > > If this is naively extracted to disk (without escaping) then those > > whiteouts will not be visible to the container app when it runs in the > > container, because the outer overlayfs ate them. > > > > Such whiteouts in OCI containers don't currently work, so you won't > > find any such OCI containers in the wild. But with escapes it seems > > useful. Something similar that you *will* find in the wild however, is > > ostree images with whiteouts in them, and we want to keep supporting > > these. See more below. > > > > > Also, IIUC, you want this feature for mkfs.composefs, which doesn't > > > have backward compat requirements and doesn't need to create > > > ovl images identical to existing ones. Please correct me if I am wrong. > > > > One of the goals is to use composefs as a backend for ostree (instead > > of hardlinked trees). And ostree images with whiteouts in them are > > pretty common (typically from os images with preinstalled container > > images). > > > > If mkfs.composefs converted these to xattr whiteouts it may work when > > you run the containers in the image. But its not ideal if the content > > of the rootfs depends on the ostree backend, and its quite possible > > that some existing tooling will be confused by the new whiteouts. > > > > I think I understand. I will try to remember how all those pieces > work together... > > > > > When creating multiple lower dirs (e.g. from a multi-layered OCI > > > > image) you have to carry forward xattrs on directories from the lower > > > > layers to the upper, otherwise a merged directory from a higher layer > > > > will overwrite the "overlay.whiteouts" xattr. This makes an otherwise > > > > local operation (just escape the files in this layer) to a global one > > > > that depends on all layers. > > > > > > I don't understand this claim. In you implementation: > > > rdd->in_xwhiteouts_dir = > > > ovl_path_check_xwhiteouts_xattr(OVL_FS(rdd->dentry->d_sb), realpath); > > > checks the "overlay.whiteouts" xattr on every layer. > > > > > > Checking if an entry is a whiteout only matters in the uppermost layer > > > that this named entry is found. > > > > Yeah, that is true for the whiteouts, but not the escaped whiteouts. > > It's a bit confusing, so let me give an example: > > > > Suppose I have this file structure with a 2 layer overlayfs with an xwhiteout. > > > > ├── layer1 > > │ └── dir > > │ └── file > > └── layer2 > > └── dir - overlay.whiteouts > > └── file - overlay.whiteout > > > > (At this point, it is true what you say. If a layer3/dir existed > > without "overlay.whiteouts", things would still work.) > > > > Now I want to store this structure as the first layer inside another > > overlayfs by escaping the xattrs, it would look like this: > > > > layerA > > ├── layer1 > > │ └── dir > > │ └── file > > └── layer2 > > └── dir - overlay.overlay.whiteouts > > └── file - overlay.overlay.whiteout > > layerB > > └── layer2 > > └── dir > > └── new-file > > > > You'd use it like this: > > mount -t overlay -o lowerdir=layerB:layerA overlay mntAB > > mount -t overlay -o lowerdir=mntAB/layer2:mntAB/layer1 overlay mnt12 > > Painful example. > Next time please draw the top most layers on top and > not on bottom. I think it will be clearer this way. > > > > > However, if you try this, you'll notice that It doesn't work, due to a > > missing "overlay.whiteouts" xattr on mntAB/layer2/dir. > > This is caused by the file that got added to layer2/dir in layerB. > > Since we got a new merged directory (layerB/layer2/dir) it overrides > > the escaped xattrs from layerA/layer2/dir. > > > > This is not an absolute showstopper, but it would be a nice property > > if escaping a layer was an isolated operation independent of the other > > layers. > > > > I see your point. I updated the ovl-nesting-alternative branch with an addition that makes reading escaped whiteout dir markers take into account all layers. With that I got the example above to work. Those that look sane to you? I also updated the composefs escape PR: https://github.com/containers/composefs/pull/175 With this I am able to take a directory containing a set of normal overlayfs layers (including metacopy files and regular whiteouts), run mkcomposefs on it, mount the composefs, and then mount an overlayfs from the layer dirs in the composefs mount. > > > My understanding is that is mkfs.composefs creates a xwhiteout file, > > > it ONLY needs to locally set the overlay.whiteouts xattr on its parent > > > dir. I don't understand how multi layers matter. > > > > > > > > > > > In terms of implementation complexity I think they are very similar. > > > > > > > > > > Sorry. I disagree. The two implementations may be on par wrt > > > lines of code, but I perceive the transform-to-whiteout patch as > > > harder to maintain. > > > > > > The fact that an xattr changes the file type seems terrifying to me. > > > Imagine that upper layer has a non-empty regular file with > > > overlay.whiteout xattr - this is exposed as a regular file and then > > > user truncates this regular file - BOOM (turn to whiteout?). > > > > Nod. > > > > > I need to understand your concerns about my proposal because > > > I did not get them. > > > > Hopefully I explained it above, so we can come up with a solution. > > > > I think that the first thing to do would be to minimize the requirements. > I have lost track of the use cases and possible combinations of ostree > containers and unprivileged users. > > Perhaps narrowing the requirement can simplify the design? > > One option that we should consider - it may simplify the design a bit - > use an opt-in mount option to support interpreting escaping xattrs. > It may be wise to opt-in to this feature anyway - less chance for > regressions we did not think about. That is not really helpful to me, I need to support a composefs mount that contains files with overlay xattrs (to support e.g. container layers with metacopy) as well as whiteout files (to support container layers with whiteouts). > This means you can use xwhiteouts with no need of marking the > containing directories. ovl_calc_d_ino() would just blindly check for > ofs->config.<something>. > > Would the opt-in mount option be a problem to your use cases? The only option I see that could potentially useful for the composefs usecase is to *disable* regular whiteout processing, and just expose chardev(0,0) files as they are. Then we can use xwhiteouts for the composefs layers and expose the chardev(0,0) files in the image as is. Not sure this is worth it though? At the end of the day, being able to nest overlayfs with whiteouts in them Is a clear improvement on what we had. Then we just have to make userspace handle the fact that the whiteout files change types when in a composefs image. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Alexander Larsson Red Hat, Inc alexl@xxxxxxxxxx alexander.larsson@xxxxxxxxx