On Thu, Aug 24, 2023 at 5:23 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > On Thu, Aug 24, 2023 at 1:43 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > On Thu, Aug 24, 2023 at 12:56 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > On Wed, Aug 23, 2023 at 5:50 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > > On Wed, Aug 23, 2023 at 5:52 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > > > > > On Wed, 23 Aug 2023 at 16:43, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > > > > > If we do this, then both overlay.whiteout and overlay.xattr_whiteouts > > > > > > xattrs will be xattrs that the overlayfs driver never sets. > > > > > > It's a precedent, but as long as it is properly documented and encoded > > > > > > in fstests, I will be fine with it. Not sure about Miklos. > > > > > > > > > > Firstly I need to properly understand the proposal. At this point I'm > > > > > not sure what overlay.whiteout is supposed to mean. Does it mean the > > > > > same as a whiteout (chrdev(0,0))? Or does it mean that overlayfs > > > > > should not treat it as a whiteout, but instead transform that into a > > > > > chrdev(0,0) for the top overlay to interpret as a whiteout? Or > > > > > something else? > > > > > > > > > > > > > My proposal does not involve any transformation. > > > > It is "just" to support another format for a whiteout. > > > > Transforming a REG or FIFO real object to CHR ovl object > > > > will be a pain IMO and I don't see why it is needed. > > > > > > > > Let me try again from the top: > > > > 1. ovl_path_is_whiteout() checks if either ovl_is_whiteout() (chardev(0,0)) > > > > or regular file with "overlay.whiteout" xattr, so ovl_lookup() > > > > will result in > > > > a negative dentry if either whiteout format is found at topmost layer > > > > 2. To optimize unneeded getxattr, "overlay.whiteout" xattr could be checked > > > > only in case the parent dir has xattr "overlay.xattr_whiteouts" > > > > 3. mkfs.composefs is responsible of creating the non-chardev whiteouts > > > > as well as the marking the dirs that contains them with > > > > "overlay.xattr_whiteouts" - overlayfs itself never does that > > > > 4. ovl_calc_d_ino() (from readdir on a merge dir) returns true if the > > > > the iterated dir has "overlay.xattr_whiteouts" xattr > > > > 5. That will cause ovl_cache_update_ino() to lookup the > > > > *overlay dentry* that will be negative (as per rule 1 above) > > > > if either whiteout format is found at topmost layer and that > > > > will cause the readdir dirent to be marked is_whiteout and > > > > filtered out of readdir results > > > > > > > > * The trick for readdir is that the the per dirent DT_CHR optimization > > > > is traded off for a per parent dir optimization, but even the worst case > > > > where all directories have xattr_whiteouts, readdir is not more > > > > expensive than readdir with xino enabled, which is the default for > > > > several Linux distros > > > > > > > > Hope this is more clear? > > > > > > Ok, so I implemented this, both using the transforming-to-whiteout and > > > the alternative-whiteout approach. > > > > > > Here is the transform-to-whiteout approach: > > > https://github.com/alexlarsson/linux/tree/ovl-nesting-transform > > > > > > In it, if you have a lower dir with these files/xattrs: > > > * lowerdir/foo - directory > > > trusted.overlay.whiteouts > > > * lowerdir/foo/hide_file - regular file > > > trusted.overlay.whiteout > > > > > > Then you will get an overlay no-userxattr mount like this: > > > * lowerdir/foo - directory > > > * lowerdir/foo/hide_file - chardev(0,0) file > > > > > > This can be used as a lower in any overlayfs mount you want, userxattr or no. > > > > > > Here is the alternative-whiteout approach: > > > https://github.com/alexlarsson/linux/tree/ovl-nesting-alternative > > > > > > In it, if you have a lower dir with these files/xattrs: > > > * lowerdir/foo - directory > > > trusted.overlay.overlay.whiteouts > > > user.overlay.whiteouts > > > * lowerdir/foo/hide_file - regular file > > > trusted.overlay.overlay.whiteout > > > user.overlay.whiteout > > > > > > Then you will get an overlay no-userxattr mount like this: > > > * lowerdir/foo - directory > > > trusted.overlay.whiteouts > > > user.overlay.whiteouts > > > * lowerdir/foo/hide_file - regular file > > > trusted.overlay.whiteout > > > user.overlay.whiteout > > > > > > This can be used as a lower in any overlayfs mount you want, userxattr or no. > > > > > > I prefer the transform-to-whiteout approach for a two reasons: > > > > > > Given an existing image (say an OCI image) we can construct an > > > overlayfs mount that is not just functionally identical, but also > > > indistinguible from the expected one. I.e. if the original OCI image > > > had a chardev(0,0) we will still have one in the mount. > > > > > > > I thought that OCI image format is using a different without format > > which is converted to ovl whiteout format anyway: > > https://github.com/opencontainers/image-spec/blob/main/layer.md#whiteouts > > Yeah, but those whiteouts are not the target of this work. An OCI > image is a set of tarballs, and one tarball can contain magically > marked ".wh." prefixed files for whiteouts. These are converted to > real whiteouts by docker. However, suppose the image itself contains > an overlayfs lower directory, which the app in the container wants to > use. This would look like a chardev(0,0) (not a .wh.*) in the tarball. > If this is naively extracted to disk (without escaping) then those > whiteouts will not be visible to the container app when it runs in the > container, because the outer overlayfs ate them. > > Such whiteouts in OCI containers don't currently work, so you won't > find any such OCI containers in the wild. But with escapes it seems > useful. Something similar that you *will* find in the wild however, is > ostree images with whiteouts in them, and we want to keep supporting > these. See more below. > > > Also, IIUC, you want this feature for mkfs.composefs, which doesn't > > have backward compat requirements and doesn't need to create > > ovl images identical to existing ones. Please correct me if I am wrong. > > One of the goals is to use composefs as a backend for ostree (instead > of hardlinked trees). And ostree images with whiteouts in them are > pretty common (typically from os images with preinstalled container > images). > > If mkfs.composefs converted these to xattr whiteouts it may work when > you run the containers in the image. But its not ideal if the content > of the rootfs depends on the ostree backend, and its quite possible > that some existing tooling will be confused by the new whiteouts. > I think I understand. I will try to remember how all those pieces work together... > > > When creating multiple lower dirs (e.g. from a multi-layered OCI > > > image) you have to carry forward xattrs on directories from the lower > > > layers to the upper, otherwise a merged directory from a higher layer > > > will overwrite the "overlay.whiteouts" xattr. This makes an otherwise > > > local operation (just escape the files in this layer) to a global one > > > that depends on all layers. > > > > I don't understand this claim. In you implementation: > > rdd->in_xwhiteouts_dir = > > ovl_path_check_xwhiteouts_xattr(OVL_FS(rdd->dentry->d_sb), realpath); > > checks the "overlay.whiteouts" xattr on every layer. > > > > Checking if an entry is a whiteout only matters in the uppermost layer > > that this named entry is found. > > Yeah, that is true for the whiteouts, but not the escaped whiteouts. > It's a bit confusing, so let me give an example: > > Suppose I have this file structure with a 2 layer overlayfs with an xwhiteout. > > ├── layer1 > │ └── dir > │ └── file > └── layer2 > └── dir - overlay.whiteouts > └── file - overlay.whiteout > > (At this point, it is true what you say. If a layer3/dir existed > without "overlay.whiteouts", things would still work.) > > Now I want to store this structure as the first layer inside another > overlayfs by escaping the xattrs, it would look like this: > > layerA > ├── layer1 > │ └── dir > │ └── file > └── layer2 > └── dir - overlay.overlay.whiteouts > └── file - overlay.overlay.whiteout > layerB > └── layer2 > └── dir > └── new-file > > You'd use it like this: > mount -t overlay -o lowerdir=layerB:layerA overlay mntAB > mount -t overlay -o lowerdir=mntAB/layer2:mntAB/layer1 overlay mnt12 Painful example. Next time please draw the top most layers on top and not on bottom. I think it will be clearer this way. > > However, if you try this, you'll notice that It doesn't work, due to a > missing "overlay.whiteouts" xattr on mntAB/layer2/dir. > This is caused by the file that got added to layer2/dir in layerB. > Since we got a new merged directory (layerB/layer2/dir) it overrides > the escaped xattrs from layerA/layer2/dir. > > This is not an absolute showstopper, but it would be a nice property > if escaping a layer was an isolated operation independent of the other > layers. > I see your point. > > My understanding is that is mkfs.composefs creates a xwhiteout file, > > it ONLY needs to locally set the overlay.whiteouts xattr on its parent > > dir. I don't understand how multi layers matter. > > > > > > > > In terms of implementation complexity I think they are very similar. > > > > > > > Sorry. I disagree. The two implementations may be on par wrt > > lines of code, but I perceive the transform-to-whiteout patch as > > harder to maintain. > > > > The fact that an xattr changes the file type seems terrifying to me. > > Imagine that upper layer has a non-empty regular file with > > overlay.whiteout xattr - this is exposed as a regular file and then > > user truncates this regular file - BOOM (turn to whiteout?). > > Nod. > > > I need to understand your concerns about my proposal because > > I did not get them. > > Hopefully I explained it above, so we can come up with a solution. > I think that the first thing to do would be to minimize the requirements. I have lost track of the use cases and possible combinations of ostree containers and unprivileged users. Perhaps narrowing the requirement can simplify the design? One option that we should consider - it may simplify the design a bit - use an opt-in mount option to support interpreting escaping xattrs. It may be wise to opt-in to this feature anyway - less chance for regressions we did not think about. This means you can use xwhiteouts with no need of marking the containing directories. ovl_calc_d_ino() would just blindly check for ofs->config.<something>. Would the opt-in mount option be a problem to your use cases? Thanks, Amir.