On Wed, Aug 23, 2023 at 3:13 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > On Tue, Aug 22, 2023 at 5:31 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > On Tue, Aug 22, 2023 at 5:36 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > On Tue, Aug 22, 2023 at 4:25 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > > > On Tue, Aug 22, 2023 at 3:56 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > > > > > On Tue, 22 Aug 2023 at 15:22, Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > > > > > > > On Mon, Aug 21, 2023 at 1:00 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > On Thu, 17 Aug 2023 at 13:05, Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > > > This is needed to properly stack overlay filesystems, I.E, being able > > > > > > > > to create a whiteout file on an overlay mount and then use that as > > > > > > > > part of the lowerdir in another overlay mount. > > > > > > > > > > > > > > > > The way this works is that we create a regular whiteout, but set the > > > > > > > > `overlay.nowhiteout` xattr on it. Whenever we check if a file is a > > > > > > > > whiteout we check this xattr and don't treat it as a whiteout if it is > > > > > > > > set. The xattr itself is then stripped and when viewed as part of the > > > > > > > > overlayfs mount it looks like a regular whiteout. > > > > > > > > > > > > > > > > > > > > > > I understand the motivation, but don't have good feelings about the > > > > > > > implementation. Like the xattr escaping this should also have the > > > > > > > property that when fed to an old kernel version, it shouldn't > > > > > > > interpret this object as a whiteout. Whether it remains hidden like > > > > > > > the escaped xattrs or if it shows up as something else is > > > > > > > uninteresting. > > > > > > > > > > > > > > It could just be a zero sized regular file with "overlay.whiteout". > > > > > > > > > > > > So, I started doing this, where a whiteout is just a regular file with > > > > > > the xattr set. Initially I thought I only needed to check the xattr > > > > > > during lookup and convert the inode mode from S_IFREG to S_IFCHR. > > > > > > However, I also need to hook up readdir and convert DT_REG to DT_CHR, > > > > > > otherwise readdir will report the wrong d_type. To make it worse, > > > > > > overlayfs itself looks for DT_CHR to handle whiteouts when listing > > > > > > files, so nesting is not working without that. > > > > > > > > > > > > The only way I see to implement that conversion is to call getxattr() > > > > > > on every DT_REG file during readdir(), and while a single getxattr() > > > > > > on lookup is fine, I don't think that is. > > > > > > > > > > > > Any other ideas? > > > > > > > > > > Not messing with d_type seems a good idea. How about a random > > > > > unreserved chardev? > > > > > > > > Only the whiteout one (0,0) can be created by non-root users. > > > > > > I was thinking of (ab)using DT_SOCK or DT_FIFO, but turns out you > > > can't store xattrs on such files. > > > > FWIW, there is also DT_WHT that was defined and never used. > > But that is just an anecdote. > > > > Regarding the issue of avoiding getxattr for every dirent. > > Note that in readdir, dirent that goes through ovl_cache_update_ino() > > calls lookup()/stat() on the overlay itself, so as long as ovl_lookup() > > will treat overlay.whiteout file as a whiteout, the code > > /* Mark a stale entry */ > > p->is_whiteout = true; > > will kick in and do the right thing for readdir wrt cleaning up > > lower entries covered with whiteouts, regardless of DT_CHR. > > > > Now all that is left is to make sure that ovl_cache_update_ino() > > is called if there are possibly overlay.whiteout files. > > > > For the case of nested ovl, upper and lower fs cannot be the same, > > so ovl_same_fs() is false. > > Therefore, as long as xino is enabled (this is the default), > > ovl_same_dev() is true => ovl_xino_bits() > 0 => > > ovl_calc_d_ino() is true and ovl_cache_update_ino() will be > > called for all merged dirents. > > > > IOW, unless I am missing something, if you implement overlay.whiteout > > logic in ovl_lookup() correctly, readdir should "just work" as long as > > the mounter does not explicitly use -o xino=off. > > Things are not that rosy for this. > > First of all, the default value for OVERLAY_FS_XINO_AUTO is no, so by > default xino is not enabled. This means that overlay.whiteout only > works if you enable xino=on/auto in the mount. > > Secondly, in the case where all upper and lower are on the same fs, > even if xino is enabled it is ignored. This is not the case where the > lower is on a nested overlayfs as you say, but maybe we want to > create a lower dir that works on both when stored on a overlayfs and > elsewere. Relying on xino which is essentially unrelated to whiteouts > to get enabled seems quite fragile. > > I'm gonna instead try to make the alternative whiteout be a fifo with > an xattr. Then we can add DT_FIFO to the maybe_whiteout case during > readdir() which will then get picked up as a whiteout without relying > on xino. > > I still have an issue with this approach though. Suppose we have a > rootfs that we want to make available as an overlayfs lower (using > trusted.overlay.*). We can escape all the xattrs in the rootfs with > trusted.overlay.overlay to make them show up correctly. However, > suppose the rootfs contains a char(0.0) whiteout. how do we encode > this? We can use a file with trusted.overlay.overlay.whiteout, which > then will be visible as trusted.overlay.whiteout in the overlayfs > mount. This is normally functionally equivalent to the char(0,0) > whiteout, but not if the user of the rootfs mounts the second > overlayfs using userxattr. > > Maybe it could convert a char(0,0) to a file with both > trusted.overlay.overlay.whiteout and user.overlay.whiteout? The the > resulting overlayfs file could be used as a whiteout for both types, > just like the char node... Ah, no. that won't work because we can't set a user xattr on a fifo. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Alexander Larsson Red Hat, Inc alexl@xxxxxxxxxx alexander.larsson@xxxxxxxxx