On Wed, Aug 23, 2023 at 4:13 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > On Tue, Aug 22, 2023 at 5:31 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > On Tue, Aug 22, 2023 at 5:36 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > On Tue, Aug 22, 2023 at 4:25 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > > > On Tue, Aug 22, 2023 at 3:56 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > > > > > On Tue, 22 Aug 2023 at 15:22, Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > > > > > > > On Mon, Aug 21, 2023 at 1:00 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > On Thu, 17 Aug 2023 at 13:05, Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > > > This is needed to properly stack overlay filesystems, I.E, being able > > > > > > > > to create a whiteout file on an overlay mount and then use that as > > > > > > > > part of the lowerdir in another overlay mount. > > > > > > > > > > > > > > > > The way this works is that we create a regular whiteout, but set the > > > > > > > > `overlay.nowhiteout` xattr on it. Whenever we check if a file is a > > > > > > > > whiteout we check this xattr and don't treat it as a whiteout if it is > > > > > > > > set. The xattr itself is then stripped and when viewed as part of the > > > > > > > > overlayfs mount it looks like a regular whiteout. > > > > > > > > > > > > > > > > > > > > > > I understand the motivation, but don't have good feelings about the > > > > > > > implementation. Like the xattr escaping this should also have the > > > > > > > property that when fed to an old kernel version, it shouldn't > > > > > > > interpret this object as a whiteout. Whether it remains hidden like > > > > > > > the escaped xattrs or if it shows up as something else is > > > > > > > uninteresting. > > > > > > > > > > > > > > It could just be a zero sized regular file with "overlay.whiteout". > > > > > > > > > > > > So, I started doing this, where a whiteout is just a regular file with > > > > > > the xattr set. Initially I thought I only needed to check the xattr > > > > > > during lookup and convert the inode mode from S_IFREG to S_IFCHR. > > > > > > However, I also need to hook up readdir and convert DT_REG to DT_CHR, > > > > > > otherwise readdir will report the wrong d_type. To make it worse, > > > > > > overlayfs itself looks for DT_CHR to handle whiteouts when listing > > > > > > files, so nesting is not working without that. > > > > > > > > > > > > The only way I see to implement that conversion is to call getxattr() > > > > > > on every DT_REG file during readdir(), and while a single getxattr() > > > > > > on lookup is fine, I don't think that is. > > > > > > > > > > > > Any other ideas? > > > > > > > > > > Not messing with d_type seems a good idea. How about a random > > > > > unreserved chardev? > > > > > > > > Only the whiteout one (0,0) can be created by non-root users. > > > > > > I was thinking of (ab)using DT_SOCK or DT_FIFO, but turns out you > > > can't store xattrs on such files. > > > > FWIW, there is also DT_WHT that was defined and never used. > > But that is just an anecdote. > > > > Regarding the issue of avoiding getxattr for every dirent. > > Note that in readdir, dirent that goes through ovl_cache_update_ino() > > calls lookup()/stat() on the overlay itself, so as long as ovl_lookup() > > will treat overlay.whiteout file as a whiteout, the code > > /* Mark a stale entry */ > > p->is_whiteout = true; > > will kick in and do the right thing for readdir wrt cleaning up > > lower entries covered with whiteouts, regardless of DT_CHR. > > > > Now all that is left is to make sure that ovl_cache_update_ino() > > is called if there are possibly overlay.whiteout files. > > > > For the case of nested ovl, upper and lower fs cannot be the same, > > so ovl_same_fs() is false. > > Therefore, as long as xino is enabled (this is the default), > > ovl_same_dev() is true => ovl_xino_bits() > 0 => > > ovl_calc_d_ino() is true and ovl_cache_update_ino() will be > > called for all merged dirents. > > > > IOW, unless I am missing something, if you implement overlay.whiteout > > logic in ovl_lookup() correctly, readdir should "just work" as long as > > the mounter does not explicitly use -o xino=off. > > Things are not that rosy for this. > > First of all, the default value for OVERLAY_FS_XINO_AUTO is no, so by > default xino is not enabled. This means that overlay.whiteout only > works if you enable xino=on/auto in the mount. > > Secondly, in the case where all upper and lower are on the same fs, > even if xino is enabled it is ignored. This is not the case where the > lower is on a nested overlayfs as you say, but maybe we want to > create a lower dir that works on both when stored on a overlayfs and > elsewere. Relying on xino which is essentially unrelated to whiteouts > to get enabled seems quite fragile. > All right. But you do not have to rely on xino && !ovl_same_fs(). What you need is for ovl_calc_d_ino() to be true. For example, consider this (edited) condition in ovl_calc_d_ino(): /* * Recalc d_ino for all entries if dir is impure (contains * copied up entries) */ if (ovl_test_flag(OVL_IMPURE, d_inode(rdd->dentry))) return true; It decided to lookup the entries based on a property of the parent dir. Specifically, this check will always be true for merge dirs that have an upper dir with non-zero copied-up files. But you can use the some principle to "taint" directories that contain "xattr_whiteouts" in a similar manner to "impurity" and that would be enough for ovl_calc_d_ino() to do the right thing also for merge dirs with no upper dir. If we do this, then both overlay.whiteout and overlay.xattr_whiteouts xattrs will be xattrs that the overlayfs driver never sets. It's a precedent, but as long as it is properly documented and encoded in fstests, I will be fine with it. Not sure about Miklos. Are you ok with leaving the responsibility to mark the directories as overlay.xattr_whiteouts to mkfs.composefs? Does this solution work for you? Thanks, Amir.