On Tue, Aug 22, 2023 at 5:31 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > On Tue, Aug 22, 2023 at 5:36 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > On Tue, Aug 22, 2023 at 4:25 PM Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > On Tue, Aug 22, 2023 at 3:56 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > > > On Tue, 22 Aug 2023 at 15:22, Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > > > > > On Mon, Aug 21, 2023 at 1:00 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > > > > > > > On Thu, 17 Aug 2023 at 13:05, Alexander Larsson <alexl@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > This is needed to properly stack overlay filesystems, I.E, being able > > > > > > > to create a whiteout file on an overlay mount and then use that as > > > > > > > part of the lowerdir in another overlay mount. > > > > > > > > > > > > > > The way this works is that we create a regular whiteout, but set the > > > > > > > `overlay.nowhiteout` xattr on it. Whenever we check if a file is a > > > > > > > whiteout we check this xattr and don't treat it as a whiteout if it is > > > > > > > set. The xattr itself is then stripped and when viewed as part of the > > > > > > > overlayfs mount it looks like a regular whiteout. > > > > > > > > > > > > > > > > > > > I understand the motivation, but don't have good feelings about the > > > > > > implementation. Like the xattr escaping this should also have the > > > > > > property that when fed to an old kernel version, it shouldn't > > > > > > interpret this object as a whiteout. Whether it remains hidden like > > > > > > the escaped xattrs or if it shows up as something else is > > > > > > uninteresting. > > > > > > > > > > > > It could just be a zero sized regular file with "overlay.whiteout". > > > > > > > > > > So, I started doing this, where a whiteout is just a regular file with > > > > > the xattr set. Initially I thought I only needed to check the xattr > > > > > during lookup and convert the inode mode from S_IFREG to S_IFCHR. > > > > > However, I also need to hook up readdir and convert DT_REG to DT_CHR, > > > > > otherwise readdir will report the wrong d_type. To make it worse, > > > > > overlayfs itself looks for DT_CHR to handle whiteouts when listing > > > > > files, so nesting is not working without that. > > > > > > > > > > The only way I see to implement that conversion is to call getxattr() > > > > > on every DT_REG file during readdir(), and while a single getxattr() > > > > > on lookup is fine, I don't think that is. > > > > > > > > > > Any other ideas? > > > > > > > > Not messing with d_type seems a good idea. How about a random > > > > unreserved chardev? > > > > > > Only the whiteout one (0,0) can be created by non-root users. > > > > I was thinking of (ab)using DT_SOCK or DT_FIFO, but turns out you > > can't store xattrs on such files. > > FWIW, there is also DT_WHT that was defined and never used. > But that is just an anecdote. > > Regarding the issue of avoiding getxattr for every dirent. > Note that in readdir, dirent that goes through ovl_cache_update_ino() > calls lookup()/stat() on the overlay itself, so as long as ovl_lookup() > will treat overlay.whiteout file as a whiteout, the code > /* Mark a stale entry */ > p->is_whiteout = true; > will kick in and do the right thing for readdir wrt cleaning up > lower entries covered with whiteouts, regardless of DT_CHR. > > Now all that is left is to make sure that ovl_cache_update_ino() > is called if there are possibly overlay.whiteout files. > > For the case of nested ovl, upper and lower fs cannot be the same, > so ovl_same_fs() is false. > Therefore, as long as xino is enabled (this is the default), > ovl_same_dev() is true => ovl_xino_bits() > 0 => > ovl_calc_d_ino() is true and ovl_cache_update_ino() will be > called for all merged dirents. > > IOW, unless I am missing something, if you implement overlay.whiteout > logic in ovl_lookup() correctly, readdir should "just work" as long as > the mounter does not explicitly use -o xino=off. Things are not that rosy for this. First of all, the default value for OVERLAY_FS_XINO_AUTO is no, so by default xino is not enabled. This means that overlay.whiteout only works if you enable xino=on/auto in the mount. Secondly, in the case where all upper and lower are on the same fs, even if xino is enabled it is ignored. This is not the case where the lower is on a nested overlayfs as you say, but maybe we want to create a lower dir that works on both when stored on a overlayfs and elsewere. Relying on xino which is essentially unrelated to whiteouts to get enabled seems quite fragile. I'm gonna instead try to make the alternative whiteout be a fifo with an xattr. Then we can add DT_FIFO to the maybe_whiteout case during readdir() which will then get picked up as a whiteout without relying on xino. I still have an issue with this approach though. Suppose we have a rootfs that we want to make available as an overlayfs lower (using trusted.overlay.*). We can escape all the xattrs in the rootfs with trusted.overlay.overlay to make them show up correctly. However, suppose the rootfs contains a char(0.0) whiteout. how do we encode this? We can use a file with trusted.overlay.overlay.whiteout, which then will be visible as trusted.overlay.whiteout in the overlayfs mount. This is normally functionally equivalent to the char(0,0) whiteout, but not if the user of the rootfs mounts the second overlayfs using userxattr. Maybe it could convert a char(0,0) to a file with both trusted.overlay.overlay.whiteout and user.overlay.whiteout? The the resulting overlayfs file could be used as a whiteout for both types, just like the char node... -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Alexander Larsson Red Hat, Inc alexl@xxxxxxxxxx alexander.larsson@xxxxxxxxx