On Thu, Aug 13, 2020 at 8:22 PM Kevin Locke <kevin@xxxxxxxxxxxxxxx> wrote:
>
> Thanks again Amir! I'll work on patches for the docs and adding
> pr_warn_ratelimited() for invalid metacopy/redirect as soon as I get a
> chance.
>
> On Wed, 2020-08-12 at 20:06 +0300, Amir Goldstein wrote:
> > On Wed, Aug 12, 2020 at 7:05 PM Kevin Locke <kevin@xxxxxxxxxxxxxxx> wrote:
> >> On Wed, 2020-08-12 at 18:21 +0300, Amir Goldstein wrote:
> >>> I guess the only thing we could document is that changes to underlying
> >>> layers with metacopy and redirects have undefined results.
> >>> Vivek was a proponent of making the statements about outcome of
> >>> changes to underlying layers sound more harsh.
> >>
> >> That sounds good to me. My current use case involves offline changes to
> >> the lower layer on a routine basis, and I interpreted the current
> >
> > You are not the only one, I hear of many users that do that, but nobody ever
> > bothered to sit down and document the requirements - what exactly is the
> > use case and what is the expected outcome.
>
> I can elaborate a bit. Keep in mind that it's a personal use case which
> is flexible, so it's probably not worth supporting specifically, but may
> be useful to discuss/consider:
>
> A few machines that I manage are dual-boot between Windows and Linux,
> with software that runs on both OSes (Steam). This software installs a
> lot (>100GB) of semi-static data which is mostly (>90%) the same between
> OSes, but not partitioned by folder or designed to be shared between
> them. The software includes mechanisms for validating the data files
> and automatically updating/repairing any files which do not match
> expectations.
>
> I currently mount an overlayfs of the Windows data directory on the
> Linux data directory to avoid storing multiple copies of common data.
> After any data changes in Windows, I re-run the data file validation in
> Linux to ensure the data is consistent.
> I also occasionally run a deduplication script[1] to remove files which
> may have been updated on Linux and later updated to the same contents
> on Windows.

Nice use case. It may be a niche use case the way you describe it, but
the general concept of "updatable software" at the lower layer is not
unique to your use case. See this [1] recent example that spawned the
thread about updating the documentation w.r.t. changing underlying
layers.

[1] https://lore.kernel.org/linux-unionfs/32532923.JtPX5UtSzP@fgdesktop/

> To support this use, I'm looking for a way to configure overlayfs such
> that offline changes to the lower dir do not break things in a way that
> can't be recovered by naive file content validation. Beyond that, any
> performance-enhancing and space-saving features are great.
>
> metacopy and redirection would be nice to have, but are not essential as
> the program does not frequently move data files or modify their
> metadata.

That's what I figured.

> If accessing an invalid metacopy behaved like a 0-length
> file, it would be ideal for my use case (since it would be deleted and
> re-created by file validation) but I can understand why this would be
> undesirable for other cases and problematic to implement.

I wouldn't say it is "problematic" to implement. It is simple to convert
the EIO to a warning (with an opt-in option). What would be a challenge
to implement is the behavior where metadata access is allowed for broken
metacopy, but data access results in EIO.

> (I'm experimenting with seccomp to prevent/ignore metadata changes,
> since the program should run on filesystems which do not support them.
> An option to ignore/reject metadata changes would be handy, but may not
> be justified.)
>
> Does that explain? Does it seem reasonable? Is disabling metacopy and
> redirect_dir likely to be sufficient?

Yes, disabling metacopy and redirect_dir sounds like the right thing to
do, because I don't think they gain you too much anyway.
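For completeness, a mount along those lines, with both features disabled
explicitly, might look like the sketch below. All paths are made up for
illustration; they are not from the actual setup described in this
thread.

```shell
# Sketch only: hypothetical paths, not from the setup in this thread.
# Lower layer: the (read-only from Linux's view) Windows data;
# upperdir and workdir must be on the same Linux filesystem, and
# workdir must be an empty directory.
lower=/mnt/windows/SteamLibrary
upper=/srv/steam/upper
work=/srv/steam/work

# metacopy=off and redirect_dir=off are the kernel defaults (unless
# changed via Kconfig or module parameters); spelling them out avoids
# metacopy/redirect entries that offline lower-layer changes could
# invalidate:
mount -t overlay overlay \
    -o "lowerdir=$lower,upperdir=$upper,workdir=$work,metacopy=off,redirect_dir=off" \
    /srv/steam/merged
```

With this configuration, an offline change to a lower file at worst
leaves a stale copied-up version in the upper layer, which the data
validation pass can detect and repair.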
>
> Best,
> Kevin
>
> [1]: Do you know of any overlayfs-aware deduplication programs? If not,
> I may consider cleaning up and publishing mine at some point.

I know about overlayfs-tools's "merge" command. I do not know if anyone
is using this tool besides perhaps its author (?).

Incidentally, I recently implemented the "deref" command for
overlayfs-tools [2], which unfolds metacopy and redirect_dir and creates
an upper layer without them. The resulting layer can then be deduped
with the lower layer using the "merge" command.

[2] https://github.com/kmxz/overlayfs-tools/pull/11

In the same pull request, I also made the existing overlayfs-tools
commands aware of metacopy and redirect_dir. The "merge" command simply
aborts when they are encountered, but the "vacuum" and "diff" commands
work correctly. I also added the "overlay diff -b" variant, which
produces output equivalent to that of the standard diff tool (diffutils)
just by analyzing the layers.

Thanks,
Amir.
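As an aside, the core operation such a deduplication pass performs is
removing an upper-layer file that has become byte-identical to its
lower-layer counterpart, so the overlay falls through to the shared
lower copy again. A minimal sketch of that idea is below; it is
illustrative only (not code from Kevin's script or from
overlayfs-tools), and like those tools it must be run on the offline
layers, never on a mounted overlay.

```python
import filecmp
import os

def dedupe_upper(lowerdir: str, upperdir: str) -> list[str]:
    """Remove regular files from upperdir that are byte-identical to the
    corresponding file in lowerdir, so overlayfs exposes the lower copy.

    Must be run while the overlay is NOT mounted.  Skips symlinks and
    anything that is not a plain file on both sides; whiteout entries in
    the upper layer are character devices, so os.path.isfile() skips
    them too.  Returns the relative paths of the removed files.
    """
    removed = []
    for root, _dirs, files in os.walk(upperdir):
        for name in files:
            upath = os.path.join(root, name)
            rel = os.path.relpath(upath, upperdir)
            lpath = os.path.join(lowerdir, rel)
            # Only plain, non-symlink files with identical content qualify.
            if (os.path.isfile(upath) and not os.path.islink(upath)
                    and os.path.isfile(lpath) and not os.path.islink(lpath)
                    and filecmp.cmp(upath, lpath, shallow=False)):
                os.remove(upath)
                removed.append(rel)
    return removed
```

A real tool needs to handle more than this sketch does (opaque
directories, extended attributes, ownership and mode differences), which
is where overlayfs-aware tooling like overlayfs-tools earns its keep.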