On Mon, Jul 15, 2024 at 9:14 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Mon, Jul 15, 2024, 6:36 PM Daire Byrne <daire@xxxxxxxx> wrote:
>>
>> On Mon, 15 Jul 2024 at 15:15, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>> >
>> > > > I understand.
>> > > > It makes sense.
>> > > >
>> > > > I remember tossing the idea of "finalizing" the merged dir copy up -
>> > > > meaning that at the end of ovl_dir_read_merged(), overlayfs knows
>> > > > if the upper entries shadow all the lower entries, and in this case, the
>> > > > lower layers NEVER need to be iterated again, so some xattr could
>> > > > be set on the upper dir to indicate that the copy up on the dir content
>> > > > has been completed.
>> > > >
>> > > > After the copy up of dir content has been completed, then ovl_lookup()
>> > > > should not continue to lookup children of this merged dir in lower layers
>> > > > unless it was redirected by upper layer.
>> > > >
>> > > > It is not a trivial change, but I think it can be beneficial.
>> > > >
>> > > > The good thing about this is that there is no need for a new API -
>> > > > all your service would need to do is chown -R as you tried to do and
>> > > > it will "just work" - no more unneeded lookups in NFS layer.
>> > >
>> > > Well, that is an interesting idea. I'm not sure how you would
>> > > determine that a merged dir has been "completely" copied up (comparing
>> > > readdir results?).
>> >
>> > overlay readdir of merged dir NEEDS to merge lower entries
>> > that DO NOT exist in the upper layer - if there are no such entries
>> > found, looking in the lower layer next time is futile.
>> >
>> > > And how would this differ to setting the "opaque"
>> > > xattr on the dir (but automatically)?
>> >
>> > The lower layer still has information that overlayfs needs,
>> > and overlayfs needs to be able to follow redirects into lower layer.
>> > This is not going to work with an opaque upper dir.
>>
>> I guess as long as the upperdir can now serve all the lookups and
>> negative lookups for a given directory (and optionally entire
>> subsequent directory tree) without needing to consult with the lower
>> directory specifically for them, that's all I care about :)
>>
>> > > Would it need a new xattr?
>> > >
>> >
>> > Maybe, or use the combination of "opaque" + "redirect" to
>> > describe this hybrid type of directory (the dir content was fully
>> > copied up, but redirects may still follow to lower entries).
>> > Essentially, this is equivalent to a lower-most directory (implicitly
>> > opaque dir) that can follow redirects into a data-only layer.
>> >
>> > > It also means that all subsequent dirs in the lower tree would also be
>> > > "opaque" even if they have not been checked for copy-up completeness?
>> >
>> > No. A directory inode is a sort of a file whose "data" is the dir content.
>> > "copy-up completeness" means the list of entries has been copied up
>> > (not recursively).
>> >
>> > > Or they would get a redirect until it could be determined they were
>> > > completely copied up?
>> >
>> > readdir operates on a single dir inode.
>> > readdir of a directory can end up making it "half-opaque"
>> > nothing recursive about it - application can do this recursively
>> > as it wishes.
>> >
>> > >
>> > > I also won't pretend to understand how you could do that for a
>> > > recursive copy up without momentarily disrupting access. Like if you
>> > > did a recursive copy up and the top level dirs complete first while
>> > > the lower contents haven't been totally copied up yet?
>> >
>> > Not doing anything recursive.
>>
>> I guess what I meant by recursive was the proposed "chown -R" that
>> would "promote" the metadata to the upper layer recursively.
>>
>> I think you answered my question by saying that both files &
>> directories in a "complete" copy-up directory would still get a
>> redirect so it wouldn't break access while the chown was running?
>> Once it gets to the next level, the new xattr (or opaque + redirect) would
>> then be added to those directories etc etc. all the way down.
>
>
> Yap.
>
>>
>> > >
>> > > It sounds complex :)
>> >
>> > Not really. The patch is not trivial, but the concept is simple.
>> > If I find a few hours, I will post a demo.
>>
>> That would be cool! Always happy to test patches.
>>
>> > > > > > One more thing that could help said service is if overlayfs
>> > > > > > supported a hybrid mode of redirect_dir=follow,metacopy=on,
>> > > > > > where redirect is enabled for regular files for metacopy, but NOT
>> > > > > > enabled for directories (which was redirect_dir original use case).
>> > > > > >
>> > > > > > This way, the service could run the command line:
>> > > > > > $ mv /ovl/blah/thing /ovl/local
>> > > > > > then "mv" will get EXDEV for moving directories and will create
>> > > > > > opaque directories in their place and it will recursively move all
>> > > > > > the files to the opaque directories.
>> > > > >
>> > > > > Okay, I think I see what you are getting at but I need to test the
>> > > > > patch to make sure :)
>> > >
>> > > Sorry, I will try and test the patch this week as I am actually
>> > > curious about using it to create offline handcrafted overlay trees
>> > > too. So rather than run a combination of truncate, touch, chown,
>> > > chmod, setfattr commands, mount an overlay with your patch, move the
>> > > dirs around, umount and then use the resulting metadata overlay as a
>> > > read-only overlay from then on.
>> > >
>> >
>> > That sounds much better than mangling with overlayfs xattrs.
>> >
>> > > I'm still toying with the idea of creating one (enormous) read-only
>> > > overlay with all the lib/plugin directories as opaque directories and
>> > > just accepting that I might only refresh it once a day and clients
>> > > might only remount it once a week... Not great, but some amount of
>> > > local lookup acceleration is better than none.
>> > >
>> > > I think the main problem with using this patch for my use case is that
>> > > as soon as you do the mv, you break any processes that might be
>> > > scanning those dirs at that instant or any new ones that start up. It
>> > > may be possible to have my userspace daemon choose the right time to
>> > > run the mv, but it's hard to predict how fast it would take to
>> > > complete.
>> > >
>> >
>> > Confused. I thought you were going to use the patch for offline preparation
>> > of metacopy layers.
>>
>> Sorry, I did mean only for the case where I might create the desired
>> upper layer for reuse later on (ie offline changes), your patch sounds
>> like a really useful and optimised time saver compared to my
>> hand-crafted method. I am still considering the offline method if
>> there proves to be no other alternative.
>>
>> But for the case where I would want a seamless online way to achieve
>> the same upper layer opaque directories, then obviously moving
>> directory trees even momentarily out of position and back again would
>> likely break software just starting up in that moment.
>>
>> And coordinating a background daemon that does the mv, with users who
>> randomly start applications sounds like a difficult problem.
>>
>> > Note that once you did mv into an opaque tree,
>> > you can move the opaque dir back into its original location
>> > (e.g. /blah/think/UUID...) and the dir will remain opaque,
>> > because EXDEV is only generated when trying to move
>> > merged dirs.
>> > Moving opaque upper dirs around is allowed and should work.
>>
>> Yes exactly, this would likely work most of the time while online
>> except when some software is expecting the files to always be located
>> in an immutable path location and the mv is in progress? Unless I am
>> totally misunderstanding (always a strong possibility).
>
>
> You understood correctly.
> This method is not suitable for online promotion.
>
>>
>> Basically, I need to be able to continue serving the same files and
>> paths even while the copy-up metadata process for any part of the tree
>> is in progress. And it sounds like your idea of considering a copy-up
>> of a merged dir as "complete" (and essentially opaque) would be the
>> way to do that without files or dirs ever moving or losing access even
>> momentarily.
>
>
> Yes, that's the idea.
>
> I'll see when I get around to that demo.

I found some time to write the POC patch, but not enough time to make
it work :) - it is failing some fstests.

Since I don't know when I will have time to debug the issues,
here is the WIP if you want to debug it and point out the bugs:

https://github.com/amir73il/linux/commits/ovl-finalize-dir/

Thanks,
Amir.
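
For readers following the thread, the "finalize" idea can be modelled in a few lines of Python. This is only a toy sketch of the concept as described in the quoted discussion, not the WIP kernel patch: the names Entry, MergedDir, read_merged() and lookup() are hypothetical, a directory is reduced to a set of names, and the xattr is reduced to a boolean flag. It assumes that a merged readdir which finds no lower entry missing from the upper dir may mark the dir as finalized, after which lookups (including negative lookups) no longer consult the lower layer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Entry:
    """Hypothetical upper-layer entry; a whiteout hides the lower name."""
    name: str
    whiteout: bool = False


@dataclass
class MergedDir:
    """Toy model of one merged directory (not recursive)."""
    upper: Dict[str, Entry]
    lower: Set[str]
    finalized: bool = False   # stands in for an xattr on the upper dir
    lower_lookups: int = 0    # counts how often the lower layer is consulted


def read_merged(d: MergedDir) -> List[str]:
    """Merge upper entries with lower entries that upper does not shadow."""
    visible = [n for n, e in d.upper.items() if not e.whiteout]
    unshadowed = [n for n in d.lower if n not in d.upper]
    if not unshadowed:
        # Nothing had to be merged from lower: looking there again is futile.
        d.finalized = True
    return sorted(visible + unshadowed)


def lookup(d: MergedDir, name: str) -> Optional[str]:
    """Return the layer that answers the lookup, or None if not found."""
    entry = d.upper.get(name)
    if entry is not None:
        # A real implementation could still follow a redirect from here.
        return None if entry.whiteout else "upper"
    if d.finalized:
        return None  # negative lookup served by the upper layer alone
    d.lower_lookups += 1
    return "lower" if name in d.lower else None
```

In this model, only the directory that was read is affected; child directories stay merged until they are read themselves, which matches the "nothing recursive about it" point in the quoted text.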