Re: overlayfs: NFS lowerdir changes & opaque negative lookups

On Wed, 17 Jul 2024 at 19:15, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Mon, Jul 15, 2024 at 9:14 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> >
> >
> > On Mon, Jul 15, 2024, 6:36 PM Daire Byrne <daire@xxxxxxxx> wrote:
> >>
> >> On Mon, 15 Jul 2024 at 15:15, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >> >
> >> > > > I understand.
> >> > > > It makes sense.
> >> > > >
> >> > > > I remember tossing the idea of "finalizing" the merged dir copy up -
> >> > > > meaning that at the end of ovl_dir_read_merged(), overlayfs knows
> >> > > > if the upper entries shadow all the lower entries, and in this case, the
> >> > > > lower layers NEVER need to be iterated again, so some xattr could
> >> > > > be set on the upper dir to indicate that the copy up on the dir content
> >> > > > has been completed.
> >> > > >
> >> > > > After the copy up of dir content has been completed, then ovl_lookup()
> >> > > > should not continue to lookup children of this merged dir in lower layers
> >> > > > unless it was redirected by upper layer.
> >> > > >
> >> > > > It is not a trivial change, but I think it can be beneficial.
> >> > > >
> >> > > > The good thing about this is that there is no need for a new API -
> >> > > > all your service would need to do is chown -R as you tried to do and
> >> > > > it will "just work" - no more unneeded lookups in NFS layer.
> >> > >
> >> > > Well, that is an interesting idea. I'm not sure how you would
> >> > > determine that a merged dir has been "completely" copied up (comparing
> >> > > readdir results?).
> >> >
> >> > overlay readdir of merged dir NEEDS to merge lower entries
> >> > that DO NOT exist in the upper layer - if there are no such entries
> >> > found, looking in the lower layer next time is futile.
> >> >
> >> > > And how would this differ to setting the "opaque"
> >> > > xattr on the dir (but automatically)?
> >> >
> >> > The lower layer still has information that overlayfs needs,
> >> > and overlayfs needs to be able to follow redirects into lower layer.
> >> > This is not going to work with an opaque upper dir.
> >>
> >> I guess as long as the upperdir can now serve all the lookups and
> >> negative lookups for a given directory (and optionally entire
> >> subsequent directory tree) without needing to consult with the lower
> >> directory specifically for them, that's all I care about :)
> >>
> >> > > Would it need a new xattr?
> >> > >
> >> >
> >> > Maybe, or use the combination of "opaque" + "redirect" to
> >> > describe this hybrid type of directory (the dir content was fully
> >> > copied up, but redirects may still follow to lower entries).
> >> > Essentially, this is equivalent to a lower-most directory (implicitly
> >> > opaque dir) that can follow redirects into a data-only layer.
> >> >
> >> > > It also means that all subsequent dirs in the lower tree would also be
> >> > > "opaque" even if they have not been checked for copy-up completeness?
> >> >
> >> > No. A directory inode is a sort of a file whose "data" is the dir content.
> >> > "copy-up completeness" means the list of entries have been copied up
> >> > (not recursively).
> >> >
> >> > > Or they would get a redirect until it could be determined they were
> >> > > completely copied up?
> >> >
> >> > readdir operates on a single dir inode.
> >> > readdir of a directory can end up making it "half-opaque"
> >> > nothing recursive about it - application can do this recursively
> >> > as it wishes.
> >> >
> >> > >
> >> > > I also won't pretend to understand how you could do that for a
> >> > > recursive copy up without momentarily disrupting access. Like if you
> >> > > did a recursive copy up and the top level dirs complete first while
> >> > > the lower contents haven't been totally copied up yet?
> >> >
> >> > Not doing anything recursive.
> >>
> >> I guess what I meant by recursive was the proposed "chown -R" that
> >> would "promote" the metadata to the upper layer recursively.
> >>
> >> I think you answered my question by saying that both files &
> >> directories in a "complete" copy-up directory would still get a
> >> redirect so it wouldn't break access while the chown was running? Once
> >> it gets to the next level, the new xattr (or opaque + redirect) would
> >> then be added to those directories etc etc. all the way down.
> >
> >
> > Yap.
> >
> >>
> >> > >
> >> > > It sounds complex :)
> >> >
> >> > Not really. The patch is not trivial, but the concept is simple.
> >> > If I find a few hours, I will post a demo.
> >>
> >> That would be cool! Always happy to test patches.
> >>
> >> > > > > > One more thing that could help said service is if overlayfs
> >> > > > > > supported a hybrid mode of redirect_dir=follow,metacopy=on,
> >> > > > > > where redirect is enabled for regular files for metacopy, but NOT
> >> > > > > > enabled for directories (which was redirect_dir original use case).
> >> > > > > >
> >> > > > > > This way, the service could run the command line:
> >> > > > > > $ mv /ovl/blah/thing /ovl/local
> >> > > > > > then "mv" will get EXDEV for moving directories and will create
> >> > > > > > opaque directories in their place and it will recursively move all
> >> > > > > > the files to the opaque directories.
> >> > > > >
> >> > > > > Okay, I think I see what you are getting at but I need to test the
> >> > > > > patch to make sure :)
> >> > >
> >> > > Sorry, I will try and test the patch this week as I am actually
> >> > > curious about using it to create offline handcrafted overlay trees
> >> > > too. So rather than run a combination of truncate, touch, chown,
> >> > > chmod, setfattr commands, mount an overlay with your patch, move the
> >> > > dirs around, umount and then use the resulting metadata overlay as a
> >> > > read-only overlay from then on.
> >> > >
> >> >
> >> > That sounds much better than mangling with overlayfs xattrs.
> >> >
> >> > > I'm still toying with the idea of creating one (enormous) read-only
> >> > > overlay with all the lib/plugin directories as opaque directories and
> >> > > just accepting that I might only refresh it once a day and clients
> >> > > might only remount it once a week... Not great, but some amount of
> >> > > local lookup acceleration is better than none.
> >> > >
> >> > > I think the main problem with using this patch for my use case is that
> >> > > as soon as you do the mv, you break any processes that might be
> >> > > scanning those dirs at that instant or any new ones that start up. It
> >> > > may be possible to have my userspace daemon choose the right time to
> >> > > run the mv, but it's hard to predict how long it would take to
> >> > > complete.
> >> > >
> >> >
> >> > Confused. I thought you were going to use the patch for offline preparation
> >> > of metacopy layers.
> >>
> >> Sorry, I did mean only for the case where I might create the desired
> >> upper layer for reuse later on (ie offline changes), your patch sounds
> >> like a really useful and optimised time saver compared to my
> >> hand-crafted method. I am still considering the offline method if
> >> there proves to be no other alternative.
> >>
> >> But for the case where I would want a seamless online way to achieve
> >> the same upper layer opaque directories, then obviously moving
> >> directory trees even momentarily out of position and back again would
> >> likely break software just starting up in that moment.
> >>
> >> And coordinating a background daemon that does the mv, with users who
> >> randomly start applications sounds like a difficult problem.
> >>
> >> > Note that once you did mv into an opaque tree,
> >> > you can move the opaque dir back into its original location
> >> > (e.g. /blah/think/UUID...) and the dir will remain opaque,
> >> > because EXDEV is only generated when trying to move
> >> > merged dirs.
> >> > Moving opaque upper dirs around is allowed and should work.
> >>
> >> Yes exactly, this would likely work most of the time while online
> >> except when some software is expecting the files to always be located
> >> in an immutable path location and the mv is in progress? Unless I am
> >> totally misunderstanding (always a strong possibility).
> >
> >
> > You understood correctly.
> > This method is not suitable for online promotion.
> >
> >>
> >> Basically, I need to be able to continue serving the same files and
> >> paths even while the copy-up metadata process for any part of the tree
> >> is in progress. And it sounds like your idea of considering a copy-up
> >> of a merged dir as "complete" (and essentially opaque) would be the
> >> way to do that without files or dirs ever moving or losing access even
> >> momentarily.
> >
> >
> > Yes, that's the idea.
> >
> > I'll see when I get around to that demo.
>
> I found some time to write the POC patch, but not enough time
> to make it work :) - it is failing some fstests.
>
> Since I don't know when I will have time to debug the issues,
> here is the WIP if you want to debug it and point out the bugs:
>
> https://github.com/amir73il/linux/commits/ovl-finalize-dir/

This is very cool - many thanks!

Unfortunately, I'm probably not the right person to code and identify
actual fixes, but I can test and describe results pretty well. :)

So I applied the patch (cleanly) to v6.9.3 (because I had it handy)
and mounted with "metacopy=on". The first oddity is that "ls /ovl" on
the root ovl directory shows no results (there are lots of dirs in
the lower layer), but if I do the same to a directory I know exists, it
appears and returns results just fine (e.g. ls /ovl/thing/blah). Then if
I "ls /ovl" again I see just /ovl/thing but none of the other dirs
(until they are also accessed by path).

Anyway, that doesn't really block further testing as the software I
load does not need to walk or interrogate the entries. So then I did a
"chown -h -R bob /blah/thing/stuff/version" and looked at the xattrs
of the upper - all the (metadata) files and dirs were brought up with
files having a redirect, but the dirs that should have
trusted.overlay.opaque=z did not have it at this stage. After a followup
"ls -lR /blah/thing/stuff/version" I can now see
trusted.overlay.opaque=z where I would expect it to be.
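
For anyone wanting to repeat the xattr check, here is a rough sketch of
one way to list the opaque state of every directory in the upper layer.
The function names and the example path are hypothetical, "z" is only
the value seen with the WIP ovl-finalize-dir branch (it is not a
mainline value), and it assumes getfattr from the attr package and root
access for the trusted.* namespace.

```shell
#!/bin/sh
# Hypothetical helper: describe what a trusted.overlay.opaque value means.
# "y" is the mainline opaque marker; "z" is the value observed with the
# WIP ovl-finalize-dir branch in this thread (assumption, not mainline).
describe_opaque() {
    case "$1" in
        y)  echo "opaque" ;;
        z)  echo "finalized (WIP patch)" ;;
        "") echo "merged (no opaque xattr)" ;;
        *)  echo "unknown ($1)" ;;
    esac
}

# Walk every directory below an upper dir and print "<dir><TAB><state>".
# getfattr fails (and prints nothing to stdout) when the xattr is absent,
# so an empty value is treated as "no opaque xattr".
report_opaque_dirs() {
    upper=$1
    find "$upper" -type d | while IFS= read -r dir; do
        value=$(getfattr --only-values --absolute-names \
                    -n trusted.overlay.opaque "$dir" 2>/dev/null)
        printf '%s\t%s\n' "$dir" "$(describe_opaque "$value")"
    done
}

# Example (hypothetical upper path, run as root):
#   report_opaque_dirs /mnt/upper/thing/stuff/version
```

Running it once after the chown and once after the "ls -lR" should show
the directories changing state between the two runs, if the behaviour
described above is reproducible.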

But now when I look up random nonexistent (ENOENT) files in those
directories, I can still see the lookup going across the network to the
lower filesystem. It looks like it's the same for positive lookups -
doing a stat against a file that I know is in a
trusted.overlay.opaque=z directory still sends the lookup over NFS
(which it does not if the directory is opaque=y).
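
One way to check whether a lookup reached the NFS server is to compare
the client's per-op LOOKUP counter before and after the stat. The sketch
below reads it from /proc/self/mountstats; the function name and the
paths in the example are hypothetical, and it assumes the lower layer's
mount point contains no spaces. An unchanged counter can also just mean
the NFS client answered from its own dentry cache, so it is only a
rough indicator.

```shell
#!/bin/sh
# Hypothetical helper: print the NFS LOOKUP op count for one mount point.
# Usage: nfs_lookup_ops <mountpoint> [statsfile]
# The stats file defaults to /proc/self/mountstats; the first number after
# "LOOKUP:" in the per-op statistics is the number of operations issued.
nfs_lookup_ops() {
    mountpoint=$1
    statsfile=${2:-/proc/self/mountstats}
    awk -v mp="$mountpoint" '
        # "device <src> mounted on <mountpoint> with fstype ..." starts
        # a new section; remember whether it is the mount we want.
        $1 == "device" { inmount = ($5 == mp) }
        inmount && $1 == "LOOKUP:" { print $2; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$statsfile"
}

# Example (hypothetical paths):
#   before=$(nfs_lookup_ops /mnt/lower)
#   stat /ovl/thing/stuff/version/does-not-exist 2>/dev/null
#   after=$(nfs_lookup_ops /mnt/lower)
#   echo "NFS LOOKUPs issued: $((after - before))"
```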

I mean, for an existing file with a metacopy redirect, I would expect a
lookup to the lower for reads, but not for metadata-only stat() calls?
Also, I would expect no lookups to the lower for negative lookups? Unless
we can't serve negative lookups from the readdir of the upper dir?

I have probably misunderstood, and wrongly assumed that the "finalized"
directories would only serve the contents of the readdir result and not
send metadata lookups to the lower level (a la dir=opaque). Or my v6.9.3
kernel has some other issue unrelated to this patch...

Daire




