Re: overlayfs: NFS lowerdir changes & opaque negative lookups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 22, 2024 at 6:28 PM Daire Byrne <daire@xxxxxxxx> wrote:
>
> On Fri, 19 Jul 2024 at 10:19, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > > > >> Basically, I need to be able to continue serving the same files and
> > > > >> paths even while the copy-up metadata process for any part of the tree
> > > > >> is in progress. And it sounds like your idea of considering a copy-up
> > > > >> of a merged dir as "complete" (and essentially opaque) would be the
> > > > >> way to do that without files or dirs ever moving or losing access even
> > > > >> momentarily.
> > > > >
> > > > >
> > > > > Yes, that's the idea.
> > > > >
> > > > > I'll see when I get around to that demo.
> > > >
> > > > I found some time to write the POC patch, but not enough time
> > > > to make it work :) - it is failing some fstests.
> > > >
> > > > Since I don't know when I will have time to debug the issues,
> > > > here is the WIP if you want to debug it and point out the bugs:
> > > >
> > > > https://github.com/amir73il/linux/commits/ovl-finalize-dir/
> > >
> > > This is very cool - many thanks!
> > >
> >
> > Pushed two fixes to this branch - now it passes fstest - please re-test.
>
> I can confirm the root dentry issue is fixed and I now see a lot less
> ENOENTs hitting the lower (NFS) directory. For my test case, we go
> from ~80,000 -> 27,000 ENOENTs, a great improvement.
>
> However I was still seeing some lookups hitting the lower layer in
> directories that had opaque=z. After a bit of testing and much head
> scratching, it seems like you need to do something like "ls
> /olv/blah/thing/version/lib" to force a readdir of the lower NFS
> filesystem (after remount) to reliably stop the lookups hitting the
> lower filesystem.
>
> For example, after doing the steps to copy up metadata and set
> opaque=z, we then remount and test the effect:
>
> chown -R -h bob /olv/blah/thing/version/lib
> ls -lR  /ovl/blah/thing/version/lib
> umount /ovl
> mount /ovl
> for x in {1..10}; do stat /olv/blah/thing/version/lib/${RANDOM};done
>
> This will always send the lookup to the lower level. But if we do:
>
> ls -l /olv/blah/thing/version/lib
> for x in {1..10}; do stat /olv/blah/thing/version/lib/${RANDOM}.so;done
>
> this triggers a readdir of the lower NFS dir and then the random file
> lookups only hit the upper layer as expected.
>
> So then I'm not entirely clear why in my real world test *most* of the
> negative lookups are served by the upper level but some are not. I can
> only think that loading software *mostly* triggers READDIRs on things
> like lib dirs, but not always?
>
> It may also be complicated by the fact that we often have symlinks at
> various points in the path too (both dir & file).
>

No again, it's just a bug.
My fix to initialize __lastmerged for root dentry broke initialization
of __lastmerged
for lookup of non-root dentry and readdir would fixup  __lastmerged later.

Pushed a fix. Again, only sanity tested, so the finalize feature may
still have bugs.

> > ovl will lookup the lower data of a metacopy file at lookup time.
> > Perhaps we could do lazy lookup of data files, like we do with
> > data-only layers, but this would be more complicated and in any
> > case, stat(2) needs the lower data file to present the correct
> > value of st_blocks.
>
> Right... for some reason I thought it only needed st_size which it
> could get from the sparse file.
>
> > IOW, this patch mainly helps to prevent *negative* lookups
> > in the lower layers.
> >
> > > expect no lookups to the lower for negative lookups? Unless we can't
> > > serve negative lookups from the readdir of the upper dir?
> >
> > Correct expectation.
> > The bug was fixed. It should hopefully work now,
> > although I did not test.
> >
> > >
> > > I have probably misunderstood that the "finalized" directories will
> > > only serve the contents of the readdir result and not send metadata
> > > lookups to the lower level (ala dir=opaque). Or my v6.9.3 kernel has
> > > some other issue unrelated to this patch....
> >
> > You understood almost correctly, but you need to understand that
> > the finalized directory is not completely opaque, it is partly opaque.
> > You can visualize this as painting the space "between entries" opaque,
> > so that negative lookup results will stop as well as readdir, but ovl
> > will still look *underneath* entries.
> >
> > BTW, ovl will also lookup in lower dirs if entries inside an upper
> > opaque dir have an absolute path redirect xattr.
> > This type of upper opaque dir is also called "impure" and has an
> > "impure" xattr.
> >
> > So you may say that the difference between an opaque=y
> > dir and opaque=z dir is that in the latter, all entries are treated
> > as if they have an absolute path redirect (to their own path),
> > but that is a very hand wavy way of putting it.
>
> Yea, that makes sense to me.
>
> > Anyway, if you are happy with the patch and want to see it upstreamed,
> > I have few requests:
> >
> > 1. Please provide your Tested-by.
> > 2. Please provide a detailed cover letter that explains the problem (*)
> >     and how the patch fixes it, which includes performance gain numbers.
> >     You may post the cover letter with my patch or ask me to do it
>
> I'll draft up something for review and I'll try to keep it concise.
>
> > 3. Please write an fstest to test the finalized dir feature (**)
> >     You may want to look at test overlay/031 for inspiration
> >     if also makes changes to layers offline and examines the outcome
>
> Bear with me... I'll have a look and see if I can figure it out.
>

I am not in a hurry. quite the contrary.
I have just a bit of time to help :)

> > (*) Please do NOT include the story of changing lower entries
> >     under a mounted overlayfs - they are not relevant to this patch
> >     and they are very controversial.
>
> Understood. However, I do suspect that some are using this "undefined
> behaviour" because it worked for their use case. This patch might ruin
> their day?
>

That's a good question - I don't know.
We can either try to see if someone shouts or add it as an opt-in feature.
I kind of dislike the idea of adding an opt-in option for this behavior,
which is explicitly documented as undefined and could lead to several
other weird issues, so I hope we can get away without opt-in.

Even if we agreed to extend the defined behavior to changes to lower
layers to objects which overlayfs did not yet observe, this definition will
not have included the case where the merged directory was already iterated.

Thanks,
Amir.

> > (**) Basically do chown -R and ls, unmount overlay, add lower entry
> >     re-mount overlay and verify that it is not observed.
> >     Can also sanity check that opaque=z was set.
> >     Note that adding lower entries offline is a behavior that was once allowed
> >     and some users actually expect it to work, so I made the feature
> >     depend on !ovl_allow_offline_changes(ofs).
> >     Therefore, the test should explicitly require and enable metacopy for
> >     overlayfs mount to enable the finalized dir feature.
>
> Many thanks,
>
> Daire





[Index of Archives]     [Linux Filesystems Devel]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux