Re: overlayfs: NFS lowerdir changes & opaque negative lookups

Mike Baynton <mike@xxxxxxxxxxxx> · Sat, 13 Jul 2024 23:12:52 -0500

On 7/12/24 07:04, Daire Byrne wrote:
> Yea, so I have also toyed with the "composefs" idea

Yeah, I'm doing what they're doing but making the EROFS in-house and
hoping the kinda-writable NFS twist isn't an issue. I only need to
satisfy dependencies for a container's worth of software at a time and I
can determine all the dependencies I need by virtue of tooling in the
software ecosystems I need to support.

> I guess the difference is that I'm not trying to replicate the
> entirety of the metadata, I just want to tweak bits of it and still
> avail of the overlay merged directories to fall through to the
> directory tree and data underneath for everything else.

Yeah, I understand your objective now. I'm mildly curious why NFS +
fscache doesn't solve the negative-lookup case for you, given that you
want a dynamically generated local cache. Is fscache just unable to
cache negative lookups, and you want them to persist for weeks?

Also (only semi-related): since you have a large NFS deployment similar
to the one I'm putting together, in that it is read-only to normal
clients and most files/paths are immutable after they first appear, I'd
be interested in any experiences you've had in practice with the
performance of fscache and of NFS mount options that relax its cache
coherence / atomicity semantics. I've found it impossible to avoid
roundtrips to the server on each fopen for locally cached files (unless
using NFSv4 delegations, which are overkill and not available in my
environment). These RPC roundtrips provide no real benefit to our use
case but can add seconds of delay to initializing a process if it
accesses thousands of little interpreted-language files.
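
To put a rough number on that per-open cost, something like the
following sketch could be pointed at the same tree on a local filesystem
and on the NFS mount. It is only an illustration: the helper names
collect_paths and time_opens and the ".py" suffix are made up for the
example and are not taken from either deployment discussed here.

```python
import os
import time


def collect_paths(root, suffix=".py"):
    """Return every file path under root whose name ends with suffix."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                paths.append(os.path.join(dirpath, name))
    return paths


def time_opens(paths):
    """Open and immediately close each path; return elapsed seconds.

    Only the open()/close() pairs are timed, not the directory walk,
    so the result approximates the per-file open overhead.
    """
    start = time.perf_counter()
    for path in paths:
        fd = os.open(path, os.O_RDONLY)
        os.close(fd)
    return time.perf_counter() - start
```

Dividing the elapsed time by len(paths) gives an approximate per-open
latency. Note that the walk in collect_paths may itself warm client-side
caches, so the figures are best treated as a comparison between mounts
rather than as absolute measurements.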

Not an overlayfs concern in any way, though, so perhaps no need to
pollute the mailing list further; if you are interested in responding to
me on these things, continuing off-list would be fine with me too.

> 
>>> how can we document it to make the behavior "defined"?
>>>
>>> My thinking is:
>>>
>>> "Changes to the underlying filesystems while part of a mounted overlay
>>> filesystem are not allowed.  If the underlying filesystem is changed,
>>> the behavior of the overlay is undefined, though it will not result in
>>> a crash or deadlock.
>>>
>>> One exception to this rule is changes to underlying filesystem objects
>>> that were not accessed by an overlayfs prior to the change.
>>> In other words, once accessed from a mounted overlay filesystem,
>>> changes to the underlying filesystem objects are not allowed."
>>>
>>> But this claim needs to be proved and tested (write tests),
>>> before the documentation defines this behavior.
>>> I am not even sure if the claim is correct.
>>
>> I've been blissfully and naively assuming that it is based on intuition
>> :).
>>
>> I think Daire and I are basically only adding new files to the NFS
>> filesystem, and both the all-opaque approach and the data-only approach
>> could prevent accidental access to things on the NFS filesystem through
>> the overlayfs (or at least portion of it meant for end-user consumption)
>> while they are still being birthed and might be experiencing changes.
>> At some point in the NFS tree, directories must be modified, but since
>> both approaches have overlayfs sourcing all directory entries from local
>> metadata-only layers, it seems plausible that the directories that
>> change aren't really "accessed by an overlayfs prior to the change."
> 
> Yes, I think your case has a good chance of being safe and becoming
> well defined behaviour.
> 
> But my idea was still very much relying on using the majority of the
> lower layer as is. And for all the reasons given, I suspect my use
> case is still a no-no.

I dunno, your thing might end up working out fine, based on your latest
testing of when clients see changes and Amir's observation that all fds
need to be closed, but after that a readdir through an overlayfs will
observe changes. Seems "unlikely" that clients would hold open fds to
the first few levels of directories at all, never mind for long enough
for someone to call you and ask where the new version is :)

Mike

> 
>> How much proving/testing would you want to see before documenting this
>> and supporting someone in future who finds a way to prove the claim
>> wrong?
>>
>>>
>>> One more thing that could help said service is if overlayfs
>>> supported a hybrid mode of redirect_dir=follow,metacopy=on,
>>> where redirect is enabled for regular files for metacopy, but NOT
>>> enabled for directories (which was redirect_dir's original use case).
>>>
>>> This way, the service could run the command line:
>>> $ mv /ovl/blah/thing /ovl/local
>>> then "mv" will get EXDEV for moving directories and will create
>>> opaque directories in their place and it will recursively move all
>>> the files to the opaque directories.
>>
>> Clever.
>>
>> Thanks,
>> Mike
> 
> Thanks for the support! Certainly creating metadata only layers with
> data layers is something I have considered. But for many of the same
> reasons that we cannot compress all our PATHs to a single directory
> full of symlinks, I'm not sure I will be able to construct a concise
> metadata only layer without a much deeper understanding of how our
> devs are building and deploying software. And I'm not sure I have the
> mental fortitude for that journey :)
> 
> Daire