Re: overlayfs: NFS lowerdir changes & opaque negative lookups

On Thu, Jul 11, 2024 at 6:59 PM Daire Byrne <daire@xxxxxxxx> wrote:
>
> Hi,
>
> Apologies for what I assume is another frequent (and long) "changes
> outside of overlayfs" query, but I *think* I have a slightly unique
> use case and so just wanted to ask some experts about the implications
> of the "undefined behaviour" that the documentation (rightly) warns
> against.
>
> Basically I have a read-only NFS filesystem with software releases
> that are versioned such that no files are ever overwritten or changed.
> New uniquely named directory trees and files are added from time to
> time and older ones are cleaned up.
>

Sounds like a common use case that many people are interested in.

> I was toying with the idea of putting a metadata only overlay on top
> of this NFS filesystem (which can change underneath but only with new
> and uniquely named directories and files), and then using a userspace
> metadata copy-up to "localise" directories such that all lookups hit
> the overlay, but file data is still served from the lower NFS server.
> The file data in the upper layer and lower layer never actually
> diverge and so the upper layer is more of a one time permanent
> (metadata) "cache" of the lower NFS layer.
>
> So something like "chown bob -R -h /blah/thing/UIIDA/versionXX/lib" to
> copy-up metadata only. No subsequent changes will ever be made to
> /blah/thing/UIIDA/versionXX/lib on the lower filesystem (other than it
> being deleted). Now, at some point, a new directory
> /blah/thing/UIIDB/versionYY/lib might appear on the lower NFS
> filesystem that has not yet got any upper directory files other than
> perhaps sharing part of the directory path - /blah/thing.
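
For anyone following along, the metadata-only copy-up described above relies on the metacopy feature. Roughly (all paths here are made up for illustration; metacopy=on requires redirect_dir to be enabled):

```shell
# Sketch with hypothetical paths: a metadata-only overlay over a
# read-only NFS lower layer. With metacopy=on, a chown/chmod copies up
# only the inode metadata; file data is still served from the lower layer.
mount -t overlay overlay \
    -o lowerdir=/nfs/releases,upperdir=/local/upper,workdir=/local/work \
    -o metacopy=on,redirect_dir=on \
    /blah

# Trigger metadata-only copy-up of a whole tree (-h: don't follow symlinks).
chown -R -h bob /blah/thing/UIIDA/versionXX/lib
```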
>
> Now this *seems* to work in very basic testing and I have also read
> the previous related discussion and patch here:
>
> https://lore.kernel.org/all/CAOQ4uxiBmFdcueorKV7zwPLCDq4DE+H8x=8H1f7+3v3zysW9qA@xxxxxxxxxxxxxx
>
> My first question is how bad can the "undefined behaviour" be in this
> kind of setup?

The behavior is "undefined" because nobody has tried to define it,
document it and test it.
I don't think it would be that "bad", but it will be unpredictable,
which is not very nice for a software product.

One of the current problems is that overlayfs uses a readdir cache.
The readdir cache is not auto-invalidated when a lower dir changes,
so whether or not new subdirs are observed in the overlay depends
on whether the merged overlay directory is kept in cache or not.
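
To illustrate (hypothetical paths; requires root), the effect in a test setup typically looks like this:

```shell
# After a new entry appears in the NFS lowerdir, a merged overlay dir
# whose readdir cache is still populated may not show it yet:
ls /mnt/overlay/blah/thing            # may still be missing UIIDB

# Evicting cached dentries/inodes forces a fresh merge on the next readdir:
echo 2 > /proc/sys/vm/drop_caches
ls /mnt/overlay/blah/thing            # new lower entries show up again
```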

> Any files that get copied up to the upper layer are
> guaranteed to never change in the lower NFS filesystem (by its
> design), but new directories and files that have not yet been copied
> up, can randomly appear over time. Deletions are not so important
> because if it has been deleted in the lower level, then the upper
> level copy failing has similar results (but we should cleanup the
> upper layer too).
>
> If it's possible to get over this first difficult hurdle, then I have
> another extra bit of complexity to throw on top - now manually make an
> entire directory tree (of metdata) that we have recursively copied up
> "opaque" in the upper layer (currently needs to be done outside of
> overlayfs). Over time or dropping of caches, I have found that this
> (seamlessly?) takes effect for new lookups.
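
For reference, that manual "opaque" step looks roughly like this (hypothetical upper path; needs root, and has to be done on the upper filesystem directly, outside the overlay mount):

```shell
# Mark a fully copied-up directory opaque so lookups in it never fall
# through to the lower NFS layer (negative lookups are answered locally).
setfattr -n trusted.overlay.opaque -v y \
    /local/upper/blah/thing/UIIDA/versionXX/lib

# Verify the xattr was set:
getfattr -n trusted.overlay.opaque /local/upper/blah/thing/UIIDA/versionXX/lib
```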
>
> I also noticed that in the current implementation, this "opaque"
> transition actually breaks access to the file because the metadata
> copy-up sets "trusted.overlay.metacopy" but does not currently add an
> explicit "trusted.overlay.redirect" to the corresponding lower layer
> file. But if it did (or we do it manually with setfattr), then it is
> possible to have an upper level directory that is opaque, contains
> file metadata only and redirects data access to the real files on the
> lower NFS filesystem.
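
The manual fix-up described above would be along these lines (hypothetical paths; a redirect value starting with "/" is interpreted as an absolute path from the root of the layers, not of the host):

```shell
# A metacopy upper inode under an opaque dir needs an explicit redirect
# so overlayfs can still locate its data in the lower NFS layer:
setfattr -n trusted.overlay.redirect \
    -v "/blah/thing/UIIDA/versionXX/lib/libfoo.so" \
    /local/upper/blah/thing/UIIDA/versionXX/lib/libfoo.so
```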
>
> Why the hell would you want to do this? Well, for the case where you
> are distributing software to many machines, having it on a shared NFS
> filesystem is convenient and reasonably atomic. But when you have
> software with many many PATHs (LD_LIBRARY, PYTHON, etc), you can create
> some pretty impressive negative lookups across all those NFS hosted
> directories that can overwhelm a single NFS storage server at scale. By
> "caching" or localising the entire PATH directory metadata locally on
> each host, we can serve those negative lookups from local opaque
> directories without traversing the network.
>
> I think this is a common enough software distribution problem in large
> systems and there are already many different solutions to work around
> it. Most involve localising the software on demand from a central
> repository.
>
> Well, I just wondered if it could ever be done using an overlay in the
> way I describe? But at the moment, it has to deal with a sporadically
> changing lower filesystem and a manually hand crafted upper
> filesystem. While I think this might all work fine if the filesystems
> can be mounted and unmounted between software runs, it would be even
> better if it could safely be done "online".

How about this for a workaround:

