overlayfs: NFS lowerdir changes & opaque negative lookups

Hi,

Apologies for what I assume is another frequent (and long) "changes
outside of overlayfs" query, but I *think* I have a slightly unique
use case and so just wanted to ask some experts about the implications
of the "undefined behaviour" that the documentation (rightly) warns
against.

Basically I have a read-only NFS filesystem with software releases
that are versioned such that no files are ever overwritten or changed.
New uniquely named directory trees and files are added from time to
time and older ones are cleaned up.

I was toying with the idea of putting a metadata only overlay on top
of this NFS filesystem (which can change underneath but only with new
and uniquely named directories and files), and then using a userspace
metadata copy-up to "localise" directories such that all lookups hit
the overlay, but file data is still served from the lower NFS server.
The file data in the upper layer and lower layer never actually
diverge and so the upper layer is more of a one time permanent
(metadata) "cache" of the lower NFS layer.
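For concreteness, the kind of mount I am experimenting with looks
roughly like this (the paths are made up for illustration; metacopy=on
is the stock overlayfs option that makes chown/chmod copy up metadata
only, not file data):

```shell
# Hypothetical layout: read-only NFS lower layer, local upper/work dirs.
mkdir -p /local/overlay/upper /local/overlay/work /blah

# metacopy=on: a chown/chmod on a lower file copies up the inode
# metadata into the upper layer but leaves the data on the NFS server.
mount -t overlay overlay \
    -o lowerdir=/nfs/releases,upperdir=/local/overlay/upper,workdir=/local/overlay/work,metacopy=on \
    /blah
```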

So something like "chown bob -R -h /blah/thing/UIIDA/versionXX/lib" to
copy-up metadata only. No subsequent changes will ever be made to
/blah/thing/UIIDA/versionXX/lib on the lower filesystem (other than it
being deleted). Now, at some point, a new directory
/blah/thing/UIIDB/versionYY/lib might appear on the lower NFS
filesystem that has not yet got any upper directory files other than
perhaps sharing part of the directory path - /blah/thing.

Now this *seems* to work in very basic testing and I have also read
the previous related discussion and patch here:

https://lore.kernel.org/all/CAOQ4uxiBmFdcueorKV7zwPLCDq4DE+H8x=8H1f7+3v3zysW9qA@xxxxxxxxxxxxxx

My first question is how bad can the "undefined behaviour" be in this
kind of setup? Any files that get copied up to the upper layer are
guaranteed never to change in the lower NFS filesystem (by its
design), but new directories and files that have not yet been copied
up, can randomly appear over time. Deletions are less of a concern:
if something has been deleted in the lower layer, then the upper
layer copy failing to resolve has a similar end result (though we
should clean up the upper layer too).

If it's possible to get over this first difficult hurdle, then I have
one extra bit of complexity to throw on top: manually make an
entire directory tree (of metadata) that we have recursively copied up
"opaque" in the upper layer (currently this needs to be done outside
of overlayfs). Over time, or after dropping caches, I have found that
this (seamlessly?) takes effect for new lookups.

I also noticed that in the current implementation, this "opaque"
transition actually breaks access to the files, because the metadata
copy-up sets "trusted.overlay.metacopy" but does not currently add an
explicit "trusted.overlay.redirect" to the corresponding lower layer
file. But if it did (or we do it manually with setfattr), then it is
possible to have an upper level directory that is opaque, contains
file metadata only, and redirects data access to the real files on the
lower NFS filesystem.
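Manually, what I am doing is roughly the following (the directory and
file names are examples; these are the real overlayfs xattr names, but
setting them by hand on the upper layer is exactly the kind of
"outside of overlayfs" change the documentation warns about, and
trusted.* xattrs need CAP_SYS_ADMIN):

```shell
# On the *upper* layer, after a recursive metadata-only copy-up:
UPPER=/local/overlay/upper/thing/UIIDA/versionXX/lib

# Mark the copied-up directory opaque so lookups (including negative
# lookups) are answered entirely from the local upper layer.
setfattr -n trusted.overlay.opaque -v y "$UPPER"

# For each metacopy file inside, add an explicit absolute redirect
# (path relative to the overlay root, starting with '/') back to the
# lower file so data access still works once the directory is opaque.
setfattr -n trusted.overlay.redirect \
    -v "/thing/UIIDA/versionXX/lib/libfoo.so" \
    "$UPPER/libfoo.so"
```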

Why the hell would you want to do this? Well, for the case where you
are distributing software to many machines, having it on a shared NFS
filesystem is convenient and reasonably atomic. But when you have
software with many, many search PATHs (LD_LIBRARY_PATH, PYTHONPATH,
etc.), you can create some pretty impressive negative lookup storms
across all those NFS hosted directories that can overwhelm a single
NFS storage server at scale. By
"caching" or localising the entire PATH directory metadata locally on
each host, we can serve those negative lookups from local opaque
directories without traversing the network.

I think this is a common enough software distribution problem in large
systems and there are already many different solutions to work around
it. Most involve localising the software on demand from a central
repository.

Well, I just wondered if it could ever be done using an overlay in the
way I describe? But at the moment, it has to deal with a sporadically
changing lower filesystem and a manually hand crafted upper
filesystem. While I think this might all work fine if the filesystems
can be mounted and unmounted between software runs, it would be even
better if it could safely be done "online".

Things like fscache can also add NFS file content caching on top, but
it does not help with the metadata PATH walking problem endemic in
large clusters with software distributed on shared filesystems. I'm
suggesting a local metadata cache on top for read-only (but updated)
NFS software volumes.

Anyway, that's my silly idea for "lookup caching" (or acceleration) -
too crazy right? ;)

Daire



