Hi David,

On 2024/1/25 22:02, David Howells wrote:
> Here's a roadmap for the future development of netfslib and local caching
> (e.g. cachefiles).
Thanks for writing this detailed email, and congrats on your work. I'll
only comment on the parts directly related to me.
...
> Local Caching
> =============
>
> There are a number of things I want to look at with local caching:
>
>  (*) Although cachefiles has switched from using bmap to using SEEK_HOLE
>      and SEEK_DATA, this isn't sufficient as we cannot rely on the backing
>      filesystem not optimising things and introducing both false positives
>      and false negatives.  Cachefiles needs to track the presence/absence
>      of data for itself.
Yes, that is indeed an issue that needs to be resolved, and it has already
been discussed before.
> I had a partially-implemented solution that stores a block bitmap in an
> xattr, but that only worked up to files of 1G in size (with bits
> representing 256K blocks in a 512-byte bitmap).
Jingbo once had an approach that uses external bitmap files together with
extended-attribute pointers inside the cache files:

https://listman.redhat.com/archives/linux-cachefs/2022-August/007050.html

I'm not quite sure how the performance was, but if it's worth trying or
comparing, that might be useful.
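For illustration, here is a rough userspace sketch of the offset-to-bit
arithmetic behind such a presence bitmap (not actual cachefiles code; all
names here are made up).  256K granules tracked by a 512-byte bitmap give
4096 bits, hence the 1G ceiling mentioned above:

/* Illustrative only: presence bitmap with 256K granules. */
#include <stdbool.h>
#include <stdint.h>

#define GRANULE_SHIFT	18			/* 256K granules */
#define BITMAP_BYTES	512			/* as stored in the xattr */
#define MAX_COVERED	(((uint64_t)BITMAP_BYTES * 8) << GRANULE_SHIFT) /* 1G */

static bool granule_present(const uint8_t *bitmap, uint64_t pos)
{
	uint64_t bit = pos >> GRANULE_SHIFT;

	if (pos >= MAX_COVERED)
		return false;	/* beyond what 512 bytes can describe */
	return bitmap[bit / 8] & (1 << (bit % 8));
}

static void granule_mark_present(uint8_t *bitmap, uint64_t pos)
{
	uint64_t bit = pos >> GRANULE_SHIFT;

	if (pos < MAX_COVERED)
		bitmap[bit / 8] |= 1 << (bit % 8);
}

An external bitmap file, as in Jingbo's series, would presumably lift the
1G ceiling by letting the bitmap grow with the file instead of being
capped at the xattr size limit.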
>  (*) An alternative cache format might prove more fruitful.  Various AFS
>      implementations use a 'tagged cache' format with an index file and a
>      bunch of small files each of which contains a single block (typically
>      256K in OpenAFS).  This would offer some advantages over the current
>      approach:
>
>       - it can handle entry reuse within the index
>       - doesn't require an external culling process
>       - doesn't need to truncate/reallocate when invalidating
>
>      There are some downsides, including:
>
>       - each block is in a separate file
Not quite sure; accessing too many small files might be another issue, one
that is currently hitting AI training workloads... but as you said, it's
worth trying.
>       - metadata coherency is more tricky
>       - a powercut may require a cache wipe
>       - the index key is highly variable in size if used for multiple
>         filesystems
>
>      But OpenAFS has been using this for something like 30 years, so it's
>      probably worth a try.
Yes, and configurable chunk sizes per blob would also be very helpful.
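To make that concrete, a hypothetical sketch of what a tagged-cache index
entry could look like with a per-object chunk size (none of these names
come from OpenAFS or cachefiles; this is just one possible shape):

#include <stdint.h>
#include <stdio.h>

struct tc_index_entry {
	uint64_t object_id;	/* which cached network object */
	uint32_t block;		/* block number within that object */
	uint32_t chunk_shift;	/* log2 of chunk size, e.g. 18 for 256K */
	uint32_t file_no;	/* which small data file holds the block */
	uint32_t flags;		/* valid/dirty/pinned, etc. */
};

/*
 * Derive the backing file name for an entry.  Reusing an entry or
 * invalidating a block is then just an index update; the small file is
 * repointed or marked free, with no truncate/reallocate needed.
 */
static void tc_block_path(const struct tc_index_entry *e,
			  char *buf, size_t len)
{
	snprintf(buf, len, "cache/V%u", e->file_no);
}

Thanks,
Gao Xiang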
>  (*) Need to work out some way to store xattrs, directory entries and
>      inode metadata efficiently.
>
>  (*) Using NVRAM as the cache rather than spinning rust.
>
>  (*) Support for disconnected operation to pin desirable data and keep
>      track of changes.
>
>  (*) A user API by which the cache for specific files or volumes can be
>      flushed.
>
> Disconnected Operation
> ======================
>
> I'm working towards providing support for disconnected operation, so
> that, provided you've got your working set pinned in the cache, you can
> continue to work on your network-provided files when the network goes
> away and resync the changes later.  This is going to require a number of
> things:
>
>  (1) A user API by which files can be preloaded into the cache and pinned.
>
>  (2) The ability to track changes in the cache.
>
>  (3) A way to synchronise changes on reconnection.
>
>  (4) A way to communicate to the user when there's a conflict with a
>      third party change on reconnect.  This might involve communicating
>      via systemd to the desktop environment to ask the user to indicate
>      how they'd like conflicts resolved.
>
>  (5) A way to prompt the user to re-enter their authentication/crypto
>      keys.
>
>  (6) A way to ask the user how to handle a process that wants to access
>      data we don't have (error/wait) - and how to handle the DE getting
>      stuck in this fashion.
>
> David
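As an illustrative aside on item (2) above, a minimal userspace sketch of
how changed ranges might be logged while disconnected and replayed on
reconnection (all names are hypothetical; nothing like this exists in
netfslib yet):

#include <stdint.h>
#include <stdlib.h>

struct dirty_range {
	uint64_t start, end;		/* [start, end): locally modified */
	struct dirty_range *next;
};

/*
 * Record a local modification made while disconnected.  A real
 * implementation would merge overlapping ranges and persist the log so
 * that a reboot doesn't lose track of unsynced changes.
 */
static int log_dirty(struct dirty_range **head, uint64_t start, uint64_t end)
{
	struct dirty_range *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->start = start;
	r->end = end;
	r->next = *head;
	*head = r;
	return 0;
}

On reconnection, each logged range would be read back from the cache and
written to the server, with third-party changes detected by, e.g.,
comparing a version/change attribute against the value recorded at
disconnect time - that comparison is where the conflict handling of item
(4) would kick in.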