Hi Linus, Trond/Anna, Steve, Eric,

I have an fscache rewrite that I'm tempted to put in for the next merge
window:

	https://lore.kernel.org/linux-fsdevel/159465784033.1376674.18106463693989811037.stgit@xxxxxxxxxxxxxxxxxxxxxx/

It improves the code by:

 (*) Ripping out the stuff that uses page cache snooping and kernel_write()
     and using kiocb instead.  This gives multiple wins: it uses async DIO
     rather than snooping for updated pages and then copying them, and it
     has less VM overhead.

 (*) Simplifying object management: the state machine that was managing
     things is ripped out and replaced with a much simpler thread pool.

 (*) Object invalidation creates a tmpfile and diverts new activity to
     that, so that it doesn't have to synchronise with in-flight async DIO.

 (*) Using a bitmap stored in an xattr, rather than bmap, to find out
     whether a block is present in the cache.  Probing the backing
     filesystem's metadata to find out is not reliable in modern
     extent-based filesystems, as they may insert or remove blocks of
     zeros.  Even SEEK_HOLE/SEEK_DATA are problematic, since they don't
     distinguish transparently inserted bridging.

I've provided a read helper that handles ->readpage, ->readpages, and
preparatory writes in ->write_begin.  Willy is looking at using this as a
way to roll his new ->readahead op out into filesystems.  A good chunk of
this will move into MM code.

The code is simpler, and this is nice too:

	67 files changed, 5947 insertions(+), 8294 deletions(-)

not including documentation changes, which I still need to convert to rst
format; that will remove a whole bunch more lines.

But there are reasons you might not want to take it yet:

 (1) It starts off by disabling fscache support in all the filesystems that
     use it: afs, nfs, cifs, ceph and 9p.
     I've taken care of afs; Dave Wysochanski has patches for nfs:

	https://lore.kernel.org/linux-nfs/1596031949-26793-1-git-send-email-dwysocha@xxxxxxxxxx/

     but they haven't been reviewed by Trond or Anna yet; Jeff Layton has
     patches for ceph:

	https://marc.info/?l=ceph-devel&m=159541538914631&w=2

     and I've briefly discussed cifs with Steve, but nothing has started
     there yet.  9p I've not looked at yet.

     Now, if we're okay with going a kernel release with 4/5 filesystems
     having caching disabled and then pushing the changes for individual
     filesystems through their respective trees, it might be easier.
     Unfortunately, I wasn't able to get together with Trond and Anna at
     LSF to discuss this.

 (2) The patched afs fs passed xfstests -g quick (unlike the upstream code,
     which oopses pretty quickly with caching enabled).  Dave and Jeff's
     nfs and ceph code is getting close, but not quite there yet.

 (3) Al has objections to the ITER_MAPPING iov_iter type that I added:

	https://lore.kernel.org/linux-fsdevel/20200719014436.GG2786714@xxxxxxxxxxxxxxxxxx/

     but note that iov_iter_for_each_range() is not actually used by
     anything.  However, Willy likes it and would prefer to make it
     ITER_XARRAY instead, as he might be able to use it in other places,
     though there's an issue where I'm calling find_get_pages_contig(),
     which takes a mapping (though all it then does is get the xarray out
     of it).

     The alternative would be ITER_BVEC, which has quite a high overhead,
     though it would mean that the RCU read lock wouldn't be necessary.
     That would require 1K of memory for every 256K block the cache wants
     to read (with 4K pages, that's 64 bio_vec entries of 16 bytes each);
     for any read over 1M, I'd have to use vmalloc() instead.  I'd also
     prefer not to use ITER_BVEC because the offset and length in each
     element are superfluous here.

     If ITER_MAPPING is not good, would it be possible to have an
     ITER_PAGEARRAY that just takes a page array instead?  Or, even, to
     create a transient xarray?

 (4) The way object culling is managed needs overhauling too, but that's a
     whole 'nother patchset.
     We could wait till that's done too, but its absence doesn't prevent
     what we have now from being used.

Thoughts?

David