On Wed, 2022-08-24 at 15:12 +0100, David Howells wrote:
> Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> > As long as it is an opt-in feature, I'm OK. I don't want to have
> > to compile it in by default. A cachefs should never become a
> > mandatory feature of networked filesystems.
> 
> netfslib is intended to be used even if fscache is not enabled. It
> is intended to make the underlying network filesystem maintainer's
> life easier by:
> 
>  - Moving the implementation of all the VM ops from the network
>    filesystems as much as possible into one place. The network
>    filesystem then just has to provide a read op and a write op.
> 
>  - Making it such that the filesystem doesn't have to deal with the
>    difference between DIO and buffered I/O.
> 
>  - Handling VM features on behalf of all filesystems. This gives
>    the VM folk one place to change instead of 5+. mpage and iomap
>    are similar things, but for blockdev filesystems.
> 
>  - Providing features to those filesystems that can support them,
>    e.g.:
> 
>    - fscrypt
>    - compression
>    - bounce buffering
>    - local caching
>    - disconnected operation
> 
> Currently nfs interacts with fscache on a page-by-page basis, but
> this needs to change:
> 
>  (1) Multipage folios are now a thing. You need to roll folios out
>      into nfs if you're going to take advantage of this. Also, you
>      may have noticed that all the VM interfaces are being recast
>      in terms of folios.

Right now, I see limited value in adding multipage folios to NFS.

While basic NFSv4 does allow you to pretend there is a fundamental
underlying block size, pNFS has changed all that: we have had to
engineer support for determining the I/O block size on the fly and
building the RPC requests accordingly. Client-side mirroring just
adds to the fun.

As I see it, the only value that multipage folios might bring to NFS
would be lower page cache management overhead when dealing with
large files.

>  (2) I need to fix the cache so that it no longer uses the backing
>      filesystem's metadata to track content. To make this use less
>      disk space, I want to increase the cache block size to, say,
>      256K or 2M.
> 
>      This means that the cache needs to have a say in how big a
>      read the network filesystem does - and also that a single
>      cache request might need to be stitched together from multiple
>      read ops.
> 
>  (3) More pagecache changes are lurking in the future, possibly
>      including getting rid of the concept of pages entirely from
>      the pagecache.
> 
> There are users of nfs + fscache and we'd like to continue to
> support them as best as possible, but the current code noticeably
> degrades performance here.
> 
> Unfortunately, I'm also going to need to drop the fallback
> interface which nfs currently uses in the next couple of versions;
> we have to at least get the fscache-enabled conversion done.
> 
> I've been dealing with the VM, 9p, ceph and cifs people over the
> direction that netfslib might need to go in, but for nfs, it's
> typically been a flat "no". I would like to work out how to make
> netfslib work for nfs also, if you're willing to discuss it.
> 
> I would be open to having a look at importing nfs page handling
> into netfslib and working from that - but it still needs to deal
> with (1) and (2) above, and I would like to make it pass iterators
> down to the lower layers as buffer descriptions. It's also very
> complicated stuff.
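For concreteness, if I understand the model you're describing, the
filesystem ends up supplying little more than a read method behind
something like the sketch below. I'm going by the structure and
helper names in the current netfslib code (netfs_request_ops,
netfs_readahead, netfs_subreq_terminated); myfs_do_read() is a
made-up stand-in for the actual RPC machinery, so read this as a
sketch of the shape, not the finished API:

#include <linux/netfs.h>

/* Stand-in for the filesystem's transport: read @len bytes at @pos
 * into @iter, returning bytes transferred or a negative errno. */
static ssize_t myfs_do_read(struct inode *inode, struct iov_iter *iter,
			    loff_t pos, size_t len);

/* Issue the read for one netfslib subrequest. netfslib decides how
 * large each subrequest is, so several of these may be stitched
 * together to fill a single (e.g. 256K) cache block. */
static void myfs_issue_read(struct netfs_io_subrequest *subreq)
{
	struct netfs_io_request *rreq = subreq->rreq;
	struct iov_iter iter;
	ssize_t ret;

	/* Describe the slice of pagecache this subrequest covers as
	 * an iterator rather than as individual pages. */
	iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages,
			subreq->start, subreq->len);

	/* Hypothetical transport call: move subreq->len bytes at
	 * subreq->start into the iterator. */
	ret = myfs_do_read(rreq->inode, &iter, subreq->start,
			   subreq->len);

	/* Tell netfslib how much arrived, or hand back an error. */
	netfs_subreq_terminated(subreq, ret, false);
}

static const struct netfs_request_ops myfs_req_ops = {
	.issue_read	= myfs_issue_read,
};

/* The VM-facing ops then mostly point at netfslib itself. */
static const struct address_space_operations myfs_aops = {
	.read_folio	= netfs_read_folio,
	.readahead	= netfs_readahead,
	/* the write side is still per-filesystem at this stage */
};

That shape would let the cache's preferred block size drive
subreq->len without the filesystem caring, but for NFS it would
still have to coexist with the pNFS layout-driven I/O sizing I
described above.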
> Also:
> 
>  - I've noted the nfs_page structs that nfs uses and I'm looking at
>    a way of having something similar, but held separately so that
>    one struct can span and store information about multiple folios.
> 
>  - I'm looking at punting write-to-the-cache to writepages() or
>    something like that so that the VM folks can reclaim the
>    PG_private_2 flag bit, so that won't be available to nfs either
>    in the future.
> 
>  - aops->write_begin() and ->write_end() are going to go away. In
>    netfslib, what I'm trying to do is make a "netfs_perform_write"
>    as a parallel to generic_perform_write().

What problems would any of this solve for NFS?

I'm worried about the cost of all this proposed code churn as well.
As you said, 'it is complicated stuff', mainly for the good reason
that we've been optimising a lot of code over the last 25-30 years.

However, let's start with the "why?" question first. Why do I need
an extra layer of abstraction between NFS and the VM, when one of my
primary concerns right now is that the stack depth keeps growing?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-cachefs