On Wed, 2022-08-24 at 17:53 +0100, Matthew Wilcox wrote:
> On Wed, Aug 24, 2022 at 04:27:04PM +0000, Trond Myklebust wrote:
> > Right now, I see limited value in adding multipage folios to NFS.
> >
> > While basic NFSv4 does allow you to pretend there is a fundamental
> > underlying block size, pNFS has changed all that, and we have had to
> > engineer support for determining the I/O block size on the fly, and
> > building the RPC requests accordingly. Client side mirroring just
> > adds to the fun.
> >
> > As I see it, the only value that multipage folios might bring to NFS
> > would be smaller page cache management overhead when dealing with
> > large files.
>
> Yes, but that's a Really Big Deal. Machines with a lot of memory end
> up with very long LRU lists. We can't afford the overhead of managing
> memory in 4kB chunks any more. (I don't want to dwell on this point
> too much; I've run the numbers before and can do so again if you want
> me to go into more details).
>
> Beyond that, filesystems have a lot of interactions with the page
> cache today. When I started looking at this, I thought filesystem
> people all had a deep understanding of how the page cache worked. Now
> I realise everyone's as clueless as I am. The real benefit I see to
> projects like iomap/netfs is that they insulate filesystems from
> having to deal with the page cache. All the interactions are in two
> or three places and we can refactor without having to talk to the
> owners of 50+ filesystems.
>
> It also gives us a chance to re-examine some of the assumptions that
> we have made over the years about how filesystems and page cache
> should be interacting. We've fixed a fair few bugs in recent years
> that came about because filesystem people don't tend to have deep
> knowledge of mm internals (and they shouldn't need to!)
>
> I don't know that netfs has the perfect interface to be used for nfs.
> But that too can be changed to make it work better for your needs.

If the VM folks need it, then adding support for multi-page folios is a
much smaller scope than what David was describing. It can be done
without too much surgery to the existing NFS I/O stack. We already have
code to support I/O block sizes that are much less than the page size,
so converting that to act on larger folios is not a huge deal.

What would be useful there is something like a range tree to allow us
to move beyond the PG_uptodate bit, and help make the
is_partially_uptodate() address_space_operation a bit more useful.
Otherwise, we end up having to read in the entire folio, which is what
we do today for pages, but could get onerous with large folios when
doing file random access.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-cachefs
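
To illustrate the range-tracking idea in the reply above, here is a minimal userspace sketch in C. It is not kernel code and not the NFS client's implementation. A small sorted array stands in for the range tree, and every name in it (struct uptodate_map, uptodate_map_mark(), uptodate_map_covers(), uptodate_map_full(), MAX_RANGES) is hypothetical. The point is only to show how per-folio byte ranges could answer an is_partially_uptodate()-style question without reading in the whole folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_RANGES 16	/* hypothetical cap; a real range tree would grow */

/* Half-open byte range [start, end) within one folio. */
struct byte_range {
	size_t start;
	size_t end;
};

/* Sorted, non-overlapping, non-adjacent set of valid ranges. */
struct uptodate_map {
	size_t folio_size;
	size_t nr;
	struct byte_range r[MAX_RANGES];
};

static void uptodate_map_init(struct uptodate_map *m, size_t folio_size)
{
	m->folio_size = folio_size;
	m->nr = 0;
}

/*
 * Record [off, off + len) as valid, merging with any overlapping or
 * adjacent ranges. Returns false for a bad range or a full map.
 */
static bool uptodate_map_mark(struct uptodate_map *m, size_t off, size_t len)
{
	size_t start = off, end = off + len;
	size_t i = 0, j;

	if (len == 0 || end < off || end > m->folio_size)
		return false;

	/* Skip ranges that end strictly before the new one starts. */
	while (i < m->nr && m->r[i].end < start)
		i++;

	/* Absorb every range that overlaps or touches the new one. */
	for (j = i; j < m->nr && m->r[j].start <= end; j++) {
		if (m->r[j].start < start)
			start = m->r[j].start;
		if (m->r[j].end > end)
			end = m->r[j].end;
	}

	if (j == i) {
		/* Nothing absorbed: open a slot for a new entry. */
		if (m->nr == MAX_RANGES)
			return false;
		memmove(&m->r[i + 1], &m->r[i], (m->nr - i) * sizeof(m->r[0]));
		m->nr++;
	} else {
		/* Collapse entries i..j-1 into the single entry i. */
		memmove(&m->r[i + 1], &m->r[j], (m->nr - j) * sizeof(m->r[0]));
		m->nr -= j - i - 1;
	}
	assert(m->nr <= MAX_RANGES);
	m->r[i].start = start;
	m->r[i].end = end;
	return true;
}

/* True if every byte of [off, off + len) has been marked valid. */
static bool uptodate_map_covers(const struct uptodate_map *m,
				size_t off, size_t len)
{
	size_t end = off + len;
	size_t i;

	if (len == 0)
		return true;
	if (end < off || end > m->folio_size)
		return false;
	/* Ranges are fully merged, so one entry must contain the request. */
	for (i = 0; i < m->nr; i++)
		if (m->r[i].start <= off && end <= m->r[i].end)
			return true;
	return false;
}

/* Rough analogue of the single PG_uptodate bit: the whole folio is valid. */
static bool uptodate_map_full(const struct uptodate_map *m)
{
	return m->nr == 1 && m->r[0].start == 0 &&
	       m->r[0].end == m->folio_size;
}
```

With a 64kB folio, marking bytes 0-4095 and 8192-12287 lets a read of 1kB at offset 8192 be served from cache, while a read touching offset 4096 would still need I/O for just the missing range. Once all ranges merge into one entry covering the folio, the map collapses to the equivalent of the existing single uptodate bit.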