On Thu, 2024-03-28 at 16:33 +0000, David Howells wrote:
> Hi Christian, Willy,
>
> The primary purpose of these patches is to rework the netfslib writeback
> implementation such that pages read from the cache are written to the cache
> through ->writepages(), thereby allowing the fscache page flag to be
> retired.
>
> The reworking also:
>
>  (1) builds on top of the new writeback_iter() infrastructure;
>
>  (2) makes it possible to use vectored write RPCs as discontiguous streams
>      of pages can be accommodated;
>
>  (3) makes it easier to do simultaneous content crypto and stream division.
>
>  (4) provides support for retrying writes and re-dividing a stream;
>
>  (5) replaces the ->launder_folio() op, so that ->writepages() is used
>      instead;
>
>  (6) uses mempools to allocate the netfs_io_request and netfs_io_subrequest
>      structs to avoid allocation failure in the writeback path.
>
> Some code that uses the fscache page flag is retained for compatibility
> purposes with nfs and ceph.  The code is switched to using the synonymous
> private_2 label instead and marked with deprecation comments.  I have a
> separate set of patches that convert cifs to use this code.
>
> -~-
>
> In this new implementation, writeback_iter() is used to pump folios,
> progressively creating two parallel, but separate streams.  Either or both
> streams can contain gaps, and the subrequests in each stream can be of
> variable size, don't need to align with each other and don't need to align
> with the folios.  (Note that more streams can be added if we have multiple
> servers to duplicate data to).
>
> Indeed, subrequests can cross folio boundaries, may cover several folios or
> a folio may be spanned by multiple subrequests, e.g.:
>
>          +---+---+-----+-----+---+----------+
> Folios:  |   |   |     |     |   |          |
>          +---+---+-----+-----+---+----------+
>
>            +------+------+     +----+----+
> Upload:    |      |      |.....|    |    |
>            +------+------+     +----+----+
>
>          +------+------+------+------+------+
> Cache:   |      |      |      |      |      |
>          +------+------+------+------+------+
>
> Data that got read from the server that needs copying to the cache is
> stored in folios that are marked dirty and have folio->private set to a
> special value.
>
> The progressive subrequest construction permits the algorithm to be
> preparing both the next upload to the server and the next write to the
> cache whilst the previous ones are already in progress.  Throttling can be
> applied to control the rate of production of subrequests - and, in any
> case, we probably want to write them to the server in ascending order,
> particularly if the file will be extended.
>
> Content crypto can also be prepared at the same time as the subrequests and
> run asynchronously, with the prepped requests being stalled until the
> crypto catches up with them.  This might also be useful for transport
> crypto, but that happens at a lower layer, so probably would be harder to
> pull off.
>
> The algorithm is split into three parts:
>
>  (1) The issuer.  This walks through the data, packaging it up, encrypting
>      it and creating subrequests.  The part of this that generates
>      subrequests only deals with file positions and spans and so is usable
>      for DIO/unbuffered writes as well as buffered writes.
>
>  (2) The collector.  This asynchronously collects completed subrequests,
>      unlocks folios, frees crypto buffers and performs any retries.  This
>      runs in a work queue so that the issuer can return to the caller for
>      writeback (so that the VM can have its kswapd thread back) or async
>      writes.
>
>      Collection is slightly complex as the collector has to work out where
>      discontiguities happen in the folio list so that it doesn't try and
>      collect folios that weren't included in the write out.
>
>  (3) The retryer.  This pauses the issuer, waits for all outstanding
>      subrequests to complete and then goes through the failed subrequests
>      to reissue them.  This may involve reprepping them (with cifs, the
>      credits must be renegotiated and a subrequest may need splitting), and
>      doing RMW for content crypto if there's a conflicting change on the
>      server.
>
> David
>
> David Howells (26):
>   cifs: Fix duplicate fscache cookie warnings
>   9p: Clean up some kdoc and unused var warnings.
>   netfs: Update i_blocks when write committed to pagecache
>   netfs: Replace PG_fscache by setting folio->private and marking dirty
>   mm: Remove the PG_fscache alias for PG_private_2
>   netfs: Remove deprecated use of PG_private_2 as a second writeback
>     flag
>   netfs: Make netfs_io_request::subreq_counter an atomic_t
>   netfs: Use subreq_counter to allocate subreq debug_index values
>   mm: Provide a means of invalidation without using launder_folio
>   cifs: Use alternative invalidation to using launder_folio
>   9p: Use alternative invalidation to using launder_folio
>   afs: Use alternative invalidation to using launder_folio
>   netfs: Remove ->launder_folio() support
>   netfs: Use mempools for allocating requests and subrequests
>   mm: Export writeback_iter()
>   netfs: Switch to using unsigned long long rather than loff_t
>   netfs: Fix writethrough-mode error handling
>   netfs: Add some write-side stats and clean up some stat names
>   netfs: New writeback implementation
>   netfs, afs: Implement helpers for new write code
>   netfs, 9p: Implement helpers for new write code
>   netfs, cachefiles: Implement helpers for new write code
>   netfs: Cut over to using new writeback code
>   netfs: Remove the old writeback code
>   netfs: Miscellaneous tidy ups
>   netfs, afs: Use writeback retry to deal with alternate keys
>
>  fs/9p/vfs_addr.c             |  60 +--
>  fs/9p/vfs_inode_dotl.c       |   4 -
>  fs/afs/file.c                |   8 +-
>  fs/afs/internal.h            |   6 +-
>  fs/afs/validation.c          |   4 +-
>  fs/afs/write.c               | 187 ++++----
>  fs/cachefiles/io.c           |  75 +++-
>  fs/ceph/addr.c               |  24 +-
>  fs/ceph/inode.c              |   2 +
>  fs/netfs/Makefile            |   3 +-
>  fs/netfs/buffered_read.c     |  40 +-
>  fs/netfs/buffered_write.c    | 832 ++++-------------------------------
>  fs/netfs/direct_write.c      |  30 +-
>  fs/netfs/fscache_io.c        |  14 +-
>  fs/netfs/internal.h          |  55 ++-
>  fs/netfs/io.c                | 155 +------
>  fs/netfs/main.c              |  55 ++-
>  fs/netfs/misc.c              |  10 +-
>  fs/netfs/objects.c           |  81 +++-
>  fs/netfs/output.c            | 478 --------------------
>  fs/netfs/stats.c             |  17 +-
>  fs/netfs/write_collect.c     | 813 ++++++++++++++++++++++++++++++++++
>  fs/netfs/write_issue.c       | 673 ++++++++++++++++++++++++++++
>  fs/nfs/file.c                |   8 +-
>  fs/nfs/fscache.h             |   6 +-
>  fs/nfs/write.c               |   4 +-
>  fs/smb/client/cifsfs.h       |   1 -
>  fs/smb/client/file.c         | 136 +-----
>  fs/smb/client/fscache.c      |  16 +-
>  fs/smb/client/inode.c        |  27 +-
>  include/linux/fscache.h      |  22 +-
>  include/linux/netfs.h        | 196 +++++----
>  include/linux/pagemap.h      |   1 +
>  include/net/9p/client.h      |   2 +
>  include/trace/events/netfs.h | 249 ++++++++++-
>  mm/filemap.c                 |  52 ++-
>  mm/page-writeback.c          |   1 +
>  net/9p/Kconfig               |   1 +
>  net/9p/client.c              |  49 +++
>  net/9p/trans_fd.c            |   1 -
>  40 files changed, 2492 insertions(+), 1906 deletions(-)
>  delete mode 100644 fs/netfs/output.c
>  create mode 100644 fs/netfs/write_collect.c
>  create mode 100644 fs/netfs/write_issue.c
>

This all looks pretty reasonable. There is at least one bugfix that
looks like it ought to go in independently (#17). #19 is huge, complex
and hard to review. That will need some cycles in -next, I think.

In any case, on any that I didn't send comments you can add:

    Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
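
As a reading aid for the stream description in the quoted cover letter, here is a minimal sketch of how one run of folios might be divided into two independently sized subrequest streams, one with a gap. It is not code from the patches: the names `Subrequest`, `divide_stream`, `folio_bounds` and `folios_touched`, and all of the sizes, are hypothetical, and the real netfslib logic is written in C and is considerably more involved (throttling, crypto, retries).

```python
from dataclasses import dataclass


@dataclass
class Subrequest:
    """One unit of I/O in a stream (hypothetical, for illustration only)."""
    stream: str
    start: int
    length: int


def divide_stream(stream, spans, max_len):
    """Split each contiguous (start, length) span into subrequests of at
    most max_len bytes. Gaps between spans are simply not covered."""
    subreqs = []
    for start, length in spans:
        pos, end = start, start + length
        while pos < end:
            part = min(max_len, end - pos)
            subreqs.append(Subrequest(stream, pos, part))
            pos += part
    return subreqs


def folio_bounds(sizes):
    """Turn a list of folio sizes into (start, end) file-position pairs."""
    bounds, pos = [], 0
    for size in sizes:
        bounds.append((pos, pos + size))
        pos += size
    return bounds


def folios_touched(subreq, bounds):
    """Return the indices of the folios that a subrequest overlaps."""
    end = subreq.start + subreq.length
    return [i for i, (lo, hi) in enumerate(bounds)
            if lo < end and subreq.start < hi]


# Six folios of varying size, loosely mirroring the diagram in the quote.
bounds = folio_bounds([4096, 4096, 8192, 8192, 4096, 16384])
total = bounds[-1][1]

# The upload stream has a gap and an assumed 8 KiB limit per subrequest;
# the cache stream covers everything with an assumed 10000-byte limit.
upload = divide_stream("upload", [(0, 16384), (24576, 12288)], 8192)
cache = divide_stream("cache", [(0, total)], 10000)

for sr in upload + cache:
    print(sr.stream, sr.start, sr.length, folios_touched(sr, bounds))
```

Because the division works only on file positions and lengths, neither stream needs to know where the folio boundaries fall, which is the property the cover letter relies on when it says the subrequest generator is usable for unbuffered writes as well.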