Hey Linus,

/* Summary */

This contains the work to improve read/write performance for the new netfs
library.

The main performance-enhancing changes are:

- Define a structure, struct folio_queue, and a new iterator type,
  ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See that
  patch for questions about naming and form.

  The problem with an xarray is that accessing it requires the use of a lock
  (typically the RCU read lock), which means that we can't supply
  iterate_and_advance() with a step function that might sleep (crypto, for
  example) without having to drop the lock between pages. ITER_FOLIOQ is the
  iterator for a chain of folio_queue structs, where each folio_queue holds a
  small list of folios. A folio_queue struct is a simpler structure than
  xarray and is not subject to concurrent manipulation by the VM. folio_queue
  is used rather than a bvec[] as it can form lists of indefinite size,
  adding to one end and removing from the other on the fly.

- Provide a copy_folio_from_iter() wrapper.

- Make cifs RDMA support ITER_FOLIOQ.

- Use folio queues in the write-side helpers instead of xarrays.

- Add a function to reset the iterator in a subrequest.

- Simplify the write-side helpers to use sheaves to skip gaps rather than
  trying to work out where gaps are.

- In afs, make the read subrequests asynchronous, putting them into work
  items to allow the next patch to do progressive unlocking/reading.

- Overhaul the read-side helpers to improve performance.

- Fix the caching of a partial block at the end of a file.

- Allow a store to be cancelled.

Then some changes for cifs to make it use folio queues instead of xarrays for
crypto bufferage:

- Use raw iteration functions rather than manually coding iteration when
  hashing data.

- Switch to using folio_queue for crypto buffers.

- Remove the xarray bits.

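As a rough illustration of the "chain of small arrays" shape described for
folio_queue above, here is a user-space sketch. It is not the kernel's struct
folio_queue or its API: TOY_SLOTS, struct toy_seg, struct toy_queue and the
toy_*() helpers are all made up for the example, and plain pointers stand in
for folios. It only shows how a producer can add at one end while a consumer
removes from the other, with segments allocated and freed on the fly.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_SLOTS 4	/* hypothetical per-segment capacity */

/* One segment of the chain: a small fixed array of item pointers. */
struct toy_seg {
	struct toy_seg *next;
	unsigned int head;	/* next slot to consume */
	unsigned int tail;	/* next free slot */
	void *items[TOY_SLOTS];
};

struct toy_queue {
	struct toy_seg *first;	/* consumer end */
	struct toy_seg *last;	/* producer end */
};

/* Append at the tail, adding a new segment when the last one is full. */
static int toy_append(struct toy_queue *q, void *item)
{
	struct toy_seg *seg = q->last;

	if (!seg || seg->tail == TOY_SLOTS) {
		seg = calloc(1, sizeof(*seg));
		if (!seg)
			return -1;
		if (q->last)
			q->last->next = seg;
		else
			q->first = seg;
		q->last = seg;
	}
	seg->items[seg->tail++] = item;
	return 0;
}

/* Remove the oldest item, freeing a segment once every slot is consumed. */
static void *toy_consume(struct toy_queue *q)
{
	struct toy_seg *seg = q->first;
	void *item;

	if (!seg || seg->head == seg->tail)
		return NULL;	/* nothing queued */
	assert(seg->tail <= TOY_SLOTS);
	item = seg->items[seg->head++];
	if (seg->head == TOY_SLOTS) {
		q->first = seg->next;
		if (!q->first)
			q->last = NULL;
		free(seg);
	}
	return item;
}

/* Count the segments currently in the chain. */
static unsigned int toy_nr_segs(const struct toy_queue *q)
{
	unsigned int n = 0;

	for (const struct toy_seg *seg = q->first; seg; seg = seg->next)
		n++;
	return n;
}
```

Appending ten items to an empty queue leaves three segments in the chain;
consuming the first six frees the first segment and leaves two. There is no
locking in the sketch, so it says nothing about how the real code is
synchronised.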
Make some adjustments to the /proc/fs/netfs/stats file such that:

- The 'Netfs:' prefix that begins every netfs stats line is changed to
  something a bit more useful.

- A couple of stats counters are added to track the numbers of skips and
  waits on the per-inode writeback serialisation lock, to make it easier to
  check for this as a source of performance loss.

Miscellaneous work:

- Ensure that the sb_writers lock is taken around vfs_{set,remove}xattr() in
  the cachefiles code.

- Reduce the number of conditional branches in netfs_perform_write().

- Move the CIFS_INO_MODIFIED_ATTR flag to the netfs_inode struct and remove
  cifs_post_modify().

- Move the max_len/max_nr_segs members from netfs_io_subrequest to
  netfs_io_stream as they're only needed for one subreq at a time.

- Add an 'unknown' source value for tracing purposes.

- Remove NETFS_COPY_TO_CACHE as it's no longer used.

- Set the request work function up front at allocation time.

- Use bh-disabling spinlocks for rreq->lock as cachefiles completion may be
  run from block-filesystem DIO completion in softirq context.

- Remove fs/netfs/io.c.

/* Testing */

gcc version 14.2.0 (Debian 14.2.0-3)
Debian clang version 16.0.6 (27+b1)

All patches are based on the vfs-6.11-rc7.fixes merge to bring in
prerequisite fixes in individual filesystems. All of this has been sitting in
linux-next. No build failures or warnings were observed.

/* Conflicts */

Merge conflicts with mainline
=============================

This now has a merge conflict with mainline due to some rather late cifs
fixes.
This can be resolved by:

    git rm fs/netfs/io.c

and then:

diff --cc fs/smb/client/cifssmb.c
index cfae2e918209,04f2a5441a89..d0df0c17b18f
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@@ -1261,16 -1261,6 +1261,14 @@@ openRetry
  	return rc;
  }
  
 +static void cifs_readv_worker(struct work_struct *work)
 +{
 +	struct cifs_io_subrequest *rdata =
 +		container_of(work, struct cifs_io_subrequest, subreq.work);
 +
- 	netfs_subreq_terminated(&rdata->subreq,
- 				(rdata->result == 0 || rdata->result == -EAGAIN) ?
- 				rdata->got_bytes : rdata->result, true);
++	netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false);
 +}
 +
  static void
  cifs_readv_callback(struct mid_q_entry *mid)
  {
@@@ -1323,21 -1306,11 +1321,23 @@@
  		rdata->result = -EIO;
  	}
  
 -	if (rdata->result == 0 || rdata->result == -EAGAIN)
 -		iov_iter_advance(&rdata->subreq.io_iter, rdata->got_bytes);
 +	if (rdata->result == -ENODATA) {
 +		__set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
 +		rdata->result = 0;
 +	} else {
- 		if (rdata->got_bytes < rdata->actual_len &&
- 		    rdata->subreq.start + rdata->subreq.transferred + rdata->got_bytes ==
- 		    ictx->remote_i_size) {
++		size_t trans = rdata->subreq.transferred + rdata->got_bytes;
++		if (trans < rdata->subreq.len &&
++		    rdata->subreq.start + trans == ictx->remote_i_size) {
 +			__set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
 +			rdata->result = 0;
 +		}
 +	}
 +
  	rdata->credits.value = 0;
+ 	rdata->subreq.transferred += rdata->got_bytes;
 -	netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false);
++	trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress);
 +	INIT_WORK(&rdata->subreq.work, cifs_readv_worker);
 +	queue_work(cifsiod_wq, &rdata->subreq.work);
  	release_mid(mid);
  	add_credits(server, &credits, 0);
  }
diff --cc fs/smb/client/smb2pdu.c
index 88dc49d67037,95377bb91950..bb8ecbbe78af
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@@ -4614,6 -4613,10 +4613,8 @@@ smb2_readv_callback(struct mid_q_entry
  			      server->credits, server->in_flight, 0,
  			      cifs_trace_rw_credits_read_response_clear);
  	rdata->credits.value = 0;
+ 	rdata->subreq.transferred += rdata->got_bytes;
 -	if (rdata->subreq.start + rdata->subreq.transferred >= rdata->subreq.rreq->i_size)
 -		__set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
+ 	trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress);
  	INIT_WORK(&rdata->subreq.work, smb2_readv_worker);
  	queue_work(cifsiod_wq, &rdata->subreq.work);
  	release_mid(mid);

Merge conflicts with other trees
================================

No known merge conflicts.

The following changes since commit 4356ab331c8f0dbed0f683abde345cd5503db1e4:

  Merge tag 'vfs-6.11-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs (2024-09-04 09:33:57 -0700)

are available in the Git repository at:

  git@xxxxxxxxxxxxxxxxxxx:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.12.netfs

for you to fetch changes up to 4b40d43d9f951d87ae8dc414c2ef5ae50303a266:

  docs: filesystems: corrected grammar of netfs page (2024-09-12 12:20:43 +0200)

Please consider pulling these changes from the signed vfs-6.12.netfs tag.

Thanks!
Christian

----------------------------------------------------------------
vfs-6.12.netfs

----------------------------------------------------------------
Christian Brauner (1):
      Merge branch 'netfs-writeback' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into vfs.netfs

David Howells (24):
      cachefiles: Fix non-taking of sb_writers around set/removexattr
      netfs: Adjust labels in /proc/fs/netfs/stats
      netfs: Record contention stats for writeback lock
      netfs: Reduce number of conditional branches in netfs_perform_write()
      netfs, cifs: Move CIFS_INO_MODIFIED_ATTR to netfs_inode
      netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
      netfs: Reserve netfs_sreq_source 0 as unset/unknown
      netfs: Remove NETFS_COPY_TO_CACHE
      netfs: Set the request work function upon allocation
      netfs: Use bh-disabling spinlocks for rreq->lock
      mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios
      iov_iter: Provide copy_folio_from_iter()
      cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs
      netfs: Use new folio_queue data type and iterator instead of xarray iter
      netfs: Provide an iterator-reset function
      netfs: Simplify the writeback code
      afs: Make read subreqs async
      netfs: Speed up buffered reading
      netfs: Remove fs/netfs/io.c
      cachefiles, netfs: Fix write to partial block at EOF
      netfs: Cancel dirty folios that have no storage destination
      cifs: Use iterate_and_advance*() routines directly for hashing
      cifs: Switch crypto buffer to use a folio_queue rather than an xarray
      cifs: Don't support ITER_XARRAY

Dennis Lam (1):
      docs: filesystems: corrected grammar of netfs page

 Documentation/filesystems/netfs_library.rst |   2 +-
 fs/9p/vfs_addr.c                            |  11 +-
 fs/afs/file.c                               |  30 +-
 fs/afs/fsclient.c                           |   9 +-
 fs/afs/write.c                              |   4 +-
 fs/afs/yfsclient.c                          |   9 +-
 fs/cachefiles/io.c                          |  19 +-
 fs/cachefiles/xattr.c                       |  34 +-
 fs/ceph/addr.c                              |  76 +--
 fs/netfs/Makefile                           |   4 +-
 fs/netfs/buffered_read.c                    | 766 ++++++++++++++++----------
 fs/netfs/buffered_write.c                   | 309 +++++------
 fs/netfs/direct_read.c                      | 147 ++++-
 fs/netfs/internal.h                         |  43 +-
 fs/netfs/io.c                               | 804 ----------------------------
 fs/netfs/iterator.c                         |  50 ++
 fs/netfs/main.c                             |   7 +-
 fs/netfs/misc.c                             |  94 ++++
 fs/netfs/objects.c                          |  16 +-
 fs/netfs/read_collect.c                     | 544 +++++++++++++++++++
 fs/netfs/read_pgpriv2.c                     | 264 +++++++++
 fs/netfs/read_retry.c                       | 256 +++++++++
 fs/netfs/stats.c                            |  27 +-
 fs/netfs/write_collect.c                    | 246 +++------
 fs/netfs/write_issue.c                      |  93 ++--
 fs/nfs/fscache.c                            |  19 +-
 fs/nfs/fscache.h                            |   7 +-
 fs/smb/client/cifsencrypt.c                 | 144 +----
 fs/smb/client/cifsglob.h                    |   4 +-
 fs/smb/client/cifssmb.c                     |   6 +-
 fs/smb/client/file.c                        |  96 ++--
 fs/smb/client/smb2ops.c                     | 219 ++++----
 fs/smb/client/smb2pdu.c                     |  27 +-
 fs/smb/client/smbdirect.c                   |  82 +--
 include/linux/folio_queue.h                 | 156 ++++++
 include/linux/iov_iter.h                    | 104 ++++
 include/linux/netfs.h                       |  46 +-
 include/linux/uio.h                         |  18 +
 include/trace/events/netfs.h                | 144 +++--
 lib/iov_iter.c                              | 240 ++++++++-
 lib/kunit_iov_iter.c                        | 259 +++++++++
 lib/scatterlist.c                           |  69 ++-
 42 files changed, 3520 insertions(+), 1984 deletions(-)
 delete mode 100644 fs/netfs/io.c
 create mode 100644 fs/netfs/read_collect.c
 create mode 100644 fs/netfs/read_pgpriv2.c
 create mode 100644 fs/netfs/read_retry.c
 create mode 100644 include/linux/folio_queue.h