[I removed the RFC status, as the design should be in place now and
xfstests pass. I am still reviewing the patches myself, though, and am
also repeating tests with different queue sizes.]

This adds support for uring communication between the kernel and the
userspace daemon using the opcode IORING_OP_URING_CMD. The basic
approach was taken from ublk.

The motivation for these patches is to increase fuse performance by:
- Reducing kernel/userspace context switches
    - Part of that is given by the ring - handling multiple
      requests on either side of kernel/userspace without the need
      to switch per request
    - Part of that is FUSE_URING_REQ_COMMIT_AND_FETCH, i.e. submitting
      the result of a request and fetching the next fuse request
      in one step, in contrast to the legacy read/write to /dev/fuse
- Core and numa affinity - one ring per core, which allows
  avoiding cpu core context switches

A more detailed motivation description can be found in the
introduction of a previous patch series
https://lore.kernel.org/r/20241016-fuse-uring-for-6-10-rfc4-v4-0-9739c753666e@xxxxxxx
That description also includes benchmark results with RFCv1.
Performance with the current series needs to be tested, but will be
lower, as several optimization patches are missing, like wake-up on
the same core. These optimizations will be submitted after merging
the main changes.

The corresponding libfuse patches are on my uring branch, but need
cleanup for submission - that will be done once the kernel design
no longer changes
https://github.com/bsbernd/libfuse/tree/uring

Testing with that libfuse branch is possible by running something
like:

example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
    --uring --uring-q-depth=128 /scratch/source /scratch/dest

With the --debug-fuse option one should see CQE in the request type,
if requests are received via io-uring:

cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 7060
    unique: 4, result=104

Without the --uring option "cqe" is replaced by the default "dev":

dev unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 7117
    unique: 4, success, outsize: 120

Future work
- different payload sizes per ring
- zero copy

Signed-off-by: Bernd Schubert <bschubert@xxxxxxx>
---
Changes in v7:
- Bug fixes:
   - Removed unsetting ring->ready as that brought up a lock
     order violation for fc->bg_lock/queue->lock
   - Check for !fc->connected in fuse_uring_cmd(), tear down issues
     came up with large ring sizes without that.
   - Removal of (arg->size == 0) condition and warning in fuse_copy_args
     as that is actually expected for some op codes.
- New init flag: FUSE_OVER_IO_URING to tell fuse-server about
  over-io-uring capability
- Use fuse_set_zero_arg0() to set arg0 and rename to
  struct fuse_zero_header (I hope I got Miklos' suggestion right)
- Simplification of fuse_uring_ent_avail()
- Renamed some structs in uapi/linux/fuse.h to fuse_uring
  (from fuse_ring) to be consistent
- Removal of 'if 0' in fuse_uring_cmd()
- Return -E... directly in fuse_uring_cmd() instead of setting err
  first and removal of goto's in that function.
- Just a simple WARN_ON_ONCE() for (oh->unique & FUSE_INT_REQ_BIT) as
  that code should be unreachable
- Removal of several pr_devel and some pr_warn() messages
- Removed RFC as it passed several xfstests runs now
- Link to v6: https://lore.kernel.org/r/20241122-fuse-uring-for-6-10-rfc4-v6-0-28e6cdd0e914@xxxxxxx

Changes in v6:
- Update to linux-6.12
- Use 'struct fuse_iqueue_ops' and redirect fiq->ops once
  the ring is ready.
- Fix return code from fuse_uring_copy_from_ring on
  copy_from_user failure (Dan Carpenter / kernel test robot)
- Avoid list iteration in fuse_uring_cancel (Joanne)
- Simplified struct fuse_ring_req_header
- Adds a new 'struct fuse_ring_ent_in_out'
- Fix assigning ring->queues[qid] in fuse_uring_create_queue,
  it was too early, resulting in races
- Add back 'FRRS_INVALID = 0' to ensure ring-ent states always
  have a value > 0
- Avoid assigning struct io_uring_cmd *cmd->pdu multiple times,
  once on setting up IO_URING_F_CANCEL is sufficient for sending
  the request as well.
- Link to v5: https://lore.kernel.org/r/20241107-fuse-uring-for-6-10-rfc4-v5-0-e8660a991499@xxxxxxx

Changes in v5:
- Main focus in v5 is the separation of headers from payload,
  which required introducing 'struct fuse_zero_in'.
- Addressed several teardown issues that were a regression in v4.
- Fixed "BUG: sleeping function called" due to allocation while
  holding a lock, reported by David Wei
- Fix function comment reported by kernel test robot
- Fix set but unused variable reported by test robot
- Link to v4: https://lore.kernel.org/r/20241016-fuse-uring-for-6-10-rfc4-v4-0-9739c753666e@xxxxxxx

Changes in v4:
- Removal of ioctls, all configuration is done dynamically
  on the arrival of FUSE_URING_REQ_FETCH
- ring entries are not (and cannot be without config ioctls)
  allocated as array of the ring/queue - removal of the tag
  variable.
  Finding ring entries on FUSE_URING_REQ_COMMIT_AND_FETCH is more
  cumbersome now and needs an almost unused
  struct fuse_pqueue per fuse_ring_queue and uses the unique
  id of fuse requests.
- No device clones needed to work around hanging mounts
  on fuse-server/daemon termination, handled by IO_URING_F_CANCEL
- Removal of sync/async ring entry types
- Addressed some of Joanne's comments, but probably not all
- Only very basic tests run for v3, as more updates should follow
  quickly.

Changes in v3
- Removed the __wake_on_current_cpu optimization (for now, as that
  needs to go through another subsystem/tree); removing it means
  a significant performance drop
- Removed MMAP (Miklos)
- Switched to two IOCTLs, instead of one ioctl that had a field
  for subcommands (ring and queue config) (Miklos)
- The ring entry state is a single state and not a bitmask anymore
  (Josef)
- Addressed several other comments from Josef (I need to go over
  the RFCv2 review again, I'm not sure if everything is addressed
  already)

- Link to v3: https://lore.kernel.org/r/20240901-b4-fuse-uring-rfcv3-without-mmap-v3-0-9207f7391444@xxxxxxx
- Link to v2: https://lore.kernel.org/all/20240529-fuse-uring-for-6-9-rfc2-out-v1-0-d149476b1d65@xxxxxxx/
- Link to v1: https://lore.kernel.org/r/20240529-fuse-uring-for-6-9-rfc2-out-v1-0-d149476b1d65@xxxxxxx

---
Bernd Schubert (15):
      fuse: rename to fuse_dev_end_requests and make non-static
      fuse: Move fuse_get_dev to header file
      fuse: Move request bits
      fuse: Add fuse-io-uring design documentation
      fuse: make args->in_args[0] to be always the header
      fuse: {uring} Handle SQEs - register commands
      fuse: Make fuse_copy non static
      fuse: Add fuse-io-uring handling into fuse_copy
      fuse: {uring} Add uring sqe commit and fetch support
      fuse: {uring} Handle teardown of ring entries
      fuse: {uring} Allow to queue fg requests through io-uring
      fuse: {uring} Allow to queue bg requests through io-uring
      fuse: {uring} Handle IO_URING_F_TASK_DEAD
      fuse: {io-uring} Prevent mount point hang on fuse-server termination
      fuse: enable fuse-over-io-uring

Pavel Begunkov (1):
      io_uring/cmd: let cmds to know about dying task

 Documentation/filesystems/fuse-io-uring.rst |  101 +++
 fs/fuse/Kconfig                             |   12 +
 fs/fuse/Makefile                            |    1 +
 fs/fuse/dax.c                               |   11 +-
 fs/fuse/dev.c                               |  124 +--
 fs/fuse/dev_uring.c                         | 1269 +++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h                       |  204 +++++
 fs/fuse/dir.c                               |   32 +-
 fs/fuse/fuse_dev_i.h                        |   68 ++
 fs/fuse/fuse_i.h                            |   27 +
 fs/fuse/inode.c                             |   12 +-
 fs/fuse/xattr.c                             |    7 +-
 include/linux/io_uring_types.h              |    1 +
 include/uapi/linux/fuse.h                   |   67 ++
 io_uring/uring_cmd.c                        |    6 +-
 15 files changed, 1866 insertions(+), 76 deletions(-)
---
base-commit: 3022e9d00ebec31ed435ae0844e3f235dba998a9
change-id: 20241015-fuse-uring-for-6-10-rfc4-61d0fc6851f8

Best regards,
-- 
Bernd Schubert <bschubert@xxxxxxx>