This adds support for uring communication between the kernel and the
userspace daemon using the IORING_OP_URING_CMD opcode. The basic
approach was taken from ublk. The patches are in RFC state - I'm not
sure about all decisions, and some questions are marked with XXX.

The userspace side has to send ioctl(s) to configure the ring queue(s),
and it can choose to configure exactly one ring or one ring per core.
If there are use cases, we can also consider allowing a different
number of rings - the ioctl configuration option is rather generic
(number of queues).

Right now a queue lock is taken for any ring entry state change, mostly
to correctly handle unmount/daemon-stop. In fact, correctly stopping
the ring took most of the development time - new corner cases kept
coming up. I have run dozens of xfstests cycles across versions of the
patches and once saw a warning about the ring start_stop mutex being in
the wrong state - probably another stop issue, but I have not been able
to track it down yet.

Regarding the queue lock - I still need to do profiling, but my
assumption is that it should not matter for the one-ring-per-core
configuration. For the single-ring configuration lock contention might
come up, but I see this configuration mostly as a development option.
Adding more complexity and protecting ring entries with their own locks
can be done later.

The current code also keeps the fuse request allocation; initially I
only had that for background requests when the ring queue had no free
entries left. The allocation is kept to reduce initial complexity,
especially for ring stop. The allocation-free mode can be added back
later.

Right now the ring queue of the submitting core is always used.
Especially for page-cached background requests we might later consider
also enqueueing on the queues of other cores (when these are not busy,
of course).

Splice/zero-copy is not supported yet; all requests go through the
shared memory queue entry buffer.
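
A minimal sketch of the queue selection described above (exactly one
ring, or the ring of the submitting core). The names struct ring_cfg and
ring_pick_queue() are hypothetical and not taken from the patches, and
the modulo fallback for other queue counts is an assumption:

```c
#include <assert.h>

/* Hypothetical configuration, as set up by the config ioctl. */
struct ring_cfg {
	unsigned int nr_queues;	/* 1, or one queue per core */
};

/*
 * Pick the queue for a request submitted on core 'cpu'.
 * With a single ring everything goes to queue 0; with one ring per
 * core the queue id equals the submitting core. The modulo only
 * matters if a different number of queues were ever allowed
 * (assumption, not part of the patches).
 */
static unsigned int ring_pick_queue(const struct ring_cfg *cfg,
				    unsigned int cpu)
{
	assert(cfg->nr_queues > 0);

	if (cfg->nr_queues == 1)
		return 0;

	return cpu % cfg->nr_queues;
}
```

With one queue per core the submitting core never shares its queue with
another core in this sketch, which is why the queue lock is not expected
to be contended in that configuration.
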
I am also following the splice and ublk/zc copy discussions and will
look into these options in the next days/weeks.

To have that buffer allocated on the right NUMA node, a vmalloc is done
per ring queue, on the NUMA node the userspace daemon asks for. My
assumption is that the mmap offset parameter will be subject to debate,
and I'm curious what others think about that approach.

Benchmarking and tuning are on my agenda for the next days. For now I
only have xfstests results - most longer-running tests were running
about 2x faster, but somehow I lost that when I cleaned up the patches
for submission. My development VM/kernel has all sanitizers enabled,
which makes it hard to profile what happened. Performance results with
profiling will be submitted in a few days.

The patches include a design document, which has a few more details.

The corresponding libfuse patches are on my uring branch, but need
cleanup for submission - that will happen during the next days.
https://github.com/bsbernd/libfuse/tree/uring

In case it makes review easier, the patches posted here are on this
branch:
https://github.com/bsbernd/linux/tree/fuse-uring-for-6.2

Bernd Schubert (13):
  fuse: Add uring data structures and documentation
  fuse: rename to fuse_dev_end_requests and make non-static
  fuse: Move fuse_get_dev to header file
  Add a vmalloc_node_user function
  fuse: Add a uring config ioctl and ring destruction
  fuse: Add an interval ring stop worker/monitor
  fuse: Add uring mmap method
  fuse: Move request bits
  fuse: Add wait stop ioctl support to the ring
  fuse: Handle SQEs - register commands
  fuse: Add support to copy from/to the ring buffer
  fuse: Add uring sqe commit and fetch support
  fuse: Allow to queue to the ring

 Documentation/filesystems/fuse-uring.rst |  179 +++
 fs/fuse/Makefile                         |    2 +-
 fs/fuse/dev.c                            |  193 +++-
 fs/fuse/dev_uring.c                      | 1292 ++++++++++++++++++++++
 fs/fuse/dev_uring_i.h                    |   23 +
 fs/fuse/fuse_dev_i.h                     |   62 ++
 fs/fuse/fuse_i.h                         |  178 +++
 fs/fuse/inode.c                          |   10 +
 include/linux/vmalloc.h                  |    1 +
 include/uapi/linux/fuse.h                |  131 +++
 mm/nommu.c                               |    6 +
 mm/vmalloc.c                             |   41 +-
 12 files changed, 2064 insertions(+), 54 deletions(-)
 create mode 100644 Documentation/filesystems/fuse-uring.rst
 create mode 100644 fs/fuse/dev_uring.c
 create mode 100644 fs/fuse/dev_uring_i.h
 create mode 100644 fs/fuse/fuse_dev_i.h

Signed-off-by: Bernd Schubert <bschubert@xxxxxxx>
cc: Miklos Szeredi <miklos@xxxxxxxxxx>
cc: linux-fsdevel@xxxxxxxxxxxxxxx
cc: Amir Goldstein <amir73il@xxxxxxxxx>
cc: fuse-devel@xxxxxxxxxxxxxxxxxxxxx

-- 
2.37.2