On Thu, May 30, 2024 at 02:50:30PM +0200, Bernd Schubert wrote:
>
>
> On 5/29/24 23:17, Josef Bacik wrote:
> > On Wed, May 29, 2024 at 08:00:39PM +0200, Bernd Schubert wrote:
> >> Signed-off-by: Bernd Schubert <bschubert@xxxxxxx>
> >> ---
> >>  Documentation/filesystems/fuse-io-uring.rst | 167 ++++++++++++++++++++++++++++
> >>  1 file changed, 167 insertions(+)
> >>
> >> diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst
> >> new file mode 100644
> >> index 000000000000..4aa168e3b229
> >> --- /dev/null
> >> +++ b/Documentation/filesystems/fuse-io-uring.rst
> >> @@ -0,0 +1,167 @@
> >> +.. SPDX-License-Identifier: GPL-2.0
> >> +
> >> +===============================
> >> +FUSE Uring design documentation
> >> +==============================
> >> +
> >> +This documentation covers basic details how the fuse
> >> +kernel/userspace communication through uring is configured
> >> +and works. For generic details about FUSE see fuse.rst.
> >> +
> >> +This document also covers the current interface, which is
> >> +still in development and might change.
> >> +
> >> +Limitations
> >> +===========
> >> +As of now not all requests types are supported through uring, userspace
> >
> > s/userspace side/userspace/
> >
> >> +side is required to also handle requests through /dev/fuse after
> >> +uring setup is complete. These are especially notifications (initiated
> >
> > especially is an awkward word choice here, I'm not quite sure what you're trying
> > to say here, perhaps
> >
> > "Specifically notifications (initiated from the daemon side), interrupts and
> > forgets"
>
> Yep, thanks a lot! I removed "forgets", these should be working over the ring
> in the meantime.
> >
> > ?
> >
> >> +from daemon side), interrupts and forgets.
> >> +Interrupts are probably not working at all when uring is used. At least
> >> +current state of libfuse will not be able to handle those for requests
> >> +on ring queues.
> >> +All these limitation will be addressed later.
> >> +
> >> +Fuse uring configuration
> >> +========================
> >> +
> >> +Fuse kernel requests are queued through the classical /dev/fuse
> >> +read/write interface - until uring setup is complete.
> >> +
> >> +In order to set up fuse-over-io-uring userspace has to send ioctls,
> >> +mmap requests in the right order
> >> +
> >> +1) FUSE_DEV_IOC_URING ioctl with FUSE_URING_IOCTL_CMD_RING_CFG
> >> +
> >> +First the basic kernel data structure has to be set up, using
> >> +FUSE_DEV_IOC_URING with subcommand FUSE_URING_IOCTL_CMD_RING_CFG.
> >> +
> >> +Example (from libfuse)
> >> +
> >> +static int fuse_uring_setup_kernel_ring(int session_fd,
> >> +					int nr_queues, int sync_qdepth,
> >> +					int async_qdepth, int req_arg_len,
> >> +					int req_alloc_sz)
> >> +{
> >> +	int rc;
> >> +
> >> +	struct fuse_ring_config rconf = {
> >> +		.nr_queues = nr_queues,
> >> +		.sync_queue_depth = sync_qdepth,
> >> +		.async_queue_depth = async_qdepth,
> >> +		.req_arg_len = req_arg_len,
> >> +		.user_req_buf_sz = req_alloc_sz,
> >> +		.numa_aware = nr_queues > 1,
> >> +	};
> >> +
> >> +	struct fuse_uring_cfg ioc_cfg = {
> >> +		.flags = 0,
> >> +		.cmd = FUSE_URING_IOCTL_CMD_RING_CFG,
> >> +		.rconf = rconf,
> >> +	};
> >> +
> >> +	rc = ioctl(session_fd, FUSE_DEV_IOC_URING, &ioc_cfg);
> >> +	if (rc)
> >> +		rc = -errno;
> >> +
> >> +	return rc;
> >> +}
> >> +
> >> +2) MMAP
> >> +
> >> +For shared memory communication between kernel and userspace
> >> +each queue has to allocate and map memory buffer.
> >> +For numa awares kernel side verifies if the allocating thread
> >
> > This bit is awkwardly worded and there's some spelling mistakes.  Perhaps
> > something like this?
> >
> > "For numa aware kernels, the kernel verifies that the allocating thread is bound
> > to a single core, as the kernel has the expectation that only a single thread
> > accesses a queue, and for numa aware memory allocation the core of the thread
> > sending the mmap request is used to identify the numa node"
>
> Thank you, updated. I am actually considering reducing this to a warning (will try
> to add an async FUSE_WARN request type for this and others). Issue is that
> systems cannot set up fuse-uring when a core is disabled.
>
> >
> >> +is bound to a single core - in general kernel side has expectations
> >> +that only a single thread accesses a queue and for numa aware
> >> +memory alloation the core of the thread sending the mmap request
> >> +is used to identify the numa node.
> >> +
> >> +The offsset parameter has to be FUSE_URING_MMAP_OFF to identify
> >       ^^^^ "offset"
>
>
> Fixed.
>
> >
> >> +it is a request concerning fuse-over-io-uring.
> >> +
> >> +3) FUSE_DEV_IOC_URING ioctl with FUSE_URING_IOCTL_CMD_QUEUE_CFG
> >> +
> >> +This ioctl has to be send for every queue and takes the queue-id (qid)
> >                        ^^^^ "sent"
> >
> >> +and memory address obtained by mmap to set up queue data structures.
> >> +
> >> +Kernel - userspace interface using uring
> >> +========================================
> >> +
> >> +After queue ioctl setup and memory mapping userspace submits
> >
> > This needs a comma, so
> >
> > "After queue ioctl setup and memory mapping, userspace submits"
> >
> >> +SQEs (opcode = IORING_OP_URING_CMD) in order to fetch
> >> +fuse requests. Initial submit is with the sub command
> >> +FUSE_URING_REQ_FETCH, which will just register entries
> >> +to be available on the kernel side - it sets the according
> >
> > s/according/associated/ maybe?
> >
> >> +entry state and marks the entry as available in the queue bitmap.
>
> Or maybe like this?
>
> Initial submit is with the sub command FUSE_URING_REQ_FETCH, which
> will just register entries to be available in the kernel.
> >
> >> +
> >> +Once all entries for all queues are submitted kernel side starts
> >> +to enqueue to ring queue(s). The request is copied into the shared
> >> +memory queue entry buffer and submitted as CQE to the userspace
> >> +side.
> >> +Userspace side handles the CQE and submits the result as subcommand
> >> +FUSE_URING_REQ_COMMIT_AND_FETCH - kernel side does completes the requests
> >
> > "the kernel completes the request"
>
> Yeah, now I see the bad grammar myself. Updated to
>
>
> Once all entries for all queues are submitted, kernel starts
> to enqueue to ring queues. The request is copied into the shared
> memory buffer and submitted as CQE to the daemon.
> Userspace handles the CQE/fuse-request and submits the result as
> subcommand FUSE_URING_REQ_COMMIT_AND_FETCH - kernel completes
> the requests and also marks the entry available again. If there are
> pending requests waiting the request will be immediately submitted
> to the daemon again.
>
>
>
> Thank you very much for your help to phrase this better!
>

This all looks great, thanks!

Josef