[PATCH RFC v2 04/19] fuse: Add fuse-io-uring design documentation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Signed-off-by: Bernd Schubert <bschubert@xxxxxxx>
---
 Documentation/filesystems/fuse-io-uring.rst | 167 ++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)

diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst
new file mode 100644
index 000000000000..4aa168e3b229
--- /dev/null
+++ b/Documentation/filesystems/fuse-io-uring.rst
@@ -0,0 +1,167 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+FUSE Uring design documentation
+==============================
+
+This documentation covers basic details how the fuse
+kernel/userspace communication through uring is configured
+and works. For generic details about FUSE see fuse.rst.
+
+This document also covers the current interface, which is
+still in development and might change.
+
+Limitations
+===========
+As of now not all requests types are supported through uring, userspace
+side is required to also handle requests through /dev/fuse after
+uring setup is complete. These are especially notifications (initiated
+from daemon side), interrupts and forgets.
+Interrupts are probably not working at all when uring is used. At least
+current state of libfuse will not be able to handle those for requests
+on ring queues.
+All these limitation will be addressed later.
+
+Fuse uring configuration
+========================
+
+Fuse kernel requests are queued through the classical /dev/fuse
+read/write interface - until uring setup is complete.
+
+In order to set up fuse-over-io-uring userspace has to send ioctls,
+mmap requests in the right order
+
+1) FUSE_DEV_IOC_URING ioctl with FUSE_URING_IOCTL_CMD_RING_CFG
+
+First the basic kernel data structure has to be set up, using
+FUSE_DEV_IOC_URING with subcommand FUSE_URING_IOCTL_CMD_RING_CFG.
+
+Example (from libfuse)
+
+static int fuse_uring_setup_kernel_ring(int session_fd,
+					int nr_queues, int sync_qdepth,
+					int async_qdepth, int req_arg_len,
+					int req_alloc_sz)
+{
+	int rc;
+
+	struct fuse_ring_config rconf = {
+		.nr_queues		    = nr_queues,
+		.sync_queue_depth	= sync_qdepth,
+		.async_queue_depth	= async_qdepth,
+		.req_arg_len		= req_arg_len,
+		.user_req_buf_sz	= req_alloc_sz,
+		.numa_aware		    = nr_queues > 1,
+	};
+
+	struct fuse_uring_cfg ioc_cfg = {
+		.flags = 0,
+		.cmd = FUSE_URING_IOCTL_CMD_RING_CFG,
+		.rconf = rconf,
+	};
+
+	rc = ioctl(session_fd, FUSE_DEV_IOC_URING, &ioc_cfg);
+	if (rc)
+		rc = -errno;
+
+	return rc;
+}
+
+2) MMAP
+
+For shared memory communication between kernel and userspace
+each queue has to allocate and map memory buffer.
+For numa awares kernel side verifies if the allocating thread
+is bound to a single core - in general kernel side has expectations
+that only a single thread accesses a queue and for numa aware
+memory alloation the core of the thread sending the mmap request
+is used to identify the numa node.
+
+The offsset parameter has to be FUSE_URING_MMAP_OFF to identify
+it is a request concerning fuse-over-io-uring.
+
+3) FUSE_DEV_IOC_URING ioctl with FUSE_URING_IOCTL_CMD_QUEUE_CFG
+
+This ioctl has to be send for every queue and takes the queue-id (qid)
+and memory address obtained by mmap to set up queue data structures.
+
+Kernel - userspace interface using uring
+========================================
+
+After queue ioctl setup and memory mapping userspace submits
+SQEs (opcode = IORING_OP_URING_CMD) in order to fetch
+fuse requests. Initial submit is with the sub command
+FUSE_URING_REQ_FETCH, which will just register entries
+to be available on the kernel side - it sets the according
+entry state and marks the entry as available in the queue bitmap.
+
+Once all entries for all queues are submitted kernel side starts
+to enqueue to ring queue(s). The request is copied into the shared
+memory queue entry buffer and submitted as CQE to the userspace
+side.
+Userspace side handles the CQE and submits the result as subcommand
+FUSE_URING_REQ_COMMIT_AND_FETCH - kernel side does completes the requests
+and also marks the queue entry as available again. If there are
+pending requests waiting the request will be immediately submitted
+to userspace again.
+
+Initial SQE
+-----------
+
+ |                                    |  FUSE filesystem daemon
+ |                                    |
+ |                                    |  >io_uring_submit()
+ |                                    |   IORING_OP_URING_CMD /
+ |                                    |   FUSE_URING_REQ_FETCH
+ |                                    |  [wait cqe]
+ |                                    |   >io_uring_wait_cqe() or
+ |                                    |   >io_uring_submit_and_wait()
+ |                                    |
+ |  >fuse_uring_cmd()                 |
+ |   >fuse_uring_fetch()              |
+ |    >fuse_uring_ent_release()       |
+
+
+Sending requests with CQEs
+--------------------------
+
+ |                                         |  FUSE filesystem daemon
+ |                                         |  [waiting for CQEs]
+ |  "rm /mnt/fuse/file"                    |
+ |                                         |
+ |  >sys_unlink()                          |
+ |    >fuse_unlink()                       |
+ |      [allocate request]                 |
+ |      >__fuse_request_send()             |
+ |        ...                              |
+ |       >fuse_uring_queue_fuse_req        |
+ |        [queue request on fg or          |
+ |          bg queue]                      |
+ |         >fuse_uring_assign_ring_entry() |
+ |         >fuse_uring_send_to_ring()      |
+ |          >fuse_uring_copy_to_ring()     |
+ |          >io_uring_cmd_done()           |
+ |          >request_wait_answer()         |
+ |           [sleep on req->waitq]         |
+ |                                         |  [receives and handles CQE]
+ |                                         |  [submit result and fetch next]
+ |                                         |  >io_uring_submit()
+ |                                         |   IORING_OP_URING_CMD/
+ |                                         |   FUSE_URING_REQ_COMMIT_AND_FETCH
+ |  >fuse_uring_cmd()                      |
+ |   >fuse_uring_commit_and_release()      |
+ |    >fuse_uring_copy_from_ring()         |
+ |     [ copy the result to the fuse req]  |
+ |     >fuse_uring_req_end_and_get_next()  |
+ |      >fuse_request_end()                |
+ |       [wake up req->waitq]              |
+ |      >fuse_uring_ent_release_and_fetch()|
+ |       [wait or handle next req]         |
+ |                                         |
+ |                                         |
+ |       [req->waitq woken up]             |
+ |    <fuse_unlink()                       |
+ |  <sys_unlink()                          |
+
+
+

-- 
2.40.1





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux