Re: [PATCH RFC v2 04/19] fuse: Add fuse-io-uring design documentation

On Thu, May 30, 2024 at 02:50:30PM +0200, Bernd Schubert wrote:
> 
> 
> On 5/29/24 23:17, Josef Bacik wrote:
> > On Wed, May 29, 2024 at 08:00:39PM +0200, Bernd Schubert wrote:
> >> Signed-off-by: Bernd Schubert <bschubert@xxxxxxx>
> >> ---
> >>  Documentation/filesystems/fuse-io-uring.rst | 167 ++++++++++++++++++++++++++++
> >>  1 file changed, 167 insertions(+)
> >>
> >> diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst
> >> new file mode 100644
> >> index 000000000000..4aa168e3b229
> >> --- /dev/null
> >> +++ b/Documentation/filesystems/fuse-io-uring.rst
> >> @@ -0,0 +1,167 @@
> >> +.. SPDX-License-Identifier: GPL-2.0
> >> +
> >> +===============================
> >> +FUSE Uring design documentation
> >> +==============================
> >> +
> >> +This documentation covers basic details how the fuse
> >> +kernel/userspace communication through uring is configured
> >> +and works. For generic details about FUSE see fuse.rst.
> >> +
> >> +This document also covers the current interface, which is
> >> +still in development and might change.
> >> +
> >> +Limitations
> >> +===========
> >> +As of now not all requests types are supported through uring, userspace
> > 
> > s/userspace side/userspace/
> > 
> >> +side is required to also handle requests through /dev/fuse after
> >> +uring setup is complete. These are especially notifications (initiated
> > 
> > "especially" is an awkward word choice here; I'm not quite sure what you're trying
> > to say here, perhaps
> > 
> > "Specifically notifications (initiated from the daemon side), interrupts and
> > forgets"
> 
> Yep, thanks a lot! I removed "forgets"; these should be working over the ring
> in the meantime.
> 
> > 
> > ?
> > 
> >> +from daemon side), interrupts and forgets.
> >> +Interrupts are probably not working at all when uring is used. At least
> >> +current state of libfuse will not be able to handle those for requests
> >> +on ring queues.
> >> +All these limitation will be addressed later.
> >> +
> >> +Fuse uring configuration
> >> +========================
> >> +
> >> +Fuse kernel requests are queued through the classical /dev/fuse
> >> +read/write interface - until uring setup is complete.
> >> +
> >> +In order to set up fuse-over-io-uring userspace has to send ioctls,
> >> +mmap requests in the right order
> >> +
> >> +1) FUSE_DEV_IOC_URING ioctl with FUSE_URING_IOCTL_CMD_RING_CFG
> >> +
> >> +First the basic kernel data structure has to be set up, using
> >> +FUSE_DEV_IOC_URING with subcommand FUSE_URING_IOCTL_CMD_RING_CFG.
> >> +
> >> +Example (from libfuse)
> >> +
> >> +static int fuse_uring_setup_kernel_ring(int session_fd,
> >> +					int nr_queues, int sync_qdepth,
> >> +					int async_qdepth, int req_arg_len,
> >> +					int req_alloc_sz)
> >> +{
> >> +	int rc;
> >> +
> >> +	struct fuse_ring_config rconf = {
> >> +		.nr_queues		    = nr_queues,
> >> +		.sync_queue_depth	= sync_qdepth,
> >> +		.async_queue_depth	= async_qdepth,
> >> +		.req_arg_len		= req_arg_len,
> >> +		.user_req_buf_sz	= req_alloc_sz,
> >> +		.numa_aware		    = nr_queues > 1,
> >> +	};
> >> +
> >> +	struct fuse_uring_cfg ioc_cfg = {
> >> +		.flags = 0,
> >> +		.cmd = FUSE_URING_IOCTL_CMD_RING_CFG,
> >> +		.rconf = rconf,
> >> +	};
> >> +
> >> +	rc = ioctl(session_fd, FUSE_DEV_IOC_URING, &ioc_cfg);
> >> +	if (rc)
> >> +		rc = -errno;
> >> +
> >> +	return rc;
> >> +}
> >> +
> >> +2) MMAP
> >> +
> >> +For shared memory communication between kernel and userspace
> >> +each queue has to allocate and map memory buffer.
> >> +For numa awares kernel side verifies if the allocating thread
> > 
> > This bit is awkwardly worded and there are some spelling mistakes.  Perhaps
> > something like this?
> > 
> > "For numa aware kernels, the kernel verifies that the allocating thread is bound
> > to a single core, as the kernel has the expectation that only a single thread
> > accesses a queue, and for numa aware memory allocation the core of the thread
> > sending the mmap request is used to identify the numa node"
> 
> Thank you, updated. I am actually considering reducing this to a warning (I will
> try to add an async FUSE_WARN request type for this and others). The issue is
> that systems cannot set up fuse-uring when a core is disabled.
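
To make the single-core requirement concrete, here is a minimal sketch (not
libfuse code) of how a queue thread could bind itself to one core before
sending the mmap request. pin_thread_to_core() and thread_is_single_core()
are hypothetical helper names used only for illustration; the only real
interfaces are sched_setaffinity(), sched_getaffinity() and the CPU_* macros
from glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: bind the calling thread to exactly one core.
 * Returns 0 on success or a negative errno value.
 */
static int pin_thread_to_core(int core)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(core, &set);

	/* A pid of 0 applies the mask to the calling thread. */
	if (sched_setaffinity(0, sizeof(set), &set) != 0)
		return -errno;

	return 0;
}

/*
 * Hypothetical helper: returns 1 if the calling thread may run on exactly
 * one core, 0 if it may run on several, or a negative errno on failure.
 */
static int thread_is_single_core(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -errno;

	return CPU_COUNT(&set) == 1;
}
```

A daemon thread would call pin_thread_to_core() first and only then issue the
mmap request, so that the affinity mask the kernel inspects names a single
core. If that core is offline or not allowed, sched_setaffinity() fails,
which matches the concern about disabled cores raised above.
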
> 
> > 
> >> +is bound to a single core - in general kernel side has expectations
> >> +that only a single thread accesses a queue and for numa aware
> >> +memory alloation the core of the thread sending the mmap request
> >> +is used to identify the numa node.
> >> +
> >> +The offsset parameter has to be FUSE_URING_MMAP_OFF to identify
> >        ^^^^ "offset"
> 
> 
> Fixed.
> 
> > 
> >> +it is a request concerning fuse-over-io-uring.
> >> +
> >> +3) FUSE_DEV_IOC_URING ioctl with FUSE_URING_IOCTL_CMD_QUEUE_CFG
> >> +
> >> +This ioctl has to be send for every queue and takes the queue-id (qid)
> >                         ^^^^ "sent"
> > 
> >> +and memory address obtained by mmap to set up queue data structures.
> >> +
> >> +Kernel - userspace interface using uring
> >> +========================================
> >> +
> >> +After queue ioctl setup and memory mapping userspace submits
> > 
> > This needs a comma, so
> > 
> > "After queue ioctl setup and memory mapping, userspace submits"
> > 
> >> +SQEs (opcode = IORING_OP_URING_CMD) in order to fetch
> >> +fuse requests. Initial submit is with the sub command
> >> +FUSE_URING_REQ_FETCH, which will just register entries
> >> +to be available on the kernel side - it sets the according
> > 
> > s/according/associated/ maybe?
> > 
> >> +entry state and marks the entry as available in the queue bitmap.
> 
> Or maybe like this?
> 
> Initial submit is with the sub command FUSE_URING_REQ_FETCH, which 
> will just register entries to be available in the kernel.
> 
> 
> >> +
> >> +Once all entries for all queues are submitted kernel side starts
> >> +to enqueue to ring queue(s). The request is copied into the shared
> >> +memory queue entry buffer and submitted as CQE to the userspace
> >> +side.
> >> +Userspace side handles the CQE and submits the result as subcommand
> >> +FUSE_URING_REQ_COMMIT_AND_FETCH - kernel side does completes the requests
> > 
> > "the kernel completes the request"
> 
> Yeah, now I see the bad grammar myself. Updated to
> 
> 
> Once all entries for all queues are submitted, the kernel starts
> to enqueue to the ring queues. The request is copied into the shared
> memory buffer and submitted as a CQE to the daemon.
> Userspace handles the CQE/fuse-request and submits the result as
> subcommand FUSE_URING_REQ_COMMIT_AND_FETCH - the kernel completes
> the request and also marks the entry available again. If there are
> pending requests waiting, the next one is immediately submitted
> to the daemon.
> 
> 
> 
> Thank you very much for your help in phrasing this better!
> 

This all looks great, thanks!

Josef
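
The entry lifecycle described in the updated paragraph above (register with
FUSE_URING_REQ_FETCH, receive a request as a CQE, answer with
FUSE_URING_REQ_COMMIT_AND_FETCH) can be illustrated with a small userspace
toy model. This is only a sketch under assumptions: it does not touch
io_uring or /dev/fuse, and every name in it (toy_queue, ent_state,
toy_fetch and so on) is hypothetical rather than a kernel or libfuse
identifier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry states; illustrative names, not kernel identifiers. */
enum ent_state { ENT_UNREGISTERED, ENT_AVAILABLE, ENT_IN_USERSPACE };

#define QUEUE_DEPTH 4

struct toy_queue {
	enum ent_state ent[QUEUE_DEPTH];
	int pending;	/* requests waiting for a free entry */
	int completed;	/* requests completed so far */
};

/* Kernel side: hand a pending request to an entry if the entry is free. */
static void toy_dispatch(struct toy_queue *q, int idx)
{
	if (q->ent[idx] == ENT_AVAILABLE && q->pending > 0) {
		q->pending--;
		q->ent[idx] = ENT_IN_USERSPACE;	/* models the CQE to the daemon */
	}
}

/* Models FUSE_URING_REQ_FETCH: only registers the entry as available. */
static int toy_fetch(struct toy_queue *q, int idx)
{
	if (idx < 0 || idx >= QUEUE_DEPTH || q->ent[idx] != ENT_UNREGISTERED)
		return -1;
	q->ent[idx] = ENT_AVAILABLE;
	return 0;
}

/* Kernel side: queue a new request and use the first free entry, if any. */
static void toy_enqueue(struct toy_queue *q)
{
	q->pending++;
	for (int i = 0; i < QUEUE_DEPTH; i++) {
		if (q->ent[i] == ENT_AVAILABLE) {
			toy_dispatch(q, i);
			return;
		}
	}
}

/*
 * Models FUSE_URING_REQ_COMMIT_AND_FETCH: complete the request, mark the
 * entry available again and immediately reuse it if a request is pending.
 */
static int toy_commit_and_fetch(struct toy_queue *q, int idx)
{
	if (idx < 0 || idx >= QUEUE_DEPTH || q->ent[idx] != ENT_IN_USERSPACE)
		return -1;
	q->completed++;
	q->ent[idx] = ENT_AVAILABLE;
	toy_dispatch(q, idx);
	return 0;
}
```

With all four entries registered and five requests enqueued, one request
stays pending until the first toy_commit_and_fetch() frees an entry, at which
point that entry goes straight back to userspace with the waiting request.
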



