Hello,

I have been working for some time on io_uring-based fuse communication that is NUMA aware and core-affine. In the current /dev/fuse-based IO model, requests are queued on lists that are neither core-affine nor NUMA aware, and every request needs a round trip between userspace and the kernel.

When we benchmarked our atomic-open patches (also still WIP), the initial results were confusing [1]. We tracked this down to multiple threads reading from /dev/fuse, and after switching to a single thread reading from /dev/fuse we got consistent and expected results. Later we also found that adding a polling spin to fuse_dev_do_read(), before going to sleep on the waitq when no request is available, greatly improved metadata benchmark performance [2]. That made us think about the current communication model and look into a ring-based queuing model.

Around that time IORING_OP_URING_CMD was added to io_uring, and the new userspace block device driver (ublk) uses that command to send requests from the kernel to userspace. I looked at how ublk works and started adapting a similar model to fuse. As of today it basically works, but I'm still fixing issues found by xfstests. Benchmarks and patch cleanup for submission will follow next.
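The spin-then-sleep idea from [2] can be illustrated with a minimal userspace sketch in C. This is not the fuse_dev_do_read() change itself: it assumes a pthread mutex/condition variable as a stand-in for the kernel waitq, and struct req_queue, queue_post(), try_take() and wait_for_request() are hypothetical names. The reader polls the queue up to spin_iters times and only falls back to a blocking wait if nothing showed up, so a request that arrives shortly after the previous one avoids the sleep/wakeup cost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request queue; a userspace stand-in for the fuse pending list. */
struct req_queue {
    pthread_mutex_t lock;
    pthread_cond_t  wake;    /* plays the role of the kernel waitq */
    atomic_int      pending; /* number of queued requests */
};

static void queue_init(struct req_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->wake, NULL);
    atomic_init(&q->pending, 0);
}

/* Producer side: queue one request and wake a sleeping reader. */
static void queue_post(struct req_queue *q)
{
    pthread_mutex_lock(&q->lock);
    atomic_fetch_add(&q->pending, 1);
    pthread_cond_signal(&q->wake);
    pthread_mutex_unlock(&q->lock);
}

/* Take one request without blocking; false if the queue is empty. */
static bool try_take(struct req_queue *q)
{
    int n = atomic_load(&q->pending);

    while (n > 0) {
        /* On failure, n is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&q->pending, &n, n - 1))
            return true;
    }
    return false;
}

/*
 * Poll the queue up to spin_iters times before sleeping. Returns true if a
 * request was taken while spinning, i.e. without a sleep/wakeup cycle.
 */
static bool wait_for_request(struct req_queue *q, long spin_iters)
{
    assert(spin_iters >= 0);
    for (long i = 0; i < spin_iters; i++) {
        if (try_take(q))
            return true;
    }

    /* Slow path: block until a producer posts a request. */
    pthread_mutex_lock(&q->lock);
    while (!try_take(q))
        pthread_cond_wait(&q->wake, &q->lock);
    pthread_mutex_unlock(&q->lock);
    return false;
}
```

The producer increments the counter under the lock, so a reader that has just failed try_take() in the slow path cannot miss the wakeup. A fixed iteration count is only an assumption of this sketch; the spin budget trades CPU time for latency and could just as well be bounded by time.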
https://github.com/bsbernd/linux/tree/fuse-uring
https://github.com/bsbernd/libfuse/tree/uring

(These branches will _not_ be used for upstream submission; they are purely for base development.)

A fuse design documentation update will also be added in the first RFC. The basic details are:

- Initial mount setup goes over /dev/fuse.
- fuse.ko queues FUSE_INIT in the existing /dev/fuse (background) queue.
- Userspace sets up the ring and all queues with a new ioctl.
- fuse.ko sets up the ring and allocates the request queues and the request memory per queue/request.
- Userspace mmaps these buffers and assigns them per queue/request.
- Data is sent through these mmaped buffers; no kmap is involved (a difference from ublk).
- Similar to ublk, userspace first submits SQEs as FUSE_URING_REQ_FETCH, then later as FUSE_URING_REQ_COMMIT_AND_FETCH, which commits the result of the current request and fetches the next one.
- FUSE_URING_REQ_FETCH also takes the FUSE_INIT request; after that the existing /dev/fuse lists are no longer checked, as nothing is supposed to be on them.
- The ring currently only handles fuse pending and background requests (with credits assigned).
- Forget requests still require libfuse to read /dev/fuse (handling will be added to the ring later).
- In the WIP state, request interrupts are not supported (yet).
- Userspace needs to send fuse notifications to /dev/fuse; these need to be handled by the ring as well (or maybe by a separate ring).
- My goal was to keep compatibility with existing fuse file systems; apart from the still-missing interrupt handling, that should already work.

There are certainly some questionable design decisions, and longer discussion threads might come up in the next weeks/months. Debating and resolving some of these in person might be very helpful. Ming is also working on zero-copy for ublk, and I'm going to look into that next.
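The FETCH / COMMIT_AND_FETCH flow and the per-queue/per-request buffers from the list above can be modeled in a small single-threaded C simulation. Only the two command names come from the design notes; their numeric values, the buffer layout (struct ring_req_buf), the queue geometry, req_buf_offset() and the simulated kernel side sim_kernel_cmd() are hypothetical, and no real io_uring calls are made. Each ring entry first submits FETCH; every later submission commits the reply of the current request and fetches the next one in one step, instead of a separate write() and read() on /dev/fuse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Command names from the design notes; the numeric values are made up. */
enum ring_cmd {
    FUSE_URING_REQ_FETCH = 1,
    FUSE_URING_REQ_COMMIT_AND_FETCH = 2,
};

/* Hypothetical geometry: queues (e.g. one per core) x entries per queue. */
#define NR_QUEUES   2
#define QUEUE_DEPTH 4
#define REQ_BUF_SZ  256

/* Hypothetical per-request buffer shared through the mmaped area. */
struct ring_req_buf {
    uint64_t unique;                  /* request id, set by the "kernel" */
    int32_t  result;                  /* reply status, set by userspace */
    uint32_t flags;                   /* unused here */
    char     payload[REQ_BUF_SZ - 16];
};
_Static_assert(sizeof(struct ring_req_buf) == REQ_BUF_SZ, "buffer size");

/* Byte offset of the buffer for entry (qid, tag) inside the mmaped region. */
static size_t req_buf_offset(unsigned qid, unsigned tag)
{
    assert(qid < NR_QUEUES && tag < QUEUE_DEPTH);
    return ((size_t)qid * QUEUE_DEPTH + tag) * REQ_BUF_SZ;
}

/* Simulated kernel state; 'area' stands in for the mmaped buffers. */
struct sim_kernel {
    struct ring_req_buf area[NR_QUEUES * QUEUE_DEPTH];
    uint64_t next_unique[NR_QUEUES]; /* last request id handed out per queue */
    uint64_t remaining[NR_QUEUES];   /* requests still queued per queue */
    uint64_t committed;              /* replies received */
    int64_t  result_sum;             /* sum of committed results, for checks */
};

static struct ring_req_buf *req_buf(struct sim_kernel *k, unsigned qid,
                                    unsigned tag)
{
    return (struct ring_req_buf *)((char *)k->area + req_buf_offset(qid, tag));
}

/*
 * Simulated kernel handling of one SQE for entry (qid, tag). Returns 1 if a
 * new request was placed in the entry's buffer (a CQE is posted), 0 if the
 * queue is empty. In the real design the command would stay pending until a
 * request arrives; this simulation just returns.
 */
static int sim_kernel_cmd(struct sim_kernel *k, unsigned qid, unsigned tag,
                          enum ring_cmd cmd)
{
    struct ring_req_buf *buf = req_buf(k, qid, tag);

    if (cmd == FUSE_URING_REQ_COMMIT_AND_FETCH) {
        /* Commit: consume the reply userspace left in the buffer. */
        k->committed++;
        k->result_sum += buf->result;
    }
    if (k->remaining[qid] == 0)
        return 0;
    /* Fetch: hand out the next queued request through the same buffer. */
    k->remaining[qid]--;
    buf->unique = ++k->next_unique[qid];
    memset(buf->payload, 0, sizeof(buf->payload));
    return 1;
}

/* Userspace loop for one ring entry: FETCH once, then COMMIT_AND_FETCH. */
static unsigned serve_entry(struct sim_kernel *k, unsigned qid, unsigned tag)
{
    unsigned served = 0;
    int have_req = sim_kernel_cmd(k, qid, tag, FUSE_URING_REQ_FETCH);

    while (have_req) {
        struct ring_req_buf *buf = req_buf(k, qid, tag);

        /* Pretend the fuse handler ran; echo the request id as the result. */
        buf->result = (int32_t)buf->unique;
        served++;
        have_req = sim_kernel_cmd(k, qid, tag,
                                  FUSE_URING_REQ_COMMIT_AND_FETCH);
    }
    return served;
}
```

Serving an entry of a queue with three pending requests issues one FETCH and three COMMIT_AND_FETCH submissions; the last one commits the third reply and then finds the queue empty.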
Splice and zero-copy are not supported yet in my uring branch [3].

Thanks,
Bernd

[1] https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@xxxxxxxxx/
[2] https://lore.kernel.org/lkml/6ba14287-336d-cdcd-0d39-680f288ca776@xxxxxxx/
[3] https://patchwork.kernel.org/project/linux-block/cover/20221103085004.1029763-1-ming.lei@xxxxxxxxxx/