[LSF/MM/BFP ATTEND][LSF/MM/BFP TOPIC] fuse uring communication

Hello,

I have been working for some time on fuse uring-based communication
that is NUMA-aware and core-affine.

In the current /dev/fuse based IO model, requests are queued on lists
that are neither core-affine nor NUMA-aware. Every request needs a round
trip between userspace and kernel.
When we benchmarked our atomic-open patches (also still WIP), we got
initially confusing results [1], which could be tracked down to multiple
threads reading from /dev/fuse. After switching to a single thread that
reads from /dev/fuse, we got consistent and expected results.
Later we also figured out that adding a polling spin in
fuse_dev_do_read() before going into a waitq sleep when no request is
available greatly improved metadata benchmark performance [2].
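
To illustrate the spin-before-sleep idea, here is a minimal userspace
sketch. It is not the code from [2] and not kernel code; struct
req_queue, try_take() and spin_then_sleep() are hypothetical names, and
the actual sleep (a waitq in the kernel) is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical queue: only counts how many requests are pending. */
struct req_queue {
	atomic_int pending;
};

enum wait_result {
	GOT_REQUEST,	/* a request was taken during the spin phase */
	MUST_SLEEP	/* nothing arrived, caller should block now */
};

/* Try to take one pending request without blocking. */
static bool try_take(struct req_queue *q)
{
	int n = atomic_load_explicit(&q->pending, memory_order_acquire);

	while (n > 0) {
		/* On failure n is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(&q->pending, &n, n - 1,
							   memory_order_acq_rel,
							   memory_order_acquire))
			return true;
	}
	return false;
}

/*
 * Poll the queue up to spin_limit times before giving up. Only when the
 * spin phase finds nothing does the caller pay for a sleep and wakeup.
 */
static enum wait_result spin_then_sleep(struct req_queue *q,
					unsigned int spin_limit)
{
	assert(q);

	for (unsigned int i = 0; i < spin_limit; i++) {
		if (try_take(q))
			return GOT_REQUEST;
	}
	return MUST_SLEEP;
}
```

The spin limit is a tuning knob: too small and the reader sleeps just
before the next request arrives, too large and it burns CPU on an idle
mount.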

That made us think about the current communication model and look into a
ring-based queuing model. Around that time IORING_OP_URING_CMD was added
to uring, and the new userspace block device driver (ublk) uses that
command to send requests from kernel to userspace.
I started to look at how ublk works and to adapt a similar model to
fuse. As of today it is basically working, but I'm still
fixing issues found by xfstests. Benchmarks and patch cleanup for
submission will follow next.

https://github.com/bsbernd/linux/tree/fuse-uring
https://github.com/bsbernd/libfuse/tree/uring
(these branches will _not_ be used for upstream submission; they are
purely a base for development)


A fuse design documentation update will also be added in the first RFC
posting; basic details follow:

- Initial mount setup goes over /dev/fuse
- fuse.ko queues FUSE_INIT in the existing /dev/fuse (background) queue
- User space sets up the ring and all queues with a new ioctl
- fuse.ko sets up the ring and allocates request queues/request memory 
per queue/request
- Userspace mmaps these buffers and assigns them per queue/request
- Data is sent through these mmapped buffers; there is no kmap involved
(difference to ublk)
- Similar to ublk, user space first submits SQEs as
FUSE_URING_REQ_FETCH, then later as FUSE_URING_REQ_COMMIT_AND_FETCH,
which commits the result of the current request and fetches the next one.
- FUSE_URING_REQ_FETCH also takes the FUSE_INIT request from the
existing /dev/fuse lists; afterwards these lists are no longer checked,
as nothing is supposed to be on them
- The ring currently only handles fuse pending and background
requests (with credits assigned)
- Forget still requires libfuse to read /dev/fuse (handling will be
added to the ring later)
- In the current WIP state, request interrupts are not supported (yet)
- Userspace still needs to send fuse notifications to /dev/fuse; this
needs to be handled by the ring as well (or maybe by a separate ring)
- My goal was to keep compatibility with existing fuse file systems;
except for the still missing interrupt handling, that should work.
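
The fetch/commit cycle of a single ring entry can be sketched as a small
state machine. This is only an illustration of the protocol described
above, not code from the branches: the numeric command values, struct
ring_ent, the ENT_* states and both helper functions are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Command names are from the list above; the values are made up. */
enum fuse_uring_cmd_sketch {
	FUSE_URING_REQ_FETCH = 1,
	FUSE_URING_REQ_COMMIT_AND_FETCH = 2
};

/* Hypothetical per-entry states. */
enum ent_state {
	ENT_IDLE,	/* not yet submitted by userspace */
	ENT_WAITING,	/* SQE submitted, waiting for a fuse request */
	ENT_PROCESSING	/* request delivered, userspace is handling it */
};

struct ring_ent {
	enum ent_state state;
	uint64_t unique;	/* id of the request being processed */
	uint64_t committed;	/* id of the last committed request */
};

/* Userspace side: submit an SQE for this entry. */
static int ent_submit(struct ring_ent *e, int cmd)
{
	assert(e);

	switch (cmd) {
	case FUSE_URING_REQ_FETCH:
		/* Only valid as the very first submission. */
		if (e->state != ENT_IDLE)
			return -EINVAL;
		break;
	case FUSE_URING_REQ_COMMIT_AND_FETCH:
		/* Needs a request in flight whose result is committed. */
		if (e->state != ENT_PROCESSING)
			return -EINVAL;
		e->committed = e->unique;
		e->unique = 0;
		break;
	default:
		return -EINVAL;
	}
	e->state = ENT_WAITING;
	return 0;
}

/* Kernel side: hand a fuse request to a waiting entry. */
static int ent_deliver(struct ring_ent *e, uint64_t unique)
{
	assert(e);

	if (e->state != ENT_WAITING)
		return -EBUSY;
	e->unique = unique;
	e->state = ENT_PROCESSING;
	return 0;
}
```

After the initial FETCH, every further submission both returns a result
and re-arms the entry, so steady state costs one SQE per request rather
than a separate read and write on /dev/fuse.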

There are certainly some questionable design decisions and longer 
discussion threads might come up in the next weeks/months. Debating and 
resolving some of these in person might be very helpful.

Ming is also working on zero-copy for ublk [3] and I'm going to look
into that next. Splice and zero-copy are currently not supported in my
uring branch.


Thanks,
Bernd


[1] https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@xxxxxxxxx/

[2] https://lore.kernel.org/lkml/6ba14287-336d-cdcd-0d39-680f288ca776@xxxxxxx/

[3] https://patchwork.kernel.org/project/linux-block/cover/20221103085004.1029763-1-ming.lei@xxxxxxxxxx/







