Re: [LSF/MM/BPF ATTEND][LSF/MM/BPF TOPIC] fuse uring communication

On Sun, 5 Feb 2023 at 02:00, Bernd Schubert <bschubert@xxxxxxx> wrote:
>
> Hello,
>
> I have been working for some time on fuse uring based communication that
> is NUMA aware and core-affine.

I might have mentioned this earlier, but one of the bigger issues I
found with NUMA is that a single process with multiple threads serving
queues on different NUMA nodes incurs a performance hit each time a
server thread gets to run. This is due to having to update
mm->cpu_bitmap, which indicates which CPUs the current process is
running on. This bitmap is shared by the whole address space, hence
constantly updating it from different nodes means having to move its
cacheline from one node to the other.

My workaround was to use separate processes (address space is not
shared) but use shared memory for common structures.  This complicates
things quite a bit, so it would be nice to find some other way of
fixing this issue.  For example it occurs to me that making this
bitmap use different cachelines for CPUs that are on different nodes
might actually help fix the issue.
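
To make the cacheline idea concrete, here is a minimal userspace sketch.
It is not the kernel's layout of mm->cpu_bitmap; all names (split_cpumask,
node_cpus, CPUS_PER_NODE, ...) are made up for illustration, and it assumes
64-byte cachelines and CPU ids that are contiguous per node. Each node gets
its own cacheline, so setting a bit for a CPU on one node never dirties the
line used by another node:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHELINE_BYTES 64   /* assumption: typical x86-64 line size */
#define MAX_NODES       4    /* hypothetical number of NUMA nodes */
#define CPUS_PER_NODE   64   /* assumption: CPU ids are contiguous per node */

/* One word of CPU bits per node, padded out to a full cacheline. */
struct node_cpus {
    alignas(CACHELINE_BYTES) _Atomic uint64_t bits;
};

static_assert(sizeof(struct node_cpus) == CACHELINE_BYTES,
              "each node must occupy exactly one cacheline");

/* Hypothetical replacement for a single flat, shared bitmap. */
struct split_cpumask {
    struct node_cpus node[MAX_NODES];
};

/* Mark a CPU as running this address space; touches only its node's line. */
void split_mask_set(struct split_cpumask *m, unsigned int cpu)
{
    if (cpu >= MAX_NODES * CPUS_PER_NODE)
        return;
    atomic_fetch_or(&m->node[cpu / CPUS_PER_NODE].bits,
                    UINT64_C(1) << (cpu % CPUS_PER_NODE));
}

/* Check whether a CPU is marked; out-of-range CPUs are never set. */
bool split_mask_test(struct split_cpumask *m, unsigned int cpu)
{
    if (cpu >= MAX_NODES * CPUS_PER_NODE)
        return false;
    return (atomic_load(&m->node[cpu / CPUS_PER_NODE].bits) &
            (UINT64_C(1) << (cpu % CPUS_PER_NODE))) != 0;
}
```

The cost is memory: one cacheline per node instead of one bit per CPU, and
any code that walks the whole mask has to visit every node's line.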

> In the current /dev/fuse based IO model requests are queued on lists
> that are not core-affine or NUMA aware. For every request a round trip
> between userspace and kernel is needed.
> When we benchmarked our atomic-open patches (also still WIP), initially
> confusing findings came up [1]; they could be tracked down to multiple
> threads reading from /dev/fuse. After switching to a single thread that
> reads from /dev/fuse we got consistent and expected results.
> Later we also figured out that adding a polling spin to
> fuse_dev_do_read() before going into a waitq sleep when no request is
> available greatly improved metadata benchmark performance [2].
>
> That made us think about the current communication and look into a
> ring based queuing model. Around that time IORING_OP_URING_CMD was added
> to uring and the new userspace block device driver (ublk) is using that
> command to send requests from kernel to userspace.
> I started to look at how ublk works and started to adapt a similar model
> to fuse. The state as of today is that it is basically working, but I'm
> still fixing issues found by xfstests. Benchmarks and patch cleanup for
> submission follow next.
>
> https://github.com/bsbernd/linux/tree/fuse-uring
> https://github.com/bsbernd/libfuse/tree/uring
> (these branches will _not_ be used for upstream submission, these are
> purely for base development)
>
>
> A fuse design documentation update will also be added in the 1st RFC
> request; basic details follow:
>
> - Initial mount setup goes over /dev/fuse
> - fuse.ko queues FUSE_INIT in the existing /dev/fuse (background) queue
> - User space sets up the ring and all queues with a new ioctl
> - fuse.ko sets up the ring and allocates request queues/request memory
> per queue/request
> - Userspace mmaps these buffers and assigns them per queue/request
> - Data are sent through these mmaped buffers, there is no kmap involved
> (difference to ublk)

How is the queue buffer filled?  Are requests packed or is the queue
divided into equal parts for each request?

How are replies sent?  Do they use the same buffer?
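
To make the "equal parts" alternative concrete: with fixed-size slots the
buffer for a request can be found by plain arithmetic, with no parsing of
the queue contents. The sketch below is only an illustration of that
alternative, not a description of what the patches do; ring_layout,
slot_offset() and ring_bytes() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a ring split into equal-sized request slots. */
struct ring_layout {
    size_t nr_queues;    /* e.g. one queue per core */
    size_t queue_depth;  /* request slots per queue */
    size_t slot_size;    /* bytes reserved for one request (header + payload) */
};

/* Total size of the area userspace would have to mmap. */
size_t ring_bytes(const struct ring_layout *l)
{
    return l->nr_queues * l->queue_depth * l->slot_size;
}

/* Byte offset of request slot 'tag' in queue 'qid' within the mapped area. */
size_t slot_offset(const struct ring_layout *l, size_t qid, size_t tag)
{
    assert(qid < l->nr_queues && tag < l->queue_depth);
    return (qid * l->queue_depth + tag) * l->slot_size;
}
```

A packed queue would use memory better for small requests, but then
locating a request needs an offset/length header per entry, and reusing
the same area for the reply gets harder.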

> - Similar to ublk, user space first submits SQEs as
> FUSE_URING_REQ_FETCH, then later as FUSE_URING_REQ_COMMIT_AND_FETCH -
> commit results of the current request and fetch the next one.
> - FUSE_URING_REQ_FETCH also takes the FUSE_INIT request, later these
> lists are not checked anymore, as there is nothing supposed to be on them

Which list?  If the FUSE_INIT is handled on /dev/fuse why handle it on
the uring?
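
For reference, the FETCH / COMMIT_AND_FETCH cycle quoted above can be
modelled as a small per-slot state machine. This is a toy model under my
own assumptions, not code from the branches: only the two command names
come from the description; the numeric values, the slot states and the
helper names are made up, and no io_uring call is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Command names from the proposal; the numeric values are assumptions. */
enum fuse_uring_cmd {
    FUSE_URING_REQ_FETCH = 1,
    FUSE_URING_REQ_COMMIT_AND_FETCH = 2,
};

/* Hypothetical states of one request slot. */
enum slot_state {
    SLOT_NEW,           /* nothing submitted yet */
    SLOT_WAITING,       /* SQE submitted, kernel may fill in a request */
    SLOT_IN_USERSPACE,  /* request delivered, server is processing it */
};

struct ring_slot {
    enum slot_state state;
    unsigned int committed;  /* number of replies committed so far */
};

/* Userspace side: submit an SQE for this slot. Returns 0 or -1. */
int slot_submit(struct ring_slot *s, enum fuse_uring_cmd cmd)
{
    assert(s != NULL);
    switch (cmd) {
    case FUSE_URING_REQ_FETCH:
        /* Only valid as the very first command on a slot. */
        if (s->state != SLOT_NEW)
            return -1;
        s->state = SLOT_WAITING;
        return 0;
    case FUSE_URING_REQ_COMMIT_AND_FETCH:
        /* Commit the reply of the current request, then wait for the next. */
        if (s->state != SLOT_IN_USERSPACE)
            return -1;
        s->committed++;
        s->state = SLOT_WAITING;
        return 0;
    }
    return -1;
}

/* Kernel side: complete the pending SQE by handing over a request. */
int slot_deliver(struct ring_slot *s)
{
    assert(s != NULL);
    if (s->state != SLOT_WAITING)
        return -1;
    s->state = SLOT_IN_USERSPACE;
    return 0;
}
```

In this model, after the initial FETCH every reply costs one submission,
which is where the saving over a read() plus write() pair on /dev/fuse
would come from.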

> - The ring currently only handles fuse pending and background
> requests (with credits assigned)
> - Forget requires that libfuse still reads /dev/fuse (handling will be
> added to the ring later)
> - In the WIP state request interrupts are not supported (yet)
> - Userspace needs to send fuse notifications to /dev/fuse, needs to be
> handled by the ring as well (or maybe a separate ring)
> - My goal was to keep compatibility with existing fuse file systems;
> except for the so far missing interrupt handling, that should work.

Interrupts and notifications are used by very few fs.  So if it's
easier, then we could leave one thread to handle legacy /dev/fuse
requests for anything that's not performance sensitive.

Thanks,
Miklos


