Re: RFC fuse waitq latency

[I removed the failing netapp/zufs CCs]

On 4/22/22 14:25, Miklos Szeredi wrote:
> On Mon, 28 Mar 2022 at 15:21, Bernd Schubert <bschubert@xxxxxxx> wrote:
>>
>> I would like to discuss the user thread wake up latency in
>> fuse_dev_do_read(). Profiling fuse shows there is room for improvement
>> regarding memory copies and splice. The basic profiling with flame graphs
>> didn't reveal, though, why fuse is so much slower (with an overlay file
>> system) than just accessing the underlying file system directly and also
>> didn't reveal why a single threaded fuse uses less than 100% cpu, with
>> the application on top of fuse also using less than 100% cpu (simple
>> bonnie++ runs with 1B files).
>> So I started to suspect the wait queues and indeed, keeping the thread
>> that reads the fuse device for work running for some time gives quite
>> some improvements.
>
> Might be related: I experimented with wake_up_sync() that didn't meet
> my expectations.  See this thread:
>
> https://lore.kernel.org/all/1638780405-38026-1-git-send-email-quic_pragalla@xxxxxxxxxxx/#r
>
> Possibly fuse needs some wake up tweaks due to its special scheduling
> requirements.

Thanks, I will look at that as well. I have a patch that spins and avoids thread wake-ups; it is almost complete. In my (still limited) testing it takes hardly any additional CPU and improves metadata / bonnie++ performance by a factor of ~1.9 to 3, depending on which performance mode the CPU is in.

https://github.com/aakefbs/linux/commits/v5.17-fuse-scheduling3

Still missing are just another option for a wake-queue-size trigger and the handling of signals. It should be ready once I'm done with my other work.
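
To illustrate the idea of spinning before going to sleep, here is a minimal user-space sketch in C11. It is not the code from the branch above; struct work_queue, try_take() and spin_for_work() are hypothetical names, and the real change lives in the kernel around fuse_dev_do_read(). The point is only that the reader polls for a bounded number of iterations and falls back to a blocking wait afterwards.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the queue of pending fuse requests. */
struct work_queue {
    atomic_uint pending;    /* number of queued requests */
};

enum wait_result {
    GOT_WORK_SPINNING,      /* a request was claimed while polling */
    MUST_SLEEP              /* spin budget used up, caller should block */
};

/* Try to claim one pending request without blocking. */
static bool try_take(struct work_queue *q)
{
    unsigned n = atomic_load_explicit(&q->pending, memory_order_acquire);

    while (n > 0) {
        /* On failure n is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&q->pending, &n, n - 1,
                                                  memory_order_acq_rel,
                                                  memory_order_acquire))
            return true;
    }
    return false;
}

/*
 * Poll for work at most spin_budget times. Only if nothing shows up
 * does the caller have to go through the (expensive) sleep/wake path.
 */
static enum wait_result spin_for_work(struct work_queue *q,
                                      unsigned spin_budget)
{
    for (unsigned i = 0; i < spin_budget; i++) {
        if (try_take(q))
            return GOT_WORK_SPINNING;
    }
    return MUST_SLEEP;
}
```

A caller would block (for example on a condition variable or futex) only when spin_for_work() returns MUST_SLEEP. The spin budget is the knob that trades CPU time for wake-up latency.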

That being said, in the meantime I do believe a better approach would be SQ/CQ based, similar to NVMe or io_uring. In principle it would work exactly like io_uring, just the other way around: the kernel fills the SQ, user space consumes it and fills the CQ. We also looked into zufs and your fuse2 branch and were almost ready to start porting it to a recent kernel, but it is still entirely system call based and has waitqs, so it is probably much slower than what could be achieved through queue pairs. Assuming user space does not want a polling thread but wants a notification similar to io_uring_enter(), there would still be a thread that needs to be woken up; maybe that is where wake_up_sync() would help.
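
To make the "io_uring the other way around" idea a bit more concrete, here is a rough user-space sketch of such a queue pair in C11. Everything in it is an assumption for illustration (struct ring, struct queue_pair, kernel_submit(), daemon_handle_one(), kernel_reap(), and the ring size); no such fuse interface exists. Each ring is a plain single-producer/single-consumer ring: the "kernel" side produces into the SQ, the daemon consumes it and produces into the CQ.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u     /* must be a power of two */

/* Single-producer/single-consumer ring of request ids. */
struct ring {
    _Atomic uint32_t head;  /* next slot to consume */
    _Atomic uint32_t tail;  /* next slot to fill */
    uint64_t entries[RING_ENTRIES];
};

struct queue_pair {
    struct ring sq;         /* kernel -> daemon: requests */
    struct ring cq;         /* daemon -> kernel: completions */
};

static void qp_init(struct queue_pair *qp)
{
    atomic_init(&qp->sq.head, 0);
    atomic_init(&qp->sq.tail, 0);
    atomic_init(&qp->cq.head, 0);
    atomic_init(&qp->cq.tail, 0);
}

static bool ring_full(struct ring *r)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    return tail - head == RING_ENTRIES;
}

static bool ring_push(struct ring *r, uint64_t v)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

    if (ring_full(r))
        return false;
    r->entries[tail & (RING_ENTRIES - 1)] = v;
    /* Publish the entry only after it has been written. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

static bool ring_pop(struct ring *r, uint64_t *v)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return false;       /* empty */
    *v = r->entries[head & (RING_ENTRIES - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* "Kernel" side: queue a request for the daemon. */
static bool kernel_submit(struct queue_pair *qp, uint64_t req_id)
{
    return ring_push(&qp->sq, req_id);
}

/* Daemon side: take one request from the SQ and post its completion. */
static bool daemon_handle_one(struct queue_pair *qp)
{
    uint64_t req_id;

    /* Only consume a request if its completion can be posted. */
    if (ring_full(&qp->cq) || !ring_pop(&qp->sq, &req_id))
        return false;
    /* A real daemon would process the request here. */
    return ring_push(&qp->cq, req_id);
}

/* "Kernel" side: collect one completion. */
static bool kernel_reap(struct queue_pair *qp, uint64_t *req_id)
{
    return ring_pop(&qp->cq, req_id);
}
```

With rings like these, neither side needs a system call or a waitq as long as both keep polling. The open question from above remains: how to notify a daemon thread that chose to sleep instead of polling.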

Btw, the optional kernel polling thread in io_uring also has spinning...


Bernd


