On Fri, 22 Apr 2022 at 17:46, Bernd Schubert <bernd.schubert@xxxxxxxxxxx> wrote:
>
> [I removed the failing netapp/zufs CCs]
>
> On 4/22/22 14:25, Miklos Szeredi wrote:
> > On Mon, 28 Mar 2022 at 15:21, Bernd Schubert <bschubert@xxxxxxx> wrote:
> >>
> >> I would like to discuss the user thread wake up latency in
> >> fuse_dev_do_read(). Profiling fuse shows there is room for improvement
> >> regarding memory copies and splice. The basic profiling with flame graphs
> >> didn't reveal, though, why fuse is so much
> >> slower (with an overlay file system) than just accessing the underlying
> >> file system directly and also didn't reveal why a single threaded fuse
> >> uses less than 100% cpu, with the application on top of use also using
> >> less than 100% cpu (simple bonnie++ runs with 1B files).
> >> So I started to suspect the wait queues and indeed, keeping the thread
> >> that reads the fuse device for work running for some time gives quite
> >> some improvements.
> >
> > Might be related: I experimented with wake_up_sync() that didn't meet
> > my expectations. See this thread:
> >
> > https://lore.kernel.org/all/1638780405-38026-1-git-send-email-quic_pragalla@xxxxxxxxxxx/#r
> >
> > Possibly fuse needs some wake up tweaks due to its special scheduling
> > requirements.
>
> Thanks I will look at that as well. I have a patch with spinning and
> avoid of thread wake that is almost complete and in my (still limited)
> testing almost does not take more CPU and improves meta data / bonnie
> performance in between factor ~1.9 and 3, depending on in which
> performance mode the cpu is.
>
> https://github.com/aakefbs/linux/commits/v5.17-fuse-scheduling3
>
> Missing is just another option for wake-queue-size trigger and handling
> of signals. Should be ready once I'm done with my other work.

Trying to understand what is being optimized here... does the following
correctly describe your use case?

- an I/O thread is submitting synchronous requests (direct I/O?)

- the fuse thread always goes to sleep, because the request queue is
empty (there's always a single request on the queue)

- with this change the fuse thread spins for a jiffy before going to
sleep, and by that time the I/O thread will submit a new sync request.

- the I/O thread does not spin while the fuse thread is processing
the request, so it still goes to sleep.

Thanks,
Miklos
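To make the scenario in the list above concrete, here is a toy model of the spin-before-sleep idea. It is only a sketch under stated assumptions and is not taken from the patch: the function name simulate(), the microsecond units and the numbers used below are all hypothetical, and the model ignores the I/O thread's own sleep entirely. It counts how often the daemon thread would go to sleep, how much wake-up latency that adds, and how much time is spent spinning.

```python
def simulate(gaps_us, spin_us, wakeup_us):
    """Toy model of a daemon thread that polls an empty queue before sleeping.

    gaps_us:   for each request, the time (in microseconds) between the queue
               becoming empty and the next request arriving (assumed input).
    spin_us:   how long the daemon spins on the empty queue before sleeping.
    wakeup_us: assumed cost of waking a sleeping daemon thread.

    Returns (sleeps, added_latency_us, spun_us).
    """
    sleeps = 0
    added_latency_us = 0
    spun_us = 0
    for gap in gaps_us:
        # The daemon spins until the request arrives or the window expires,
        # whichever comes first.
        spun_us += min(gap, spin_us)
        if gap > spin_us:
            # The window expired first: the daemon slept, so the next
            # request has to pay for a wake-up.
            sleeps += 1
            added_latency_us += wakeup_us
    return sleeps, added_latency_us, spun_us


if __name__ == "__main__":
    gaps = [5, 5, 5]  # hypothetical: the next sync request arrives after 5 us
    print(simulate(gaps, spin_us=0, wakeup_us=50))   # no spinning at all
    print(simulate(gaps, spin_us=10, wakeup_us=50))  # spin window covers the gap
```

In this model spinning only helps when the next request usually arrives inside the spin window; otherwise the daemon pays for both the spin and the wake-up.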