On Mon, 28 Mar 2022 at 15:21, Bernd Schubert <bschubert@xxxxxxx> wrote:
>
> I would like to discuss the user thread wake up latency in
> fuse_dev_do_read(). Profiling fuse shows there is room for improvement
> regarding memory copies and splice. The basic profiling with flame graphs
> didn't reveal, though, why fuse is so much
> slower (with an overlay file system) than just accessing the underlying
> file system directly and also didn't reveal why a single threaded fuse
> uses less than 100% cpu, with the application on top of fuse also using
> less than 100% cpu (simple bonnie++ runs with 1B files).
> So I started to suspect the wait queues and indeed, keeping the thread
> that reads the fuse device for work running for some time gives quite
> some improvements.

Might be related: I experimented with wake_up_sync(), but that didn't
meet my expectations. See this thread:

https://lore.kernel.org/all/1638780405-38026-1-git-send-email-quic_pragalla@xxxxxxxxxxx/#r

Possibly fuse needs some wake up tweaks due to its special scheduling
requirements.

Thanks,
Miklos