On 4/22/23 15:40, Bernd Schubert wrote:
> On 4/22/23 04:13, Jens Axboe wrote:
>
> Just gave it a quick file creation/removal run with single threaded
> bonnie++ and performance is actually lower than before (around 8000
> creates/s without IORING_SETUP_SQPOLL (adding IORING_SETUP_IOPOLL
> doesn't help either) and about 5000 creates/s with
> IORING_SETUP_SQPOLL). With plain /dev/fuse it is about 2000 creates/s.
> Main improvement comes from ensuring request submission
> (application) and request handling (ring/thread) are on the same core.
> I'm running into some scheduler issues, which I work around for now using
> migrate_disable()/migrate_enable() in before/after fuse request waitq,
> without that performance for metadata requests is similar to plain
> /dev/fuse.

Btw, I have an idea. For sync requests the initial thread could do the
polling. Right now it is:

    app -> vfs/fuse -> fill ring req -> io_uring_cmd_done -> waitq-wait

which could become:

    app -> vfs/fuse -> fill ring req -> io_uring_cmd_done -> poll ring for SQ

Obviously, it is a bit more complex when there are multiple requests on
the same ring.

For async it could be:

    app page cached write -> vfs/fuse -> fill ring req, with max 1MB requests
      -> io_uring_cmd_done -> check if there are completed events
      -> back to app, possibly next request
      -> async poll task/thread (similar to SQPOLL, if the ring was not
         polled for some amount of time)

I will investigate if this is feasible once I'm done with other changes.

Bernd
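
To make the sync idea a bit more concrete, here is a rough userspace
analogy, not kernel code and not what the patches do. The submitting
thread polls for a bounded number of iterations and only then falls back
to sleeping, which stands in for the current waitq-wait. All names here
(struct req, req_wait(), req_complete(), spin_budget) are made up for
illustration, and an atomic flag stands in for polling the ring.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one in-flight sync request. */
struct req {
	atomic_bool done;        /* set by the handler once the reply is ready */
	pthread_mutex_t lock;
	pthread_cond_t cond;     /* fallback sleep, analogous to the waitq */
};

enum wait_path { WAIT_POLLED, WAIT_FALLBACK };

static void req_init(struct req *r)
{
	atomic_init(&r->done, false);
	pthread_mutex_init(&r->lock, NULL);
	pthread_cond_init(&r->cond, NULL);
}

/* Handler side: publish the result and wake a sleeping submitter. */
static void req_complete(struct req *r)
{
	pthread_mutex_lock(&r->lock);
	atomic_store(&r->done, true);
	pthread_cond_signal(&r->cond);
	pthread_mutex_unlock(&r->lock);
}

/*
 * Submitter side: poll for up to spin_budget iterations, then take the
 * blocking path. Returns which path observed the completion.
 */
static enum wait_path req_wait(struct req *r, long spin_budget)
{
	assert(spin_budget >= 0);

	for (long i = 0; i < spin_budget; i++) {
		if (atomic_load(&r->done))
			return WAIT_POLLED;
	}

	/* Budget exhausted: block until the handler signals completion. */
	pthread_mutex_lock(&r->lock);
	while (!atomic_load(&r->done))
		pthread_cond_wait(&r->cond, &r->lock);
	pthread_mutex_unlock(&r->lock);
	return WAIT_FALLBACK;
}
```

A handler thread would call req_complete() after filling in the reply,
while the submitter calls req_wait() with some budget. With several
requests on the same ring a single flag per waiter is not enough, which
is the extra complexity mentioned above.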