On 8/17/20 4:46 AM, Dmitry Shulyak wrote:
> Hi everyone,
>
> I noticed in iotop that all writes are executed by the same thread
> (io_wqe_worker-0). This is a significant problem if I am using files
> with mentioned flags. Not the case with reads, requests are
> multiplexed over many threads (note the different name
> io_wqe_worker-1). The problem is not specific to O_SYNC, in the
> general case I can get higher throughput with thread pool and regular
> system calls, but specifically with O_SYNC the throughput is the same
> as if I were using a single thread for writing.
>
> The setup is always the same, ring per thread with shared workers pool
> (IORING_SETUP_ATTACH_WQ), and high submission rate. Also, it is
> possible to get around this performance issue by using separate worker
> pools, but then I have to load balance workload between many rings for
> perf gains.
>
> I thought that it may have something to do with the IOSQE_ASYNC flag,
> but setting it had no effect.
>
> Is it expected behavior? Are there any other solutions, except
> creating many rings with isolated worker pools?

This is done on purpose, as buffered writes end up being serialized on
the inode mutex anyway. So if you spread the load over multiple workers,
you generally just waste resources. In detail, writes to the same inode
are serialized by io-wq, it doesn't attempt to run them in parallel.

What kind of performance are you seeing with io_uring vs your own thread
pool that doesn't serialize writes? On what fs and what kind of storage?

-- 
Jens Axboe
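
For anyone wanting to reproduce the comparison asked about above, here is a rough sketch of what a "thread pool with regular system calls" baseline might look like. It is not the original poster's code: the function name run_sync_write_pool, the 4 KiB block size, and the layout of one disjoint file region per thread are all assumptions made for illustration. Each thread issues plain pwrite() calls against a single file opened with O_SYNC, so all writes target the same inode.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE  4096   /* assumed write size, not from the thread */
#define MAX_THREADS 64     /* arbitrary cap for this sketch */

struct writer_arg {
    int  fd;             /* shared O_SYNC file descriptor */
    int  index;          /* thread index, selects the file region */
    int  writes;         /* number of blocks this thread writes */
    long bytes_written;  /* filled in by the thread */
};

/* Each thread writes its own contiguous, non-overlapping region. */
static void *writer_main(void *p)
{
    struct writer_arg *arg = p;
    char buf[BLOCK_SIZE];

    memset(buf, 'a' + arg->index % 26, sizeof(buf));
    for (int i = 0; i < arg->writes; i++) {
        off_t off = ((off_t)arg->index * arg->writes + i) * BLOCK_SIZE;
        ssize_t n = pwrite(arg->fd, buf, sizeof(buf), off);
        if (n != (ssize_t)sizeof(buf))
            break;       /* stop this thread on error or short write */
        arg->bytes_written += n;
    }
    return NULL;
}

/*
 * Hypothetical helper: spawn nthreads writers against one O_SYNC file
 * and return the total number of bytes written, or -1 on setup failure.
 */
long run_sync_write_pool(const char *path, int nthreads, int writes_per_thread)
{
    pthread_t tids[MAX_THREADS];
    struct writer_arg args[MAX_THREADS];
    long total = 0;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || writes_per_thread < 1)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct writer_arg){
            .fd = fd, .index = i, .writes = writes_per_thread,
        };
        if (pthread_create(&tids[i], NULL, writer_main, &args[i]) != 0)
            break;
        started++;
    }
    /* Only join the threads that were actually created. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].bytes_written;
    }
    close(fd);
    return total;
}
```

Timing a call such as run_sync_write_pool(path, 8, 1024) against the same workload submitted through io_uring, and noting the filesystem and storage device, would give the numbers requested above. Build with -pthread.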