Hi,

In chatting with someone who was trying to use io_uring to read maildirs, they found that running a test case that does:

open file, statx file, read file, close file

was noticeably slower than expected. The culprit here is statx, and arguments aside on whether it makes sense to statx in the first place, it does highlight that io-wq is pretty locking intensive.

This (very lightly tested [1]) patchset attempts to improve that situation by reducing the frequency of grabbing wq->lock and acct->lock.

The first patch gets rid of wq->lock on work insertion. io-wq grabs it to iterate the free worker list, but that is not necessary.

The second patch reduces the frequency of acct->lock grabs when we need to run the queue and process new work. We currently grab the lock and check for work, then drop it, then grab it again to process the work. That is unnecessary.

The final patch just optimizes how we activate new workers. It's not related to the locking itself, it just reduces the overhead of activating a new worker.

Running the above test case on a directory with 50K files, each between 10 and 4096 bytes in size, we spend 160-170ms running the workload before these patches. With this patchset, we spend 90-100ms doing the same work. A bit of profile information is included in the patch commit messages.

Can also be found here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-wq-lock

[1] Runs the test suite just fine, with PROVE_LOCKING enabled and raw lockdep as well.

--
Jens Axboe
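
P.S. For anyone less familiar with the io-wq internals, the lock pattern change in patch 2 boils down to something like the userspace sketch below. This is purely illustrative: the struct layout and helper names are made up for the example and do not correspond to the actual io-wq code, it only shows the difference between checking for work and dequeueing it under two separate lock acquisitions versus doing both in one critical section.

#include <pthread.h>
#include <stdio.h>

struct work {
	struct work *next;
};

struct acct {
	pthread_mutex_t lock;	/* stand-in for acct->lock */
	struct work *work_list;	/* pending work items */
};

/*
 * Old pattern: take the lock to check for pending work, drop it,
 * then take it again to actually dequeue the item. Two lock/unlock
 * cycles per item on the hot path.
 */
static struct work *dequeue_two_locks(struct acct *acct)
{
	struct work *work = NULL;
	int have_work;

	pthread_mutex_lock(&acct->lock);
	have_work = acct->work_list != NULL;
	pthread_mutex_unlock(&acct->lock);

	if (!have_work)
		return NULL;

	pthread_mutex_lock(&acct->lock);
	work = acct->work_list;
	if (work)
		acct->work_list = work->next;
	pthread_mutex_unlock(&acct->lock);
	return work;
}

/*
 * New pattern: check and dequeue under a single critical section,
 * halving the number of lock acquisitions.
 */
static struct work *dequeue_one_lock(struct acct *acct)
{
	struct work *work;

	pthread_mutex_lock(&acct->lock);
	work = acct->work_list;
	if (work)
		acct->work_list = work->next;
	pthread_mutex_unlock(&acct->lock);
	return work;
}

int main(void)
{
	struct acct acct = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct work w1 = { NULL }, w2 = { NULL };

	/* queue two items, drain one with each variant */
	w2.next = &w1;
	acct.work_list = &w2;

	printf("two-lock dequeue: %p\n", (void *)dequeue_two_locks(&acct));
	printf("one-lock dequeue: %p\n", (void *)dequeue_one_lock(&acct));
	return 0;
}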