On 12/15/20 11:29 AM, Linus Torvalds wrote:
> On Tue, Dec 15, 2020 at 8:08 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> OK, ran some numbers. The test app benchmarks opening X files, I just
>> used /usr on my test box. That's 182677 files. To mimic real world
>> kind of setups, 33% of the files can be looked up hot, so LOOKUP_NONBLOCK
>> will succeed.
>
> Perhaps more interestingly, what's the difference between the patchset
> as posted for just io_uring?
>
> IOW, does the synchronous LOOKUP_NONBLOCK actually help?
>
> I'm obviously a big believer in the whole "avoid thread setup costs if
> not necessary", so I'd _expect_ it to help, but maybe the possible
> extra parallelism is enough to overcome the thread setup and
> synchronization costs even for a fast cached RCU lookup.

For basically all cases on the io_uring side where I've ended up being
able to do the hot/fast path inline, it's been a nice win. The only real
exception to that rule is buffered reads that are fully cached; having
multiple async workers copy the data is obviously always going to be
faster at some point due to the extra parallelism and memory bandwidth.

So yes, I too am a big believer in being able to perform operations
inline if at all possible, even if for some things it turns into a full
retry when we fail. The hot path more than makes up for it.

> (I also suspect the reality is often much closer to 100% cached
> lookups than just 33%, but who knows - there are things like just
> concurrent renames that can cause the RCU lookup to fail even if it
> _was_ cached, so it's not purely about whether things are in the
> dcache or not).

In usecs again, same test, this time just using io_uring:

Cached          5.10-git        5.10-git+LOOKUP_NONBLOCK
--------------------------------------------------------
33%             1,014,975       900,474
100%            435,636         151,475

As expected, the closer we get to fully cached, the better off we are
with using LOOKUP_NONBLOCK. It's a nice win even at just 33% cached.

-- 
Jens Axboe