On 10/9/21 1:56 AM, Xiaoguang Wang wrote:
> Recently I spent some time researching io_uring's fast-poll and multi-shot
> performance using a network echo-server model. Previously I had always
> thought fast-poll was better than multi-shot and would give better
> performance, but in real tests multi-shot is almost always better than
> fast-poll, which is very interesting. I used ebpf to take some
> measurements, and they show that whether fast-poll performs well depends
> entirely on whether the first nowait try in io_issue_sqe() succeeds or
> fails. Take the io_recv operation as an example (recv buffer is 16 bytes):
> 1) The first nowait try succeeds: a single io_recv() is enough.
>    On my test machine, a successful io_recv() takes 1110ns on average.
>
> 2) The first nowait try fails: then we have some expensive work, which
>    consists of the failed io_recv(), the apoll allocation, vfs_poll(),
>    miscellaneous initialization and checks in __io_arm_poll_handler(),
>    and a final successful io_recv(). Among them:
>      the failed io_recv() takes 620ns on average;
>      vfs_poll() takes 550ns on average.
>    I haven't measured the other overhead yet, but we can already see that
>    if the first nowait try fails, we need at least 2290ns (620 + 550 + 1110)
>    to complete the request. In my echo-server tests, 40% of the first
>    nowait io_recv() attempts fail.
>
> The above measurements explain why multi-shot is better than fast-poll:
> multi-shot ensures the first nowait try succeeds.
>
> Based on these measurements, I try to improve fast-poll a bit:
> 1. Introduce fixed poll support; currently it only works in registered-file
>    mode. With this feature, we can get rid of various repeated operations
>    in io_arm_poll_handler(), including the apoll allocation and the
>    miscellaneous initialization and checks.
> 2. Introduce an event generation counter, which increases monotonically.
>    If no new event has happened, we don't need to call vfs_poll(); we just
>    put the req on a waiting list.

This also needs a respin, and can you split it into two patches?

-- 
Jens Axboe
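
[Editor's note: for readers less familiar with the two submission models being
compared, here is a minimal liburing sketch (not part of the series under
review) contrasting them for a single connected socket. It assumes a liburing
version providing io_uring_prep_poll_multishot() (>= 2.1); ring/socket setup,
SQE exhaustion, error handling, and the echo send path are omitted.]

#include <liburing.h>
#include <poll.h>

#define RECV_LEN 16

/* Fast-poll style: just submit the recv. If the first nowait attempt in
 * io_issue_sqe() fails, the kernel allocates an apoll entry, calls
 * vfs_poll(), and retries the recv once the socket becomes readable. */
static void submit_recv(struct io_uring *ring, int sockfd, char *buf)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	io_uring_prep_recv(sqe, sockfd, buf, RECV_LEN, 0);
	io_uring_sqe_set_data(sqe, (void *)1);	/* tag: recv */
	io_uring_submit(ring);
}

/* Multi-shot style: arm one multishot POLL_ADD up front. Each POLLIN CQE
 * means data is already waiting, so the recv submitted in response should
 * succeed on its first nowait attempt. */
static void arm_poll_multishot(struct io_uring *ring, int sockfd)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	io_uring_prep_poll_multishot(sqe, sockfd, POLLIN);
	io_uring_sqe_set_data(sqe, (void *)2);	/* tag: poll */
	io_uring_submit(ring);
}

static void echo_loop(struct io_uring *ring, int sockfd)
{
	char buf[RECV_LEN];
	struct io_uring_cqe *cqe;

	arm_poll_multishot(ring, sockfd);
	while (io_uring_wait_cqe(ring, &cqe) == 0) {
		if (io_uring_cqe_get_data(cqe) == (void *)2) {
			/* readiness event: issue the recv now */
			submit_recv(ring, sockfd, buf);
			if (!(cqe->flags & IORING_CQE_F_MORE))
				arm_poll_multishot(ring, sockfd); /* poll terminated, re-arm */
		} else {
			/* recv completed; cqe->res holds the byte count.
			 * Echoing it back with a send is omitted here. */
		}
		io_uring_cqe_seen(ring, cqe);
	}
}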