On 2/12/20 10:22 AM, Jens Axboe wrote:
> On 2/12/20 10:11 AM, Jens Axboe wrote:
>> On 2/12/20 9:31 AM, Carter Li 李通洲 wrote:
>>> Hi everyone,
>>>
>>> IOSQE_IO_LINK seems to have very high cost, even greater than io_uring_enter syscall.
>>>
>>> Test code attached below. The program completes after getting 100000000 cqes.
>>>
>>> $ gcc test.c -luring -o test0 -g -O3 -DUSE_LINK=0
>>> $ time ./test0
>>> USE_LINK: 0, count: 100000000, submit_count: 1562500
>>> 0.99user 9.99system 0:11.02elapsed 99%CPU (0avgtext+0avgdata 1608maxresident)k
>>> 0inputs+0outputs (0major+72minor)pagefaults 0swaps
>>>
>>> $ gcc test.c -luring -o test1 -g -O3 -DUSE_LINK=1
>>> $ time ./test1
>>> USE_LINK: 1, count: 100000110, submit_count: 799584
>>> 0.83user 19.21system 0:20.90elapsed 95%CPU (0avgtext+0avgdata 1632maxresident)k
>>> 0inputs+0outputs (0major+72minor)pagefaults 0swaps
>>>
>>> As you can see, the `-DUSE_LINK=1` version emits only about half the io_uring_submit calls
>>> of the other version, but takes twice as long. That makes IOSQE_IO_LINK almost useless,
>>> please have a check.
>>
>> The nop isn't really a good test case, as it doesn't contain any smarts
>> in terms of executing a link fast. So it doesn't say a whole lot outside
>> of "we could make nop links faster", which is also kind of pointless.
>>
>> "Normal" commands will work better. Where the link is really a win is if
>> the first request needs to go async to complete. For that case, the
>> next link can execute directly from that context. This saves an async
>> punt for the common case.
>
> Case in point, if I just add the below patch, we're a lot closer:
>
> [root@archlinux liburing]# time test/nop-link 0
> Using link: 0
> count: 100000000, submit_count: 1562500
>
>
> real	0m7.934s
> user	0m0.740s
> sys	0m7.157s
> [root@archlinux liburing]# time test/nop-link 1
> Using link: 1
> count: 100000000, submit_count: 781250
>
>
> real	0m9.009s
> user	0m0.710s
> sys	0m8.264s
>
> The links are still a bit slower, which is to be expected as the
> nop basically just completes, it doesn't do anything at all and
> it never needs to go async.

With the test pinned to a single CPU for more reliable results, we're
basically even.

[root@archlinux liburing]# time taskset -c 0 test/nop-link 1
Using link: 1
count: 100000000, submit_count: 781250

real	0m8.251s
user	0m0.680s
sys	0m7.536s
[root@archlinux liburing]# time taskset -c 0 test/nop-link 0
Using link: 0
count: 100000000, submit_count: 1562500

real	0m7.986s
user	0m0.610s
sys	0m7.340s

For the intended case (outlined above), it'll definitely be a win.

-- 
Jens Axboe