Re: [ISSUE] The time cost of IOSQE_IO_LINK

Jens Axboe <axboe@xxxxxxxxx> · Wed, 12 Feb 2020 10:29:41 -0700

On 2/12/20 10:22 AM, Jens Axboe wrote:
> On 2/12/20 10:11 AM, Jens Axboe wrote:
>> On 2/12/20 9:31 AM, Carter Li 李通洲 wrote:
>>> Hi everyone,
>>>
>>> IOSQE_IO_LINK seems to have a very high cost, even greater than the io_uring_enter syscall.
>>>
>>> Test code attached below. The program completes after getting 100000000 cqes.
>>>
>>> $ gcc test.c -luring -o test0 -g -O3 -DUSE_LINK=0
>>> $ time ./test0
>>> USE_LINK: 0, count: 100000000, submit_count: 1562500
>>> 0.99user 9.99system 0:11.02elapsed 99%CPU (0avgtext+0avgdata 1608maxresident)k
>>> 0inputs+0outputs (0major+72minor)pagefaults 0swaps
>>>
>>> $ gcc test.c -luring -o test1 -g -O3 -DUSE_LINK=1
>>> $ time ./test1
>>> USE_LINK: 1, count: 100000110, submit_count: 799584
>>> 0.83user 19.21system 0:20.90elapsed 95%CPU (0avgtext+0avgdata 1632maxresident)k
>>> 0inputs+0outputs (0major+72minor)pagefaults 0swaps
>>>
>>> As you can see, the `-DUSE_LINK=1` version issues only about half as many io_uring_submit calls
>>> as the other version, but takes twice as long. That makes IOSQE_IO_LINK almost useless;
>>> please take a look.
>>
>> The nop isn't really a good test case, as it doesn't contain any smarts
>> in terms of executing a link fast. So it doesn't say a whole lot outside
>> of "we could make nop links faster", which is also kind of pointless.
>>
>> "Normal" commands will work better. Where the link is really a win is if
>> the first request needs to go async to complete. In that case, the
>> next request in the chain can execute directly from that async context,
>> which saves an async punt in the common case.
> 
> Case in point, if I just add the below patch, we're a lot closer:
> 
> [root@archlinux liburing]# time test/nop-link 0
> Using link: 0
> count: 100000000, submit_count: 1562500
> 
> 
> real	0m7.934s
> user	0m0.740s
> sys	0m7.157s
> [root@archlinux liburing]# time test/nop-link 1
> Using link: 1
> count: 100000000, submit_count: 781250
> 
> 
> real	0m9.009s
> user	0m0.710s
> sys	0m8.264s
> 
> The links are still a bit slower, which is to be expected, as the
> nop basically just completes: it doesn't do anything at all and
> never needs to go async.
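To put the runs quoted so far on a common footing, the reported CQE counts, submit counts and system times can be normalized per submit and per CQE. Below is a minimal C sketch of that arithmetic; the helper names (cqes_per_submit, sys_ns_per_cqe, report_run) are hypothetical and not part of liburing, and the inputs are simply copied from the outputs above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers, not liburing API: normalize the benchmark
 * output quoted above so runs with different submit counts compare. */

/* Average number of CQEs reaped per io_uring_submit() call. */
double cqes_per_submit(double cqes, double submits)
{
	assert(submits > 0);
	return cqes / submits;
}

/* Average kernel ("system"/"sys") time per CQE, in nanoseconds. */
double sys_ns_per_cqe(double sys_seconds, double cqes)
{
	assert(cqes > 0);
	return sys_seconds * 1e9 / cqes;
}

/* Print both metrics for one run. */
void report_run(const char *label, double cqes, double submits,
		double sys_seconds)
{
	printf("%-24s %6.1f CQEs/submit %6.1f ns sys/CQE\n", label,
	       cqes_per_submit(cqes, submits),
	       sys_ns_per_cqe(sys_seconds, cqes));
}
```

Fed the four runs above, linking roughly doubles the CQEs per submit (64 vs about 125 to 128), while the linked/unlinked gap in system time per CQE is about 92 ns in the original report (about 192.1 vs 99.9 ns) and about 11 ns with the patch applied (about 82.6 vs 71.6 ns).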

With the test pinned to a single CPU (taskset -c 0) for more reliable results, we're basically even.

[root@archlinux liburing]# time taskset -c 0 test/nop-link 1
Using link: 1
count: 100000000, submit_count: 781250

real	0m8.251s
user	0m0.680s
sys	0m7.536s

[root@archlinux liburing]# time taskset -c 0 test/nop-link 0
Using link: 0
count: 100000000, submit_count: 1562500

real	0m7.986s
user	0m0.610s
sys	0m7.340s
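The "basically even" result can be expressed as the relative wall-clock overhead of the linked run over the unlinked one. A small C sketch of that calculation follows; the helper names (relative_overhead_pct, report_pair) are hypothetical, and the inputs are the elapsed/real times reported in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: how much slower (in percent) the linked run is
 * than the unlinked run, given both wall-clock times in seconds.
 * A positive result means the linked run took longer. */
double relative_overhead_pct(double linked_s, double unlinked_s)
{
	assert(unlinked_s > 0);
	return (linked_s / unlinked_s - 1.0) * 100.0;
}

/* Print the overhead for one pair of runs. */
void report_pair(const char *label, double linked_s, double unlinked_s)
{
	printf("%-28s links %+.1f%%\n", label,
	       relative_overhead_pct(linked_s, unlinked_s));
}
```

For the original report (20.90s vs 11.02s) this gives about +89.7%; with the patch it drops to about +13.5% unpinned (9.009s vs 7.934s) and about +3.3% when pinned (8.251s vs 7.986s).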

For the intended case (outlined above), it'll definitely be a
win.

-- 
Jens Axboe