Re: [PATCH] ublk_drv: don't call task_work_add for queueing io commands

On 2022/10/26 19:29, Ming Lei wrote:

[...]
>>
>> Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz 16 cores
>> 64GB MEM, CentOS 8, kernel 6.0+
>> with IORING_SETUP_COOP_TASKRUN, without this kernel patch
>>
>> ucmd: io_uring_cmd_complete_in_task(), ublk_drv is a module
>>
>> ucmd-not-touch-pdu: use llist && do not touch 'cmd'/'pdu'/'io' in ublk_queue_rq()
>>
>> tw: task_work_add(), ublk is built-in.
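>>
>> For reference, the two kernel-side paths being compared look roughly
>> like this (a trimmed sketch based on drivers/block/ublk_drv.c; error
>> handling is omitted and helper names follow the driver):
>>
>>         static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
>>                         const struct blk_mq_queue_data *bd)
>>         {
>>                 struct ublk_queue *ubq = hctx->driver_data;
>>                 struct request *rq = bd->rq;
>>
>>                 blk_mq_start_request(rq);
>>
>>                 if (ublk_can_use_task_work(ubq)) {
>>                         /* tw: queue work directly on the daemon task;
>>                          * data->work is set up in .init_request, and
>>                          * the real code picks the notify mode from
>>                          * bd->last */
>>                         struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq);
>>
>>                         if (task_work_add(ubq->ubq_daemon, &data->work,
>>                                           TWA_SIGNAL_NO_IPI))
>>                                 return BLK_STS_IOERR;
>>                 } else {
>>                         /* ucmd: bounce through io_uring task work;
>>                          * note this touches cmd/pdu in the blk-mq
>>                          * io path */
>>                         struct io_uring_cmd *cmd = ubq->ios[rq->tag].cmd;
>>                         struct ublk_uring_cmd_pdu *pdu =
>>                                 ublk_get_uring_cmd_pdu(cmd);
>>
>>                         pdu->req = rq;
>>                         io_uring_cmd_complete_in_task(cmd,
>>                                         ublk_rq_task_work_cb);
>>                 }
>>                 return BLK_STS_OK;
>>         }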
>>
>>
>> --------
>> fio test
>>
>> iodepth=128 numjobs=1 direct=1 bs=4k
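>>
>> i.e. a command line along these lines (the device path and the
>> libaio ioengine are assumed here):
>>
>>         fio --name=test --filename=/dev/ublkb0 --ioengine=libaio \
>>                 --direct=1 --bs=4k --iodepth=128 --numjobs=1 \
>>                 --rw=randread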
>>
>> --------------------------------------------
>> ublk loop target, the backend is a file.
>>
>> IOPS(k)
>>
>> type		ucmd		tw		ucmd-not-touch-pdu
>> seq-read	54.1		53.8		53.6
>> rand-read	52.0		52.0		52.0
>>
>> --------------------------------------------
>> ublk null target
>> IOPS(k)
>>
>> type		ucmd		tw		ucmd-not-touch-pdu
>> seq-read	272		286		275
>> rand-read	262		278		269
>>
>>
>> ------------
>> ublksrv test
>>
>> -------------
>> ucmd
>>
>> running loop/001
>>         fio (ublk/loop( -f /root/work/ubdsrv/tests/tmp/ublk_loop_1G_BZ85U), libaio, bs 4k, dio, hw queues:1, uring_comp: 0, get_data: 0)...
>>         randwrite: jobs 1, iops 66737
>>         randread: jobs 1, iops 64935
>>         randrw: jobs 1, iops read 32694 write 32710
>>         rw(512k): jobs 1, iops read 772 write 819
>>
>> running null/001
>>         fio (ublk/null(), libaio, bs 4k, dio, hw queues:1, uring_comp: 0, get_data: 0)...
>>         randwrite: jobs 1, iops 715863
>>         randread: jobs 1, iops 758449
>>         randrw: jobs 1, iops read 357407 write 357183
>>         rw(512k): jobs 1, iops read 5895 write 5875
>>
>> -------------
>> tw
>>
>> running loop/001
>>         fio (ublk/loop( -f /root/work/ubdsrv/tests/tmp/ublk_loop_1G_pvLTL), libaio, bs 4k, dio, hw queues:1, uring_comp: 0, get_data: 0)...
>>         randwrite: jobs 1, iops 66856
>>         randread: jobs 1, iops 65015
>>         randrw: jobs 1, iops read 32751 write 32767
>>         rw(512k): jobs 1, iops read 776 write 823
>>
>> running null/001
>>         fio (ublk/null(), libaio, bs 4k, dio, hw queues:1, uring_comp: 0, get_data: 0)...
>>         randwrite: jobs 1, iops 739450
>>         randread: jobs 1, iops 787500
>>         randrw: jobs 1, iops read 372956 write 372831
>>         rw(512k): jobs 1, iops read 5798 write 5777
>>
>> -------------
>> ucmd-not-touch-pdu
>>
>> running loop/001
>>         fio (ublk/loop( -f /root/work/ubdsrv/tests/tmp/ublk_loop_1G_oH0eG), libaio, bs 4k, dio, hw queues:1, uring_comp: 0, get_data: 0)...
>>         randwrite: jobs 1, iops 66754
>>         randread: jobs 1, iops 65032
>>         randrw: jobs 1, iops read 32776 write 32792
>>         rw(512k): jobs 1, iops read 772 write 818
>>
>> running null/001
>>         fio (ublk/null(), libaio, bs 4k, dio, hw queues:1, uring_comp: 0, get_data: 0)...
>>         randwrite: jobs 1, iops 725334
>>         randread: jobs 1, iops 741105
>>         randrw: jobs 1, iops read 360285 write 360047
>>         rw(512k): jobs 1, iops read 5770 write 5748
>>
>> Not touching cmd/pdu/io in ublk_queue_rq() improves IOPS, but it
>> is still worse than using task_work_add().
> 
> Thanks for the test! It is better not to share ucmd between the
> ublk blk-mq io context and the ubq daemon context, and we can
> improve the io_uring_cmd_complete_in_task() case; I have one patch
> that uses a batch handling approach in
> io_uring_cmd_complete_in_task().
> 
> Another reason could be the extra __set_notify_signal() done in
> __io_req_task_work_add() via task_work_add(). When task_work_add()
> is available, we only need to call __set_notify_signal() once for
> the whole batch, but that can't be done when using
> io_uring_cmd_complete_in_task().
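>
> With task_work_add() that batched wakeup looks roughly like this in
> the current driver (a sketch; surrounding code trimmed):
>
>         /* ublk_queue_rq(): notify only for the last rq of a batch */
>         enum task_work_notify_mode notify_mode = bd->last ?
>                         TWA_SIGNAL_NO_IPI : TWA_NONE;
>
>         if (task_work_add(ubq->ubq_daemon, &data->work, notify_mode))
>                 goto fail;
>
>         /* ->commit_rqs(): a single wakeup flushes the whole batch */
>         static void ublk_commit_rqs(struct blk_mq_hw_ctx *hctx)
>         {
>                 struct ublk_queue *ubq = hctx->driver_data;
>
>                 if (ublk_can_use_task_work(ubq))
>                         __set_notify_signal(ubq->ubq_daemon);
>         }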
> 
> Also, the 'use llist' patch is actually wrong, since we would have
> to call io_uring_cmd_complete_in_task() once in ->commit_rqs(), but
> that isn't easy because ucmd isn't available at that time.

Yes, you are correct.
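
->commit_rqs() only receives the hw queue and no request, so there is
no ucmd around at that point to hand to io_uring_cmd_complete_in_task():

        /* from struct blk_mq_ops in include/linux/blk-mq.h */
        void (*commit_rqs)(struct blk_mq_hw_ctx *);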

> 
> I think we may have to live with task_work_add() until the perf
> numbers with io_uring_cmd_complete_in_task() improve to basically
> the same level.

OK, we can keep both task_work_add() and io_uring_cmd_complete_in_task().
BTW, judging from test loop/001, with a real backend the performance
gap between them does not seem too big.

Regards,
Zhang
