Re: Should io_sq_thread belong to a specific cpu, not an io_uring instance

On 4/14/20 9:08 AM, Xiaoguang Wang wrote:
> Hi,
>
> Currently we can create multiple io_uring instances that all have SQPOLL
> enabled and make them run on the same cpu core by setting the sq_thread_cpu
> argument, but I think this behaviour may not be efficient. Say we create two
> io_uring instances, both with sq_thread_cpu set to 1 and sq_thread_idle
> set to 1000 milliseconds. The following scenario can then occur:
>   In the 0-1s time interval, io_uring instance0 has neither sqes
> nor cqes, so it just busy-waits for new sqes during that interval, while
> io_uring instance1 has work to do (submitting sqes or polling issued requests).
> io_uring instance0 will then impact io_uring instance1. Of course io_uring
> instance1 may impact io_uring instance0 as well, which is not efficient. I think
> letting multiple io_uring instances run uncoordinated on the same cpu core is
> not good.
>
> How about we create one io_sq_thread per user-specified cpu for the io_uring
> instances that try to share this cpu core? That means this io_sq_thread does not
> belong to a specific io_uring instance; it belongs to a specific cpu and will
> handle requests from multiple io_uring instances. A simple running flow:
>   1. for cpu 1, there are no io_uring instances bound to it yet, so do not create an io_sq_thread
>   2. io_uring instance1 is created and bound to cpu 1, so create cpu 1's io_sq_thread
>   3. io_sq_thread will handle io_uring instance1's requests
>   4. io_uring instance2 is created and bound to cpu 1; since there is already an
>      io_sq_thread for cpu 1, do not create another one
>   5. now the io_sq_thread on cpu 1 will handle both io_uring instances' requests
>
> What do you think about it? Thanks.
>
> Regards,
> Xiaoguang Wang
>
Hi Xiaoguang,

We (a group of researchers at Utah and Columbia) are trying that right now.

We have an initial prototype going, and we are assessing the performance impact to see whether there are gains. Basically, we keep an RCU list of io_uring_ctx, traverse the list, and do the work in a shared io_sq_thread. We are starting experiments on a machine with fast SSDs, where we hope to see some performance benefits.
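
As a rough illustration of the shared-thread idea, here is a small userspace model. It is only a sketch under stated assumptions: it is not the prototype's code and not kernel code, it uses a plain linked list instead of an RCU list, and it models the thread as a flag rather than starting one. The names ring_model, cpu_poller, ring_attach and poller_pass are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 8

/* Hypothetical stand-in for one io_uring instance (not a kernel struct). */
struct ring_model {
    int pending_sqes;         /* submissions waiting to be picked up */
    int handled;              /* submissions the shared poller has processed */
    struct ring_model *next;  /* link in the per-cpu list */
};

/* One poller per cpu, shared by every ring bound to that cpu. */
struct cpu_poller {
    bool thread_running;      /* models "an io_sq_thread exists for this cpu" */
    int nr_rings;             /* number of rings bound to this cpu */
    struct ring_model *rings; /* plain list here; the prototype uses RCU */
};

static struct cpu_poller pollers[MAX_CPUS];

/*
 * Bind a ring to a cpu. Returns true if this call had to start the
 * poller (first ring on the cpu), false if it reused an existing one.
 */
static bool ring_attach(struct ring_model *ring, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    struct cpu_poller *p = &pollers[cpu];
    bool started = !p->thread_running;

    ring->next = p->rings;
    p->rings = ring;
    p->nr_rings++;
    p->thread_running = true;
    return started;
}

/*
 * One pass of the shared poller: visit every ring bound to the cpu and
 * drain its pending submissions. Returns how many were handled in total.
 */
static int poller_pass(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    int total = 0;

    for (struct ring_model *r = pollers[cpu].rings; r != NULL; r = r->next) {
        total += r->pending_sqes;
        r->handled += r->pending_sqes;
        r->pending_sqes = 0;
    }
    return total;
}
```

In this model, attaching two rings to cpu 1 starts the poller only on the first attach, and a single poller_pass(1) then services both rings, so an idle ring no longer competes with a busy one for the core.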

We will send the list of patches soon, once we are sure the approach works and we finish cleaning it up. (There is a subtlety of what to do with the timeouts and resched() when not pinning.)

We'll keep you in the loop on any updates. Feel free to contact any of us.

Thanks,

Yu Jian Wu



