Re: Should io_sq_thread belong to a specific cpu, not an io_uring instance

hi,


On 4/14/20 9:08 AM, Xiaoguang Wang wrote:
hi,

Currently we can create multiple io_uring instances that all have SQPOLL
enabled and make their poll threads run on the same cpu core by setting the
sq_thread_cpu argument, but I think this behaviour may not be efficient. Say we
create two io_uring instances which both have sq_thread_cpu set to 1 and
sq_thread_idle set to 1000 milliseconds; the following scenario may occur:
   For example, in the 0-1s time interval, io_uring instance0 has neither sqes
nor cqes, so its io_sq_thread just busy-waits for new sqes for the whole
interval, while io_uring instance1 has work to do (submitting sqes or polling
issued requests). The io_sq_thread of instance0 then takes cpu time away from
instance1. Of course instance1 may impact instance0 as well, which is not
efficient. I think letting the sq threads of multiple io_uring instances
compete for the same cpu core with no coordination at all is not good.

How about we create one io_sq_thread per user-specified cpu, shared by all the
io_uring instances that want to use that cpu core? That means the io_sq_thread
does not belong to a specific io_uring instance; it belongs to a specific cpu
and handles requests from multiple io_uring instances. Here is a simple
running flow:
   1. No io_uring instance is bound to cpu 1 yet, so no io_sq_thread is created for it.
   2. io_uring instance1 is created and bound to cpu 1, so cpu 1's io_sq_thread is created.
   3. That io_sq_thread handles io_uring instance1's requests.
   4. io_uring instance2 is created and bound to cpu 1; since cpu 1 already has an
      io_sq_thread, no new io_sq_thread is created.
   5. Now the io_sq_thread on cpu 1 handles both io_uring instances' requests.

What do you think about it? Thanks.

Regards,
Xiaoguang Wang

Hi Xiaoguang,

We (a group of researchers at Utah and Columbia) are trying exactly that right now.
Cool, thanks. Let me explain a bit more about why we need this feature :)
Cpu is a very important resource. Say a physical machine has 96 cores: if we
run many io_uring instances which all have sqpoll enabled, in practice we can
only allocate a small number of cpus to their io_sq_threads, so sharing a cpu
for polling is valuable.


We have an initial prototype going, and we are now assessing its performance impact to see whether there are gains. Basically, we keep an rcu-list of io_uring_ctx and have a shared io_sq_thread traverse the list and do the work for each one. We are starting experiments on a machine with fast SSDs, where we hope to see some performance benefits.
You can try this test case to assess the performance :)
  1. Create two io_uring instances which both have sqpoll enabled, set
sq_thread_idle to 1000ms and bind them to the same cpu core.
  2. One io_uring instance sends just one io request every 500ms. Since that is
shorter than sq_thread_idle, this instance's io_sq_thread never goes idle and
always contends for the cpu.
  3. The other io_uring instance issues io requests continuously, so its
io_sq_thread will also contend for the cpu.
In the current io_uring implementation, I think the second io_uring instance
will be slowed down by the first one.


We will send the patches soon, once we are sure the approach works and have finished cleaning them up. (There is a subtlety about what to do with the timeouts and resched() when not pinning.)

We'll keep you in the loop on any updates. Feel free to contact any of us.
OK, thanks.

Regards,
Xiaoguang Wang

Thanks,

Yu Jian Wu



