On 10/27/20 8:47 PM, Zhang, Qiang wrote:
>
> ________________________________________
> From: Jens Axboe <axboe@xxxxxxxxx>
> Sent: October 27, 2020 21:35
> To: Zhang, Qiang
> Cc: io-uring@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] io-wq: set task TASK_INTERRUPTIBLE state before schedule_timeout
>
> On 10/26/20 9:09 PM, qiang.zhang@xxxxxxxxxxxxx wrote:
>> From: Zqiang <qiang.zhang@xxxxxxxxxxxxx>
>>
>> In the 'io_wqe_worker' thread, once the work in 'wqe->work_list' has
>> been finished and 'wqe->work_list' is empty, '__io_worker_idle'
>> returns false and the task state is TASK_RUNNING; it needs to be set
>> to TASK_INTERRUPTIBLE before calling schedule_timeout.
>>
>> I don't think that's safe - what if someone added work right before you
>> call schedule_timeout_interruptible? Something ala:
>>
>> io_wq_enqueue()
>> set_current_state(TASK_INTERRUPTIBLE);
>> schedule_timeout(WORKER_IDLE_TIMEOUT);
>>
>> then we'll have work added and the task state set to running, but the
>> worker itself just sets us to non-running and will hence wait
>> WORKER_IDLE_TIMEOUT before the work is processed.
>>
>> The current situation will do one extra loop for this case, as the
>> schedule_timeout() just ends up being a nop and we go around again.
>
> Although the worker task state is running, the call to schedule_timeout
> can still cause the current worker to be switched out. If the current
> worker task is set to non-running and is switched out, the scheduler
> will call io_wq_worker_sleeping, which wakes up a free worker task if
> wqe->free_list is not empty.

It'll only be swapped out for TASK_RUNNING if we should be running other
work, which would happen on the next need-resched event anyway. And the
miss you're describing is an expensive one, as it entails creating a new
thread and switching to that. That's not a great way to handle a race.

So I'm a bit puzzled here - yes, we'll do an extra loop and check for the
dropping of mm, but that's really minor. The solution is a _lot_ more
expensive for hitting the race of needing a new worker, but missing it
because you unconditionally set the task to non-running.

On top of that, it's also not the idiomatic way to wait for events, which
is typically:

  is event true, break if so
  set_current_state(TASK_INTERRUPTIBLE);
  event comes in, task set runnable
  check again, schedule
  doesn't schedule, since we were set runnable

or variants thereof, using waitqueues.

So while I'm of course not opposed to fixing the io-wq loop so that we
don't do that last loop when going idle, a) it basically doesn't matter,
and b) the proposed solution is much worse. If there was a more elegant
solution without worse side effects, then we can discuss that.

--
Jens Axboe
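
One variant of the idiomatic wait pattern described above, sketched as
kernel-style C. The io_wqe_has_work() helper and the local
WORKER_IDLE_TIMEOUT define are illustrative stand-ins, not the actual
io-wq helpers; this is a sketch of the pattern, not the existing
io_wqe_worker() loop:

  #include <linux/sched.h>
  #include <linux/kthread.h>

  /* Mirrors the idle timeout used by io-wq.c (value assumed here). */
  #define WORKER_IDLE_TIMEOUT	(5 * HZ)

  struct io_wqe;	/* opaque here; the real struct lives in io-wq.c */

  /*
   * Hypothetical condition helper: returns true when wqe->work_list has
   * pending work. Stands in for the real list check in the worker loop.
   */
  static bool io_wqe_has_work(struct io_wqe *wqe);

  static void io_wqe_worker_wait(struct io_wqe *wqe)
  {
  	while (!kthread_should_stop()) {
  		/* 1) If work is already queued, don't sleep at all. */
  		if (io_wqe_has_work(wqe))
  			break;

  		/*
  		 * 2) Go non-running *before* the final condition check.
  		 *    A concurrent io_wq_enqueue() that wakes us after this
  		 *    point flips us back to TASK_RUNNING, so the schedule
  		 *    below returns immediately instead of the wakeup being
  		 *    lost for up to WORKER_IDLE_TIMEOUT.
  		 */
  		set_current_state(TASK_INTERRUPTIBLE);

  		/* 3) Re-check; work may have arrived in the window above. */
  		if (io_wqe_has_work(wqe)) {
  			__set_current_state(TASK_RUNNING);
  			break;
  		}

  		/* 4) Sleep, bounded by the idle timeout. */
  		schedule_timeout(WORKER_IDLE_TIMEOUT);
  	}
  }

The ordering is the whole point: marking the task TASK_INTERRUPTIBLE
before the final condition check means a wakeup arriving in that window
simply sets the task runnable again and schedule_timeout() is effectively
a no-op, rather than the worker sleeping through queued work as in the
race the proposed patch would introduce.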