On 6/18/21 4:23 PM, Jens Axboe wrote: > On 6/17/21 11:14 AM, Pavel Begunkov wrote: >> If task_state is cleared, io_req_task_work_add() will go the slow path >> adding a task_work, setting the task_state, waking up the task and so >> on. Not to mention it's expensive. tctx_task_work() first clears the >> state and then executes all the work items queued, so if any of them >> resubmits or adds new task_work items, it would unnecessarily go through >> the slow path of io_req_task_work_add(). >> >> Let's clear the ->task_state at the end. We still have to check >> ->task_list for emptiness afterward to synchronise with >> io_req_task_work_add(), do that, and set the state back if we're going >> to retry, because clearing not-ours task_state on the next iteration >> would be buggy. > > Are we not re-introducing the problem fixed by 1d5f360dd1a3c by swapping > these around? if (wq_list_empty(&tctx->task_list)) { clear_bit(0, &tctx->task_state); if (wq_list_empty(&tctx->task_list)) break; ... // goto repeat } Note wq_list_empty() right after clear_bit(), it will retry splicing the list as it currently does. There is some more for restoring the bit back, but should be fine as well. Alternatively, could have been as below, but found it uglier: bool cleared = false; ... if (wq_list_empty(&tctx->task_list)) { if (cleared) break; cleared = true; clear_bit(0, &tctx->task_state); if (wq_list_empty(&tctx->task_list)) break; goto repeat; } -- Pavel Begunkov