Re: Race between io_wqe_worker() and io_wqe_wake_worker()

Jens Axboe <axboe@xxxxxxxxx> · Tue, 3 Aug 2021 12:14:37 -0600

On 8/3/21 12:04 PM, Nadav Amit wrote:
> 
> 
>> On Aug 3, 2021, at 7:37 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> On 8/3/21 7:22 AM, Jens Axboe wrote:
>>> On 8/2/21 7:05 PM, Nadav Amit wrote:
>>>> Hello Jens,
>>>>
>>>> I encountered an issue, which appears to be a race between
>>>> io_wqe_worker() and io_wqe_wake_worker(). I am not sure how to address
>>>> this issue and whether I am missing something, since this seems to
>>>> occur in a common scenario. Your feedback (or fix ;-)) would be
>>>> appreciated.
>>>>
>>>> I run on 5.13 a workload that issues multiple async read operations
>>>> that should run concurrently. Some read operations can not complete
>>>> for unbounded time (e.g., read from a pipe that is never written to).
>>>> The problem is that occasionally another read operation that should
>>>> complete gets stuck. My understanding, based on debugging and the code
>>>> is that the following race (or similar) occurs:
>>>>
>>>>
>>>>  cpu0					cpu1
>>>>  ----					----
>>>> 					io_wqe_worker()
>>>> 					 schedule_timeout()
>>>> 					 // timed out
>>>>  io_wqe_enqueue()
>>>>   io_wqe_wake_worker()
>>>>    // work_flags & IO_WQ_WORK_CONCURRENT
>>>>    io_wqe_activate_free_worker()
>>>> 					 io_worker_exit()
>>>>
>>>>
>>>> Basically, io_wqe_wake_worker() can find a worker, but this worker is
>>>> about to exit and is not going to process further work. Once the
>>>> worker exits, the concurrency level decreases and async work might be
>>>> blocked by another work. I had a look at 5.14, but did not see
>>>> anything that might address this issue.
>>>>
>>>> Am I missing something?
>>>>
>>>> If not, all my ideas for a solution are either complicated (track
>>>> required concurrency-level) or relaxed (span another worker on
>>>> io_worker_exit if work_list of unbounded work is not empty).
>>>>
>>>> As said, feedback would be appreciated.
>>>
>>> You are right that there's definitely a race here between checking the
>>> freelist and finding a worker, but that worker is already exiting. Let
>>> me mull over this a bit, I'll post something for you to try later today.
>>
>> Can you try something like this? Just consider it a first tester, need
>> to spend a bit more time on it to ensure we fully close the gap.
> 
> Thanks for the quick response.
> 
> I tried you version. It works better, but my workload still gets stuck
> occasionally (less frequently though). It is pretty obvious that the
> version you sent still has a race, so I didn’t put the effort into
> debugging it.

All good, thanks for testing! Is it a test case you can share? Would
help with confidence in the final solution.

> I should note that I have an ugly hack that does make my test pass. I
> include it, although it is obviously not the right solution.

Thanks, I'll take a look.

-- 
Jens Axboe