On 8/3/21 12:04 PM, Nadav Amit wrote: > > >> On Aug 3, 2021, at 7:37 AM, Jens Axboe <axboe@xxxxxxxxx> wrote: >> >> On 8/3/21 7:22 AM, Jens Axboe wrote: >>> On 8/2/21 7:05 PM, Nadav Amit wrote: >>>> Hello Jens, >>>> >>>> I encountered an issue, which appears to be a race between >>>> io_wqe_worker() and io_wqe_wake_worker(). I am not sure how to address >>>> this issue and whether I am missing something, since this seems to >>>> occur in a common scenario. Your feedback (or fix ;-)) would be >>>> appreciated. >>>> >>>> I run on 5.13 a workload that issues multiple async read operations >>>> that should run concurrently. Some read operations can not complete >>>> for unbounded time (e.g., read from a pipe that is never written to). >>>> The problem is that occasionally another read operation that should >>>> complete gets stuck. My understanding, based on debugging and the code >>>> is that the following race (or similar) occurs: >>>> >>>> >>>> cpu0 cpu1 >>>> ---- ---- >>>> io_wqe_worker() >>>> schedule_timeout() >>>> // timed out >>>> io_wqe_enqueue() >>>> io_wqe_wake_worker() >>>> // work_flags & IO_WQ_WORK_CONCURRENT >>>> io_wqe_activate_free_worker() >>>> io_worker_exit() >>>> >>>> >>>> Basically, io_wqe_wake_worker() can find a worker, but this worker is >>>> about to exit and is not going to process further work. Once the >>>> worker exits, the concurrency level decreases and async work might be >>>> blocked by another work. I had a look at 5.14, but did not see >>>> anything that might address this issue. >>>> >>>> Am I missing something? >>>> >>>> If not, all my ideas for a solution are either complicated (track >>>> required concurrency-level) or relaxed (span another worker on >>>> io_worker_exit if work_list of unbounded work is not empty). >>>> >>>> As said, feedback would be appreciated. >>> >>> You are right that there's definitely a race here between checking the >>> freelist and finding a worker, but that worker is already exiting. Let >>> me mull over this a bit, I'll post something for you to try later today. >> >> Can you try something like this? Just consider it a first tester, need >> to spend a bit more time on it to ensure we fully close the gap. > > Thanks for the quick response. > > I tried you version. It works better, but my workload still gets stuck > occasionally (less frequently though). It is pretty obvious that the > version you sent still has a race, so I didn’t put the effort into > debugging it. All good, thanks for testing! Is it a test case you can share? Would help with confidence in the final solution. > I should note that I have an ugly hack that does make my test pass. I > include it, although it is obviously not the right solution. Thanks, I'll take a look. -- Jens Axboe