On 7/1/21 2:45 PM, Jens Axboe wrote:
> On 7/1/21 6:26 AM, Pavel Begunkov wrote:
>> If one entered io_req_task_work_add() not seeing PF_EXITING, it will set
>> a ->task_state bit and try task_work_add(), which may fail by that
>> moment. If that happens the function would try to cancel the request.
>>
>> However, in the meantime there might come other io_req_task_work_add()
>> callers, which will see the bit set and leave their requests in the
>> list, which will never be executed.
>>
>> Don't propagate an error, but clear the bit first and then fallback
>> all requests that we can splice from the list. The callback functions
>> have to be able to deal with PF_EXITING, so poll and apoll were modified
>> via changing io_poll_rewait().
>>
>> Reported-by: Jens Axboe <axboe@xxxxxxxxx>
>> Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
>> ---
>>
>> Jens, can you try if it helps with the leak you mentioned? I can't
>> see it. As with previous, would need to remove the PF_EXITING check,
>> and should be in theory safe to do.
>
> Probably misunderstanding you here, but you already killed the one that
> patch 3 removes. In any case, I tested this on top of 1+2, and I don't
> see any leaks at that point.

I believe the removal of the PF_EXITING check yesterday didn't create a
new bug, but made the one addressed here much more likely to happen. So
this patch fixes it regardless of the PF_EXITING check. As for the
PF_EXITING removal, let's postpone it to for-next.

-- 
Pavel Begunkov
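
For readers following the race described in the quoted commit message, here is a minimal userspace sketch of the idea, not the kernel code itself. All names (model_req, model_tctx, model_queue, model_arm_failed) are hypothetical and exist only for illustration; the "pending" flag stands in for the ->task_state bit, and the failure path assumes task_work_add() has already failed because the task is exiting. The model is single-threaded and omits locking, so it only shows the ordering: clear the bit first, then splice the list and fall back every request found there, including ones queued by other callers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_req {
    int id;
    struct model_req *next;
    bool ran_fallback;          /* set once the fallback handled this request */
};

struct model_tctx {
    struct model_req *first;    /* stands in for the per-task request list */
    struct model_req *last;
    bool pending;               /* stands in for the ->task_state bit */
};

/*
 * Step 1: queue the request, then try to claim the "pending" flag.
 * Returns true if this caller is the one that must arm task work,
 * false if another caller already claimed it (request stays queued).
 */
bool model_queue(struct model_tctx *t, struct model_req *r)
{
    r->next = NULL;
    r->ran_fallback = false;
    if (t->last)
        t->last->next = r;
    else
        t->first = r;
    t->last = r;

    if (t->pending)
        return false;
    t->pending = true;
    return true;
}

/*
 * Step 2 (failure path): arming task work failed. Clear the flag first,
 * then splice the whole list and run the fallback for every request,
 * not only the caller's own. Returns the number of requests handled.
 */
int model_arm_failed(struct model_tctx *t)
{
    struct model_req *node;
    int handled = 0;

    t->pending = false;
    node = t->first;
    t->first = t->last = NULL;

    while (node) {
        struct model_req *next = node->next;

        node->next = NULL;
        node->ran_fallback = true;  /* placeholder for the real fallback */
        handled++;
        node = next;
    }
    return handled;
}
```

In this model, if caller A claims the flag, caller B queues a second request and returns early, and A's arming then fails, model_arm_failed() handles both requests. Cancelling only A's request would leave B's request stranded on the list, which is the leak the commit message describes.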