On 7/1/21 8:19 AM, Pavel Begunkov wrote:
> On 7/1/21 2:45 PM, Jens Axboe wrote:
>> On 7/1/21 6:26 AM, Pavel Begunkov wrote:
>>> If one entered io_req_task_work_add() not seeing PF_EXITING, it will set
>>> a ->task_state bit and try task_work_add(), which may fail by that
>>> moment. If that happens the function would try to cancel the request.
>>>
>>> However, in the meantime there might come other io_req_task_work_add()
>>> callers, which will see the bit set and leave their requests in the
>>> list, which will never be executed.
>>>
>>> Don't propagate an error, but clear the bit first and then fallback
>>> all requests that we can splice from the list. The callback functions
>>> have to be able to deal with PF_EXITING, so poll and apoll were modified
>>> via changing io_poll_rewait().
>>>
>>> Reported-by: Jens Axboe <axboe@xxxxxxxxx>
>>> Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
>>> ---
>>>
>>> Jens, can you try if it helps with the leak you mentioned? I can't
>>> see it. As with previous, would need to remove the PF_EXITING check,
>>> and should be in theory safe to do.
>>
>> Probably misunderstanding you here, but you already killed the one that
>> patch 3 removes. In any case, I tested this on top of 1+2, and I don't
>> see any leaks at that point.
>
> I believe removal of the PF_EXITING check yesterday didn't create
> a new bug, but made the one addressed here much more likely to
> happen. And so it fixes it, regardless of PF_EXITING.

That's what it looks like, yes.

> For the PF_EXITING removal, let's postpone it for-next.

Agree, no rush on that one.

-- 
Jens Axboe
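
For anyone following along, here is a rough userspace model of the race and
the fix described in the quoted commit message. It is only a sketch and not
the actual patch: the sketch_* types and functions are hypothetical names
invented for illustration, the work_pending flag stands in for the
->task_state bit, and a failed task_work_add() is modelled as an explicit
call to sketch_notify_failed(). Locking and atomics are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
struct sketch_req {
	struct sketch_req *next;
	bool fell_back;		/* set once handed to the fallback path */
};

struct sketch_tctx {
	struct sketch_req *first, *last;	/* pending task-work list */
	bool work_pending;			/* models the ->task_state bit */
};

/*
 * Queue a request. Returns true if the caller set the bit and is now
 * responsible for notifying the task; false if an earlier caller
 * already owns that job and the request is simply left in the list.
 */
static bool sketch_queue(struct sketch_tctx *t, struct sketch_req *r)
{
	assert(t && r);

	r->next = NULL;
	r->fell_back = false;
	if (t->last)
		t->last->next = r;
	else
		t->first = r;
	t->last = r;

	if (t->work_pending)
		return false;
	t->work_pending = true;
	return true;
}

/*
 * Called when notifying the task failed (the task is exiting).
 * Clear the bit first so that later callers try to notify again, then
 * splice the whole list and fall back every request found on it, not
 * only the caller's own. Returns the number of requests handled.
 */
static int sketch_notify_failed(struct sketch_tctx *t)
{
	struct sketch_req *node;
	int handled = 0;

	t->work_pending = false;
	node = t->first;
	t->first = t->last = NULL;

	while (node) {
		struct sketch_req *next = node->next;

		/* a real implementation would queue fallback work here */
		node->fell_back = true;
		handled++;
		node = next;
	}
	return handled;
}
```

In this model, a second sketch_queue() call that arrives after the bit is set
but before the failure is noticed returns false and leaves its request on the
list. Cancelling only the first caller's request would strand it, which is
why the failure path drains the whole list.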