On 24/02/2021 02:51, Jens Axboe wrote:
>>>>> On 08/02/2021 13:35, Pavel Begunkov wrote:
>>>>>> On 08/02/2021 02:50, Xiaoguang Wang wrote:
>>>>>>>>> The io_identity's count is underflowed. It's because in io_put_identity,
>>>>>>>>> first argument tctx comes from req->task->io_uring, the second argument
>>>>>>>>> comes from the task context that calls io_req_init_async, so the compare
>>>>>>>>> in io_put_identity may be meaningless. See below case:
>>>>>>>>> task context A issue one polled req, then req->task = A.
>>>>>>>>> task context B do iopoll, above req returns with EAGAIN error.
>>>>>>>>> task context B re-issue req, call io_queue_async_work for req.
>>>>>>>>> req->task->io_uring will set to task context B's identity, or cow new one.
>>>>>>>>> then for above case, in io_put_identity(), the compare is meaningless.
>>>>>>>>>
>>>>>>>>> IIUC, req->task should indicate the initial task context that issues req,
>>>>>>>>> then if it gets EAGAIN error, we'll call io_prep_async_work() in req->task
>>>>>>>>> context, but iopoll reqs seem special, they may be issued successfully and
>>>>>>>>> get re-issued in other task context because of EAGAIN error.
>>>>>>>>
>>>>>>>> Looks as you say, but the patch doesn't solve the issue completely.
>>>>>>>> 1. We must not do io_queue_async_work() under a different task context,
>>>>>>>> because it potentially uses a different set of resources. So, I just
>>>>>>>> thought that it would be better to punt it to the right task context
>>>>>>>> via task_work. But...
>>>>>>>>
>>>>>>>> 2. ...iovec import from io_resubmit_prep() might happen after submit ends,
>>>>>>>> i.e. when iovec was freed in userspace. And that's not great at all.
>>>>>>> Yes, agree, that's why I say we need to re-consider the io identity codes
>>>>>>> more in commit message :) I'll have a try to prepare a better one.
>>>>>>
>>>>>> I'd vote for dragging -AGAIN'ed reqs that don't need io_import_iovec()
>>>>>> through task_work for resubmission, and fail everything else. Not great,
>>>>>> but imho better than always setting async_data.
>>>>>
>>>>> Hey Xiaoguang, are you working on this? I would like to leave it to you,
>>>>> if you do.
>>>> Sorry, currently I'm busy with other project and don't have much time to work on
>>>> it yet. Hao Xu will help to continue work on the new version patch.
>>>
>>> Is it issue or reissue? I found this one today:
>>>
>>> https://lore.kernel.org/io-uring/c9f6e1f6-ff82-0e58-ab66-956d0cde30ff@xxxxxxxxx/
>> Yeah, my initial patch is similar to yours, but it only solves the bug described
>> in my commit message partially (ctx is dying), you can have a look at my commit message
>> for the bug scene, thanks.
>
> Are you sure? We just don't want to reissue it, we need to fail it.
> Hence if we catch it at reissue time, that should be enough. But I'm
> open to clue batting :-)

Jens, IOPOLL can happen from a different task, so

1) we don't want to grab io_wq_work context from it. As always, we can pass it
through task_work, or it should be solved with your io-wq patches.

2) it happens who knows when in time, so iovec may be gone already -- same
reasoning why io_[read,write]() copy it before going to io-wq.

--
Pavel Begunkov