On Mon, Aug 29, 2022 at 02:13:12PM +0800, Ziyang Zhang wrote: > On 2022/8/29 13:40, Ming Lei wrote: > > On Wed, Aug 24, 2022 at 01:47:39PM +0800, ZiyangZhang wrote: > >> If one rq is handled by io_uring_cmd_complete_in_task(), after a crash > >> this rq is actually handled by an io_uring fallback wq. We have to > >> end(abort) this rq since this fallback wq is a task other than the > >> crashed task. However, current code does not call io_uring_cmd_done() > >> at the same time but do it in ublk_cancel_queue(). With current design, > >> this does work because ublk_cancel_queue() is called AFTER del_gendisk(), > >> which waits for the rq ended(aborted) in fallback wq. This implies that > >> fallback wq on this rq is scheduled BEFORE calling io_uring_cmd_done() > >> on the corresponding ioucmd in ublk_cancel_queue(). > > > > Right. > > > >> > >> However, while considering recovery feature, we cannot rely on > >> del_gendisk() or blk_mq_freeze_queue() to wait for completion of all > >> rqs because we may not want any aborted rq. Besides, io_uring does not > >> provide "flush fallback" machenism so we cannot trace this ioucmd. > > > > Why not? > > > > If user recovery is enabled, del_gendisk() can be replaced with > > blk_mq_quiesce_queue(), then let abort work function do: > > > > - cancel all in-flight requests by holding them into requeue list > > instead of finishing them as before, and this way is safe because > > abort worker does know the ubq daemon is dying > > - cancel pending commands as before, because the situation is same > > with disk deleted or queue frozen > > The problem is: we cannot control when fallback wq is scheduled. > So we are unsafe to call io_uring_cmd_done() in another process. What is the other process? It can't be fallback wq since any ublk request is aborted at the beginning of __ublk_rq_task_work(). It shouldn't be the process calling ublk_cancel_dev(), since it is safe to call io_uring_cmd_done() if ubq->nr_io_ready > 0. Or others? > Otherwise, there is a UAF, just as > (5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do not add a > re-issued request aborted previously to ioucmd's task_work). As I mentioned, del_gendisk() can be replaced with blk_mq_quiesce_queue() in case of user recovery, then no any new request can be queued after blk_mq_quiesce_queue() returns. > > Yeah I know the answer is very simple: flush the fallback wq. > But here are two more questions: I don't see why we need to flush fallback wq, care to provide some details? > > (1) Should ublk_drv rely on the fallback wq machenism? > IMO, ublk_drv should not know detail of io_uring_cmd_complete_in_task() > because its implementation may change in the future. > BTW, I think current ublk_rq_task_work_cb() is not correct because > it does not always call io_uring_cmd_done() before returning. > nvme_uring_cmd_end_io() always calls io_uring_cmd_done() for each ioucmd > no matter the rq succeeds or fails. > > (2) Suppose io_uring does export the symbol 'flush_fallback_work', should we call > it before starting a new process(recovery)? > What if fallback wq is not scheduled immediately if there are many processes > running and the system overhead is heavy. In this case the recovery process > may wait for too long. Really we should not depend on fallback wq and please > let the fallback wq complete the ioucmd itself. > > > > > With this way, the current abort logic won't be changed much. > > > > And user recovery should only be started _after_ ublk device is found > > as aborted. > > START_RECOVERY will check if all ubq_daemons(the process) are PF_EXITING. That is different. If START_RECOVERY is only run on aborted device, the recovery handler could be simplified. > > > > >> > >> The recovery machenism needs to complete all ioucmds of a dying ubq > >> to avoid leaking io_uring ctx. But as talked above, we are unsafe > >> to call io_uring_cmd_done() in the recovery task if fallback wq happens > >> to run simultaneously. This is a UAF case because io_uring ctx may be > >> freed. Actually a similar case happens in > >> (5804987b7272f437299011c76b7363b8df6f8515: ublk_drv: do not add a > >> re-issued request aborted previously to ioucmd's task_work). > > > > If you take the above approach, I guess there isn't such problem because > > abort can handle the case well as before. > > Ming, we did think this approach(quiesce, requeue rq/complete ioucmd) > at the very beginning. But we decided to drop it because we don not want > rely on 'flush fallback wq' machenism, which > makes ublk_drv rely on io_uring's internal implementation. Then the focus is 'flush fallback wq', please see my above question. Thanks, Ming