On 2021/12/24 15:03, Christoph Hellwig wrote:
> On Thu, Dec 23, 2021 at 11:37:21PM +0900, Tetsuo Handa wrote:
>>> @@ -1115,7 +1107,6 @@ static void __loop_clr_fd(struct loop_device *lo)
>>>      /* freeze request queue during the transition */
>>>      blk_mq_freeze_queue(lo->lo_queue);
>>>
>>> -    destroy_workqueue(lo->workqueue);
>>
>> is it safe to remove destroy_workqueue() call here?
>>
>>>      spin_lock_irq(&lo->lo_work_lock);
>>>      list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
>>>                               idle_list) {
>>
>> destroy_workqueue() implies flush_workqueue() which is creating the lock
>> ordering problem. And I think that flush_workqueue() is required for making
>> sure that there is no more work to process (i.e. loop_process_work() is
>> no longer running) before start deleting idle workers.
>>
>> My understanding is that the problem is not the use of a per-device workqueue
>> but the need to call flush_workqueue() in order to make sure that all pending
>> works are completed.
>
> All the work items are for requests, and the blk_mq_freeze_queue should
> take care of flushing them all out.

Hmm, OK.

 (1) loop_queue_rq() calls blk_mq_start_request() and then calls
     loop_queue_work().

 (2) loop_queue_work() allocates "struct work_struct" and calls queue_work().

 (3) loop_handle_cmd() from loop_process_work() from loop_workfn() is called
     by a WQ thread.

 (4) do_req_filebacked() from loop_handle_cmd() performs read/write on
     lo->lo_backing_file.

 (5) Either completion function or loop_handle_cmd() calls
     blk_mq_complete_request().

Therefore, as long as blk_mq_freeze_queue(lo->lo_queue) waits for completion
of (5) and blocks new events for (2), there should be no work to process by
loop_process_work().
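
As a rough illustration of the argument in steps (1) to (5), here is a
single-threaded user-space sketch. It is only a toy model: every name in it
(toy_queue, toy_queue_rq() and so on) is hypothetical and none of it is
kernel API. It assumes that every queued command belongs to exactly one
started request, which is the premise of the reasoning above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct toy_queue {
	bool frozen;		/* set once a freeze has been started */
	int in_flight;		/* requests between step (1) and step (5) */
	int pending_cmds;	/* commands handed to the WQ in step (2) */
};

/* Steps (1)-(2): start a request and hand its command to the workqueue. */
static bool toy_queue_rq(struct toy_queue *q)
{
	if (q->frozen)
		return false;	/* a frozen queue admits no new requests */
	q->in_flight++;
	q->pending_cmds++;
	return true;
}

/* Steps (3)-(5): a worker handles one command and completes its request. */
static void toy_complete_one(struct toy_queue *q)
{
	assert(q->in_flight > 0 && q->pending_cmds > 0);
	q->pending_cmds--;
	q->in_flight--;
}

/* Start freezing: from now on new requests are refused. */
static void toy_freeze_start(struct toy_queue *q)
{
	q->frozen = true;
}

/* The freeze has finished only once no request is in flight any more. */
static bool toy_freeze_done(const struct toy_queue *q)
{
	return q->frozen && q->in_flight == 0;
}
```

In this model, toy_freeze_done() returning true implies pending_cmds == 0,
because the two counters only ever move together. Whether the real driver
keeps that one-to-one relation is exactly the assumption being discussed.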

Then, we can defer

	destroy_workqueue(lo->workqueue);
	spin_lock_irq(&lo->lo_work_lock);
	list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
				idle_list) {
		list_del(&worker->idle_list);
		rb_erase(&worker->rb_node, &lo->worker_tree);
		css_put(worker->blkcg_css);
		kfree(worker);
	}
	spin_unlock_irq(&lo->lo_work_lock);
	del_timer_sync(&lo->timer);

block in __loop_clr_fd() till loop_remove() if we want. Assuming that loop
devices are likely created only when there is no free one, a loop device is
likely reused once created. Then, we don't need to care about idle workers on
every loop_configure()/__loop_clr_fd() pair?

By the way, is it safe to use a single global WQ if (4) is a synchronous I/O
request? Since there can be up to 1048576 loop devices, and one loop device
can use another loop device as lo->lo_backing_file (unless
loop_validate_file() finds a circular usage), one synchronous I/O request in
(4) might recursively involve up to 1048576 works (which would be too much
concurrency to be handled by a WQ)?

Also, is

	blk_mq_start_request(rq);

	if (lo->lo_state != Lo_bound)
		return BLK_STS_IOERR;

in loop_queue_rq() correct? (Not only is the lo->lo_state test racy, but
doesn't it also want blk_mq_end_request() like lo_complete_rq() does?)
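
To make the stacking concern about the single global WQ concrete, here is a
small user-space sketch. It is only an illustration: the backing[] table,
TOY_NO_BACKING and toy_blocked_works() are hypothetical names, not part of
the loop driver. It assumes that a synchronous I/O in step (4) keeps its work
item blocked until the lower device has finished, so a request to the top of
a stack holds one work item per stacked loop device.

```c
#include <assert.h>

/* Hypothetical marker: the device is backed by a regular file. */
#define TOY_NO_BACKING (-1)

/*
 * backing[i] is the index of the loop device backing device i, or
 * TOY_NO_BACKING. There are n devices in total.
 *
 * Returns how many work items are blocked at the same time while one
 * synchronous request to device 'dev' is being served, or -1 if the chain
 * is longer than n, which can only happen with a circular setup.
 */
static int toy_blocked_works(const int *backing, int n, int dev)
{
	int depth = 0;

	while (dev != TOY_NO_BACKING) {
		if (++depth > n)
			return -1;	/* cycle detected */
		dev = backing[dev];
	}
	return depth;
}
```

For a chain where device 3 is backed by 2, 2 by 1, and 1 by 0,
toy_blocked_works(backing, 4, 3) returns 4. With 1048576 devices stacked the
same way, the result would be 1048576 under the assumption stated above.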