Bart Van Assche <bvanassche@xxxxxxx> writes: > On 2020-04-01 17:00, Luis Chamberlain wrote: >> Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") moved >> the blk_release_queue() into a workqueue after a splat floated around >> with some work here which could sleep in blk_exit_rl(). >> >> On recent commit db6d9952356 ("block: remove request_list code") though >> Jens Axboe removed this code, now merged since v5.0. We no longer have >> to defer this work. >> >> By doing this we also avoid failing to detach / attach a block >> device with a BLKTRACESETUP. This issue can be reproduced with >> break-blktrace [0] using: >> >> break-blktrace -c 10 -d -s >> >> The kernel does not crash without this commit, it just fails to >> create the block device because the prior block device removal >> deferred work is pending. After this commit we can use the above >> flaky use of blktrace without an issue. >> >> [0] https://github.com/mcgrof/break-blktrace >> >> Cc: Bart Van Assche <bvanassche@xxxxxxx> >> Cc: Omar Sandoval <osandov@xxxxxx> >> Cc: Hannes Reinecke <hare@xxxxxxxx> >> Cc: Nicolai Stange <nstange@xxxxxxx> >> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> >> Cc: Michal Hocko <mhocko@xxxxxxxxxx> >> Suggested-by: Nicolai Stange <nstange@xxxxxxx> >> Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx> >> --- >> block/blk-sysfs.c | 18 +++++------------- >> 1 file changed, 5 insertions(+), 13 deletions(-) >> >> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c >> index 20f20b0fa0b9..f159b40899ee 100644 >> --- a/block/blk-sysfs.c >> +++ b/block/blk-sysfs.c >> @@ -862,8 +862,8 @@ static void blk_exit_queue(struct request_queue *q) >> >> >> /** >> - * __blk_release_queue - release a request queue >> - * @work: pointer to the release_work member of the request queue to be released >> + * blk_release_queue - release a request queue >> + * @kojb: pointer to the kobj representing the request queue >> * >> * Description: >> * This function is called when a block device is being unregistered. The >> @@ -873,9 +873,10 @@ static void blk_exit_queue(struct request_queue *q) >> * of the request queue reaches zero, blk_release_queue is called to release >> * all allocated resources of the request queue. >> */ >> -static void __blk_release_queue(struct work_struct *work) >> +static void blk_release_queue(struct kobject *kobj) >> { >> - struct request_queue *q = container_of(work, typeof(*q), release_work); >> + struct request_queue *q = >> + container_of(kobj, struct request_queue, kobj); >> >> if (test_bit(QUEUE_FLAG_POLL_STATS, &q->queue_flags)) >> blk_stat_remove_callback(q, q->poll_cb); >> @@ -905,15 +906,6 @@ static void __blk_release_queue(struct work_struct *work) >> call_rcu(&q->rcu_head, blk_free_queue_rcu); >> } >> >> -static void blk_release_queue(struct kobject *kobj) >> -{ >> - struct request_queue *q = >> - container_of(kobj, struct request_queue, kobj); >> - >> - INIT_WORK(&q->release_work, __blk_release_queue); >> - schedule_work(&q->release_work); >> -} >> - >> static const struct sysfs_ops queue_sysfs_ops = { >> .show = queue_attr_show, >> .store = queue_attr_store, > > The description of this patch mentions a single blk_release_queue() call > that happened in the past from a context from which sleeping is not > allowed and from which sleeping is allowed today. Have all other > blk_release_queue() / blk_put_queue() calls been verified to see whether > none of these happens from a context from which sleeping is not allowed? I've just done this today and found the following potentially problematic call paths to blk_put_queue(). 1.) mem_cgroup_throttle_swaprate() takes a spinlock and calls blkcg_schedule_throttle()->blk_put_queue(). Also note that AFAICS mem_cgroup_try_charge_delay() can be called with GFP_ATOMIC. 2.) scsi_unblock_requests() gets called from a lot of drivers and invoke blk_put_queue() through scsi_unblock_requests() -> scsi_run_host_queues() -> scsi_starved_list_run() -> blk_put_queue(). Most call sites are fine, the ones which are not are: a.) pmcraid_complete_ioa_reset(). This gets assigned to struct pmcraid_cmd's ->cmd_done and later invoked under a spinlock. b.) qla82xx_fw_dump() and qla8044_fw_dump(). These can potentially block w/o this patch already, because both invoke qla2x00_wait_for_chip_reset(). However, they can get called from IRQ context. For example, qla82xx_intr_handler(), qla82xx_msix_default() and qla82xx_poll() call qla2x00_async_event(), which calls ->fw_dump(). The aforementioned functions can also reach ->fw_dump() through qla24xx_process_response_queue()->qlt_handle_abts_recv()->qlt_response_pkt_all_vps() ->qlt_response_pkt()->qlt_handle_abts_completion()->qlt_chk_unresolv_exchg() -> ->fw_dump(). But I'd consider this a problem with the driver -- either ->fw_dump() can sleep and must not be called from IRQ context or they must not invoke qla2x00_wait_for_hba_ready(). (I can share the full analysis, but it's lengthy and contains nothing interesting except for what is listed above). One final note though: If I'm not mistaken, then the final blk_put_queue() can in principle block even today, simply by virtue of the kernfs operations invoked through kobject_put()->kobject_release()->kobject_cleanup()->kobject_del() ->sysfs_remove_dir()->kernfs_remove()->mutex_lock()? Thanks, Nicolai -- SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg), GF: Felix Imendörffer