On 6/30/22 4:37 PM, Bart Van Assche wrote: > struct scsi_device *sdev; > @@ -1970,8 +1980,21 @@ void scsi_forget_host(struct Scsi_Host *shost) > restart: > spin_lock_irq(shost->host_lock); > list_for_each_entry(sdev, &shost->__devices, siblings) { > - if (sdev->sdev_state == SDEV_DEL) > + if (sdev->sdev_state == SDEV_DEL && > + blk_queue_dead(sdev->request_queue)) { > continue; > + } > + if (sdev->sdev_state == SDEV_DEL) { > + get_device(&sdev->sdev_gendev); > + spin_unlock_irq(shost->host_lock); > + > + while (!blk_queue_dead(sdev->request_queue)) > + msleep(10); > + > + spin_lock_irq(shost->host_lock); > + put_device(&sdev->sdev_gendev); > + goto restart; > + } > spin_unlock_irq(shost->host_lock); > __scsi_remove_device(sdev); > goto restart; Are there 2 ways to hit your issue? 1. Normal case. srp_remove_one frees srp_device. Then all refs to host are dropped and we call srp_exit_cmd_priv which accesses the freed srp_device? You don't need the above patch for this case right. 2. Are you hitting a race? Something did a scsi_remove_device. It set the device state to SDEV_DEL. It's stuck in blk_cleanup_queue blk_freeze_queue. Now we do the above code. Without your patch we just move on by that device. Commands could still be accessed. With your patch we wait for for that other thread to complete the device destruction. Could you also solve both issues by adding a scsi_host_template scsi_host release function that's called from scsi_host_dev_release. A new srp_host_release would free structs like the srp device from there. Or would we still have an issue for #2 where we just don't know how far blk_freeze_queue has got so commands could still be hitting our queue_rq function when we are doing scsi_host_dev_release? I like how your patch makes it so we know after scsi_remove_host has returned that the device is really gone. Could we have a similar race as in #2 with someone doing a scsi_remove_device and transport doing scsi_remove_target? We would not hit the same use after free from the tag set exit_cmd_priv being called. But, for example, if we wanted to free some transport level resources that running commands reference after scsi_target_remove could we hit a use after free? If so maybe we want to move this wait/check to __scsi_remove_device?