On 4/16/21 4:38 PM, Mike Christie wrote: > On 4/16/21 4:21 AM, Dmitry Bogdanov wrote: >> Currently TMF commands are removed from de_device.dev_tmf_list at >> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd >> up on a command status (response) is queued in transport layer. >> It means that LUN and backend device can be deleted meantime and at >> the moment of repsonse completion a panic is occured: >> >> target_tmr_work() >> cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire >> transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun >> - // - // - // - >> <<<--- lun remove >> <<<--- core backend device remove >> - // - // - // - >> qlt_handle_abts_completion() >> tfo->free_mcmd() >> transport_generic_free_cmd() >> target_put_sess_cmd() >> core_tmr_release_req() { >> if (dev) { // backend device, can not be null >> spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH >> > > Sorry about this. I was thinking about this patch some more while > reviewing this version. > > Is there another race possible? > > 1. We have a cmd running in lio. > 2. The initiator times it out and sends tmf. > 3. cmd completes. > 4. target_submit_tmr has called transport_lookup_tmr_lun. > 5. transport_lookup_tmr_lun has set se_dev and tmr_dev. > 6. You now remove the lun and device. > 7. Now you crash on the dev references. > > I think we need to do a percpu_ref_tryget_live in transport_lookup_tmr_lun > like is done in transport_lookup_cmd_lun. If successful set > > se_cmd->lun_ref_active = true; > > If we get the ref, we drop it in transport_lun_remove_cmd like with > the non tmr case. > Ignore this. I see we do a percpu_ref_tryget_live in there already.