On 4/16/21 4:21 AM, Dmitry Bogdanov wrote: > Currently TMF commands are removed from de_device.dev_tmf_list at > the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd > up on a command status (response) is queued in transport layer. > It means that LUN and backend device can be deleted meantime and at > the moment of repsonse completion a panic is occured: > > target_tmr_work() > cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire > transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun > - // - // - // - > <<<--- lun remove > <<<--- core backend device remove > - // - // - // - > qlt_handle_abts_completion() > tfo->free_mcmd() > transport_generic_free_cmd() > target_put_sess_cmd() > core_tmr_release_req() { > if (dev) { // backend device, can not be null > spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH > Sorry about this. I was thinking about this patch some more while reviewing this version. Is there another race possible? 1. We have a cmd running in lio. 2. The initiator times it out and sends tmf. 3. cmd completes. 4. target_submit_tmr has called transport_lookup_tmr_lun. 5. transport_lookup_tmr_lun has set se_dev and tmr_dev. 6. You now remove the lun and device. 7. Now you crash on the dev references. I think we need to do a percpu_ref_tryget_live in transport_lookup_tmr_lun like is done in transport_lookup_cmd_lun. If successful set se_cmd->lun_ref_active = true; If we get the ref, we drop it in transport_lun_remove_cmd like with the non tmr case. Combined with this patch then I think we are ok for all races.