Hi Sreenkanth, Thanks for reviewing. On 02/12/2018 04:28 AM, Sreekanth Reddy wrote:
Instead of returning the scmd with DID_NO_CONNECT in TM path, can we wait for some time (10 seconds) in shutdown/unload path for the outstanding commands to complete and even then [if] the scmds are outstanding then return all the outstanding scmds with DID_NO_CONNECT in the shutdown/unload path itself as shown below,
Apparently there is a problem with that. That is not enough for TM commands; it's only for SCSI IO commands. The TM commands use the high-priority queue, and SCSI IO commands use the SCSI IO queue. But this patch only waits for and return commands in the latter: > + mpt3sas_wait_for_commands_to_complete(ioc); > + _scsih_flush_running_cmds(ioc); They only loop up until 'ioc->scsiio_depth', but TM commands get SMIDs above that [see mpt3sas_base_free_smid() and mpt3sas_scsih_issue_tm()]. So, there's an exposure for abort/reset/other TM commands. For example, the SCSI IO commands finish quickly in wait_for_commands_to_complete(), then the shutdown function continues, disables interrupts, and release pointers/memory. Then, an outstanding abort/reset command is finished, its completion interrupt is not serviced, and the SCSI EH for it kicks in/continues, and so it escalates up, and hit that oops in host reset. Is there any problem with the patch that I proposed? It seems okay to have another check in error path (not hot/performance path), and that check is already used in many other points in the code, for the same reasons (exit early before the code attempts to use stuff that might be released). Thanks again, -- Mauricio Faria de Oliveira IBM Linux Technology Center