On Sat, Apr 08, 2023 at 01:59:21PM -0700, Jerry Snitselaar wrote:
> On Sat, Apr 08, 2023 at 12:18:29PM -0700, Jerry Snitselaar wrote:
> > We've had some people trying to track a problem for months revolving
> > around a system hanging at shutdown, and the last thing they see being
> > a message from mpt3sas about a reset. They quickly bisected down to the
> > commit below, and reverting it made the problem go away for the
> > customer.
> >
> > b424eaa1b51c ("scsi: mpt3sas: Transition IOC to Ready state during shutdown")
> >
>
> That should be (grabbed the wrong commit id):
>
> fae21608c31c ("scsi: mpt3sas: Transition IOC to Ready state during shutdown")
>
> > I got asked to look at something since I recently looked at another
> > issue that involved mpt3sas at shutdown, so I was looking through the
> > history, saw this commit being mentioned. Looking at it, I'm not sure
> > why it is doing what it is doing.
> >
> > It says it is to perform a soft reset, but that was already happening
> > before this commit via:
> >
> > scsih_shutdown -> mpt3sas_base_detach -> mpt3sas_base_free_resources -> _base_make_ioc_ready(ioc, SOFT_RESET);
> >
> > The original submission [1] had the following commit message:
> >
> > "During shutdown just move the IOC state to Ready state
> > by issuing MUR. No need to free any IOC memory pools."
> >
> > But it is now skipping more than not freeing the memory pools. It no
> > longer frees memory that was kalloc'd, it doesn't unmap something that
> > was iomapped, it no longer cleans up the fault reset workqueue, and no
> > longer calls the pci cleanup code. It also no longer does the things
> > it moved to scsih_shutdown under the pci access mutex, nor uses the if
> > condition that was in mpt3sas_base_free_resources.
> >
> > [1] https://lore.kernel.org/r/20210705145951.32258-1-sreekanth.reddy@xxxxxxxxxxxx
> >
> >
> > Am I missing something, and what the commit does here is really okay?
> >
> >
> > Regards,
> > Jerry
> >
>

One last thing.
The issue I was looking at a few weeks ago turned out to be that
mpt3sas frees and unmaps the trace buffer in mpt3sas_ctl_exit(), but
doesn't appear to tell the fw about it, so the fw keeps trying to write
to the trace buffer while the soft reset happens. I don't know if it
should be doing something like ctl_diag_unregister(), or something
else?

I thought I saw an email from Tomas, but couldn't find it, so I thought
I'd bring that to your attention.

Regards,
Jerry
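
P.S. To make the ordering concern with the trace buffer concrete, here
is a minimal userspace sketch. It is not driver code: struct fake_fw,
struct fake_host and every function below are hypothetical stand-ins,
and the "firmware" is just a struct that remembers whether it believes
a buffer is still registered. It only models the idea that the host
frees the buffer while the other side still thinks it may write to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the firmware's view of the trace buffer. */
struct fake_fw {
	bool registered;        /* fw believes a trace buffer is posted */
	unsigned stray_writes;  /* writes attempted after the host freed it */
};

/* Hypothetical model of the host's view of the same buffer. */
struct fake_host {
	char *trace_buf;
	bool buf_valid;
};

static void host_register_trace(struct fake_host *h, struct fake_fw *fw)
{
	h->trace_buf = calloc(1, 64);
	assert(h->trace_buf != NULL);
	h->buf_valid = true;
	fw->registered = true;
}

/* Tell the "firmware" to stop using the buffer (the step in question). */
static void host_unregister_trace(struct fake_fw *fw)
{
	fw->registered = false;
}

static void host_free_trace(struct fake_host *h)
{
	free(h->trace_buf);
	h->trace_buf = NULL;
	h->buf_valid = false;
}

/* Firmware activity during the soft reset: it posts one trace record. */
static void fw_post_trace(struct fake_fw *fw, struct fake_host *h)
{
	if (!fw->registered)
		return;
	if (!h->buf_valid) {
		/* On real hardware this would be a write to freed memory. */
		fw->stray_writes++;
		return;
	}
	h->trace_buf[0] = 'T';
}

/* Free only, without unregistering first. */
unsigned shutdown_free_only(void)
{
	struct fake_fw fw = { 0 };
	struct fake_host h = { 0 };

	host_register_trace(&h, &fw);
	host_free_trace(&h);
	fw_post_trace(&fw, &h);
	return fw.stray_writes;
}

/* Unregister with the "firmware" first, then free. */
unsigned shutdown_unregister_then_free(void)
{
	struct fake_fw fw = { 0 };
	struct fake_host h = { 0 };

	host_register_trace(&h, &fw);
	host_unregister_trace(&fw);
	host_free_trace(&h);
	fw_post_trace(&fw, &h);
	return fw.stray_writes;
}
```

In this model shutdown_free_only() counts one stray write and
shutdown_unregister_then_free() counts none. Whether the driver's
equivalent of the unregister step is ctl_diag_unregister() or something
else is exactly the open question above.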