On Wed, 2020-03-11 at 17:19 +0530, Sreekanth Reddy wrote:
> On Wed, Mar 11, 2020 at 4:55 PM Sreekanth Reddy
> <sreekanth.reddy@xxxxxxxxxxxx> wrote:
> >
> > On Wed, Mar 11, 2020 at 4:35 PM Amit Shah <amit@xxxxxxxxxx> wrote:
> > >
> > > On Wed, 2020-03-11 at 06:36 -0400, Sreekanth Reddy wrote:
> > > > Generic protection fault type kernel panic is observed when user
> > > > performs soft(ordered) HBA unplug operation while IOs are running
> > > > on drives connected to HBA.
> > > >
> > > > When user performs ordered HBA removal operation then kernel calls
> > > > PCI device's .remove() call back function where driver is flushing out
> > > > all the outstanding SCSI IO commands with DID_NO_CONNECT host byte and
> > > > also un-maps sg buffers allocated for these IO commands.
> > > > But in the ordered HBA removal case (unlike of real HBA hot unplug)
> > > > HBA device is still alive and hence HBA hardware is performing the
> > > > DMA operations to those buffers on the system memory which are already
> > > > unmapped while flushing out the outstanding SCSI IO commands
> > > > and this leads to Kernel panic.
> > > >
> > > > Fix:
> > > > Don't flush out the outstanding IOs from .remove() path in case of
> > > > ordered HBA removal since HBA will be still alive in this case and
> > > > it can complete the outstanding IOs. Flush out the outstanding IOs
> > > > only in case physical HBA hot unplug where their won't be any
> > > > communication with the HBA.
> > >
> > > Can you please point to the commit that introduces the bug?
> >
> > Sure I will add the commit ID which introduced this bug in the next
> > patch. Thanks.
> >
> > >
> > > >
> > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > Signed-off-by: Sreekanth Reddy <sreekanth.reddy@xxxxxxxxxxxx>
> > > > ---
> > > >  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 8 ++++----
> > > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > > index 778d5e6..04a40af 100644
> > > > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > > @@ -9908,8 +9908,8 @@ static void scsih_remove(struct pci_dev *pdev)
> > > >
> > > >  	ioc->remove_host = 1;
> > > >
> > > > -	mpt3sas_wait_for_commands_to_complete(ioc);
> > > > -	_scsih_flush_running_cmds(ioc);
> > > > +	if (!pci_device_is_present(pdev))
> > > > +		_scsih_flush_running_cmds(ioc);
> > > >
> > > >  	_scsih_fw_event_cleanup_queue(ioc);
> > > >
> > > > @@ -9992,8 +9992,8 @@ static void scsih_remove(struct pci_dev *pdev)
> > > >
> > > Just a note: this function is scsih_shutdown(). Doesn't block
> > > application of the patch, though. Just wondering how the patch was
> > > created.
>
> I got your query now, yes this hunk change is in scsih_shutdown()
> function. I am not sure why scsih_remove name is getting displayed
> here in this hunk. I have used 'git format-patch' to generate the
> patch. Thanks.

Does the commit description need an update as well? It only talks
about remove callback.

>
> >
> > Sorry I didn't get you. Can you please elaborate your query?
> >
> > >
> > > >
> > > >  	ioc->remove_host = 1;
> > > >
> > > > -	mpt3sas_wait_for_commands_to_complete(ioc);
> > > > -	_scsih_flush_running_cmds(ioc);
> > > > +	if (!pci_device_is_present(pdev))
> > > > +		_scsih_flush_running_cmds(ioc);
> > > >
> > > >  	_scsih_fw_event_cleanup_queue(ioc);
> > > >
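
For anyone following the thread, the decision both hunks make can be
modelled in plain userspace C. This is only a rough sketch under stated
assumptions, not driver code: fake_hba, fake_flush_running_cmds() and
fake_remove() are made-up names, and the "present" flag merely stands in
for the result of pci_device_is_present(pdev). It does not model DMA or
the dropped mpt3sas_wait_for_commands_to_complete() call.

```c
#include <stdbool.h>

#define MAX_CMDS 4

/* Hypothetical command states for this userspace model only. */
enum cmd_state { CMD_FREE, CMD_OUTSTANDING, CMD_FLUSHED };

/* Hypothetical stand-in for the adapter; not a real driver structure. */
struct fake_hba {
	bool present;                   /* models pci_device_is_present() */
	enum cmd_state cmds[MAX_CMDS];  /* models the outstanding SCSI IOs */
};

/*
 * Models the flush step: every outstanding command is failed back and
 * its buffers are considered released. Returns how many were flushed.
 */
static int fake_flush_running_cmds(struct fake_hba *hba)
{
	int flushed = 0;

	for (int i = 0; i < MAX_CMDS; i++) {
		if (hba->cmds[i] == CMD_OUTSTANDING) {
			hba->cmds[i] = CMD_FLUSHED;
			flushed++;
		}
	}
	return flushed;
}

/*
 * Models the patched remove/shutdown path: flush only when the device
 * is gone (physical hot unplug). On an ordered removal the device is
 * still present and is left to complete its own IOs.
 */
static int fake_remove(struct fake_hba *hba)
{
	if (!hba->present)
		return fake_flush_running_cmds(hba);
	return 0;
}
```

Calling fake_remove() on a model with present set leaves the outstanding
entries untouched, which corresponds to the ordered-removal case in the
commit description; with present cleared, every outstanding entry is
flushed, which corresponds to the physical hot-unplug case.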