On Mon, Oct 08, 2018 at 12:14:40PM +0530, Suganath Prabu Subramani wrote: > On Tue, Oct 2, 2018 at 7:34 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > On Mon, Oct 01, 2018 at 03:40:51PM -0500, Bjorn Helgaas wrote: > > > I think the names "pci_device_is_present()" and > > > "mpt3sas_base_pci_device_is_available()" contribute to the > > > problem because they make promises that can't be kept -- all we > > > can say is that the device *was* present, but we don't know > > > whether it is *still* present. > > In the patch we are using '!' (i.e. not operation) of > pci_device_is_present(), which is logically same as pci_device_is > absent, and it is same for mpt3sas_base_pci_device_is_available(). > > My understanding is that, you want us to rename these functions for > better readability. Is that correct ? I think "pci_device_is_present()" is a poor name, but I'm not suggesting you fix it in this patch. Changing that would be a PCI core change that would also touch the tg3, igb, nvme, and now mpt3sas drivers that use it. My personal opinion is that you should do something like the patch below. The main point is that you should for the device being unreachable *at the places where you might learn that*. If you WRITE to a device that has been removed, the write will mostly likely get dropped and you won't learn anything. But when you READ from a device that has been removed, you'll most likely get ~0 data back, and you can check for that. Of course, it's also possible that pci_device_is_present() can tell you something. So you *could* sprinkle those all over, as in your patch. But you end up with code like this: if (!pci_device_is_present()) return; writel(); readl(); Here, the device might be removed right after pci_device_is_present() returns true. You do the writel() and the readl() anyway, so you really haven't gained anything. And if the readl() returned ~0, you assume it's valid data when it's not. It's better to do this: writel(); val = readl(); if (val == ~0) { /* device is unreachable, clean things up */ ... } (Obviously it's possible that ~0 is a valid value for whatever you read from the device. That depends on the device and you have to use your knowledge of the hardware. If you read ~0 from a register where that might be a valid value, you can read from a different register for with ~0 is *not* a valid value.) I don't claim the patch below is complete because I don't know anything about your hardware and what sort of registers or ring buffers it uses. You still have to arrange to flush or complete whatever is outstanding when you learn the device is gone. Bjorn diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 59d7844ee022..37b09362b31a 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -6145,6 +6145,11 @@ _base_diag_reset(struct MPT3SAS_ADAPTER *ioc) drsprintk(ioc, pr_info(MPT3SAS_FMT "wrote magic sequence: count(%d), host_diagnostic(0x%08x)\n", ioc->name, count, host_diagnostic)); + if (host_diagnostic == ~0) { + ioc->remove_host = 1; + pr_err(MPT3SAS_FMT "HBA unreachable\n", ioc->name); + goto out; + } } while ((host_diagnostic & MPI2_DIAG_DIAG_WRITE_ENABLE) == 0); @@ -6237,6 +6242,11 @@ _base_make_ioc_ready(struct MPT3SAS_ADAPTER *ioc, enum reset_type type) ioc_state = mpt3sas_base_get_iocstate(ioc, 0); dhsprintk(ioc, pr_info(MPT3SAS_FMT "%s: ioc_state(0x%08x)\n", ioc->name, __func__, ioc_state)); + if (ioc_state == ~0) { + pr_err(MPT3SAS_FMT "%s: HBA unreachable (ioc_state=0x%x)\n", + ioc->name, __func__, ioc_state); + return -EFAULT; + } /* if in RESET state, it should move to READY state shortly */ count = 0; @@ -6251,6 +6261,11 @@ _base_make_ioc_ready(struct MPT3SAS_ADAPTER *ioc, enum reset_type type) } ssleep(1); ioc_state = mpt3sas_base_get_iocstate(ioc, 0); + if (ioc_state == ~0) { + pr_err(MPT3SAS_FMT "%s: HBA unreachable (ioc_state=0x%x)\n", + ioc->name, __func__, ioc_state); + return -EFAULT; + } } } @@ -6854,6 +6869,11 @@ mpt3sas_wait_for_commands_to_complete(struct MPT3SAS_ADAPTER *ioc) ioc->pending_io_count = 0; ioc_state = mpt3sas_base_get_iocstate(ioc, 0); + if (ioc_state == ~0) { + pr_err(MPT3SAS_FMT "%s: HBA unreachable\n", + ioc->name, __func__); + return; + } if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL) return; @@ -6909,6 +6929,14 @@ mpt3sas_base_hard_reset_handler(struct MPT3SAS_ADAPTER *ioc, MPT3_DIAG_BUFFER_IS_RELEASED))) { is_trigger = 1; ioc_state = mpt3sas_base_get_iocstate(ioc, 0); + if (ioc_state == ~0) { + pr_err(MPT3SAS_FMT "%s: HBA unreachable\n", + ioc->name, __func__); + ioc->schedule_dead_ioc_flush_running_cmds(ioc); + r = 0; + mutex_unlock(&ioc->reset_in_progress_mutex); + goto out_unlocked; + } if ((ioc_state & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT) is_fault = 1; } diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 53133cfd420f..d0d4c8d94a10 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -2653,6 +2653,11 @@ mpt3sas_scsih_issue_tm(struct MPT3SAS_ADAPTER *ioc, u16 handle, u64 lun, } ioc_state = mpt3sas_base_get_iocstate(ioc, 0); + if (ioc_state == ~0) { + pr_info(MPT3SAS_FMT "%s: HBA unreachable\n", + __func__, ioc->name); + return FAILED; + } if (ioc_state & MPI2_DOORBELL_USED) { dhsprintk(ioc, pr_info(MPT3SAS_FMT "unexpected doorbell active!\n", ioc->name));