On Wed, Sep 5, 2018 at 1:08 PM, Lukas Wunner <lukas@xxxxxxxxx> wrote: > On Wed, Sep 05, 2018 at 11:45:45AM +0530, Sreekanth Reddy wrote: >> On Tue, Sep 4, 2018 at 3:12 PM, Lukas Wunner <lukas@xxxxxxxxx> wrote: >> > Many scsi drivers call pci_channel_offline() to detect inaccessibility >> > of the device due to a PCI error: >> > https://elixir.bootlin.com/linux/v4.19-rc2/ident/pci_channel_offline >> > >> > A patch is pending such that surprise removal can also be queried >> > with that same function: >> > https://www.spinics.net/lists/linux-pci/msg75722.html >> >> Lukas, thanks for pointing to this pci_dev_is_disconnected() API. So >> we can use this API directly instead of reading the vendor Id and >> checking for all one's once this patch get accepted? > > Yes, except pci_dev_is_disconnected() is private to the PCI core, > but dev->error_state and pci_channel_offline() is public. As Busch suggested, we will use pci_device_is_present() API. We have tested HBA hot-unplug operations with this API and the things are working fine. > > >> I have one more instance where still we need this poll kthread, i.e >> during the device probe time we have some commands which has more >> timeout value (e.g. 300 seconds), so if user has unplugged this device >> just after sending this more time-out valued command then driver has >> to wait until this time-out value expires. i.e. this device is still >> visible in lspci output until this 300 seconds timeout value expires >> even though device is unplugged. if we have a poll kthread (which will >> poll for every one second) then driver can early detect the unplugged >> state and can terminate the outstanding commands and hence probe >> operation can be completed quickly. > > The only instances I can see in your driver where it waits for 300 s > is in _base_diag_reset(), which does an msleep(256) in a loop for up > to 300 s, and scsih_scan_finished(), which is called in a loop with an > msleep(10) by do_scsi_scan_host(). > > Any harm in simply checking for removal of the device in those loops > and bailing out if so? Instead of the poll kthread to achieve the same? we can do for this port enable request but still we have other requests where we don't have this type of loops and used wait_for_completion_timeout () API where we can't bailout the request in-between and we have to wait until the timeout value expires. For these types of request terminating it though watchdog thread will be simple. [PATCH 2/7] mpt3sas: Add HBA hot plug watchdog thread: https://www.spinics.net/lists/linux-scsi/msg122963.html In the above patch and in the worker thread function we are just setting ioc->remove_host flag, Apart from resetting this flag we have to terminate all the outstanding commands at the HBA level and that is missing in the above patch. we will post new set of patches. We feel that this watch dog thread is needed, as we need to gracefully terminate all the outstanding command at the HBA level apart from stop receiving the new commands. With watch dog thread we can easily terminate all the outstanding commands. > > >> Also whether we need to wait for below patches get accepted? so that >> we can post the new set of patches accommodating according to your >> suggestions, >> https://www.spinics.net/lists/linux-pci/msg75722.html >> https://www.spinics.net/lists/linux-pci/msg75438.html > > I can't tell you whether and when those patches get accepted, that's > for Bjorn Helgaas to decide. Also, what does get accepted might differ. > E.g. the first patch uses the existing pci_channel_io_perm_failure > state for removal. There was debate whether to introduce a new state > for removed devices to avoid overloading the existing state, which is > used for error recovery. > > All I can recommend is to follow linux-pci, test the patches that have > already been brought forward to ascertain they fulfil your needs, > and generally participate in the debate so that your use cases are > covered. > > Thanks, > > Lukas