On 11/30/2016 09:46 AM, Michael S. Tsirkin wrote: > On Mon, Nov 28, 2016 at 05:32:15PM +0800, Cao jin wrote: >> >> >>> >>>> + if (severity == AER_FATAL && strcmp(dev->driver->name, "vfio-pci")) { >>> >>> You really want some flag in the device, or something similar. >>> Also, how do we know driver is not going away at this point? >>> >> >> I didn't think of this condition, and I don't quite follow how would driver >> go away?(device has error happened, then is removed?) > > Yes - hotplug request detected. Does something prevent this? > I wasn't realized there would be possible to have hotplug happened during error recovery. But, it seems is the aer driver's thing to consider it? >>>> result = reset_link(dev); >>>> if (result != PCI_ERS_RESULT_RECOVERED) >>>> goto failed; >> >>>> @@ -1187,10 +1200,30 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, >>>> return PCI_ERS_RESULT_DISCONNECT; >>>> } >>>> >>>> + /* get device's uncorrectable error status as soon as possible, >>>> + * and signal it to user space. The later we read it, the possibility >>>> + * the register value is mangled grows. */ >>>> + aer_cap_offset = pci_find_ext_capability(vdev->pdev, PCI_EXT_CAP_ID_ERR); >>>> + ret = pci_read_config_dword(vdev->pdev, aer_cap_offset + >>>> + PCI_ERR_UNCOR_STATUS, &uncor_status); >>>> + if (ret) >>>> + return PCI_ERS_RESULT_DISCONNECT; >>>> + >>>> + pr_err("device %d got AER detect notification. uncorrectable error status = 0x%x\n", pdev->devfn, uncor_status);//to be removed >>>> mutex_lock(&vdev->igate); >>>> + >>>> + vdev->aer_recovering = true; >>>> + reinit_completion(&vdev->aer_error_completion); >>>> + >>>> + /* suspend config space access from user space, >>>> + * when vfio-pci's error recovery process is on */ >>> >>> what about access to memory etc? Do you need to suspend this as well? >>> >> >> Yes, this question came into my mind a little bit, but I didn't see some >> existing APIs like pci_cfg_access_xxx which can help to do this.(I am still >> not familiar with kernel) > > This isn't easy to do at all. > I guess we can use completion in vfio_pci_rw to block all kinds of access during vfio's error recovery. But from another perspective, now that we have disabled reset_link in host, I think the access to device could be performed normally. To me, the benefit of using pci_cfg_access is: we won't bother to do "wait 3s to do guest's link reset" > >>>> + pci_cfg_access_trylock(vdev->pdev); >>> >>> If you trylock, you need to handle failure. >> >> try lock returns 0 if access is already locked, 1 otherwise. Is it necessary >> to check its return value? > > Locked by whom? You blissfully access as if it's locked by you. > >> >> -- >> Sincerely, >> Cao jin >> > > > . > -- Sincerely, Cao jin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html