On Wed, Sep 02, 2020 at 02:42:06PM -0400, Andrey Grodzovsky wrote: > Cache the PCI state on boot and before each case where we might > loose it. s/loose/lose/ > v2: Add pci_restore_state while caching the PCI state to avoid > breaking PCI core logic for stuff like suspend/resume. > > v3: Extract pci_restore_state from amdgpu_device_cache_pci_state > to avoid superflous restores during GPU resets and suspend/resumes. > > v4: Style fixes. Is the DRM convention to keep the v2/v3/v4 stuff in the commit log? I keep those below the "---" or manually remove them for PCI, but use the local convention, of course. > + /* Have stored pci confspace at hand for restore in sudden PCI error */ I assume that at least from the perspective of this code, all PCI errors are "sudden". Or if they're not, I'm curious about which would be sudden and which would not. > + if (amdgpu_device_cache_pci_state(adev->pdev)) > + pci_restore_state(pdev); > +bool amdgpu_device_cache_pci_state(struct pci_dev *pdev) > +{ > + struct drm_device *dev = pci_get_drvdata(pdev); > + struct amdgpu_device *adev = drm_to_adev(dev); > + int r; > + > + r = pci_save_state(pdev); > + if (!r) { > + kfree(adev->pci_state); > + > + adev->pci_state = pci_store_saved_state(pdev); > + > + if (!adev->pci_state) { > + DRM_ERROR("Failed to store PCI saved state"); > + return false; > + } > + } else { > + DRM_WARN("Failed to save PCI state, err:%d\n", r); > + return false; > + } > + > + return true; > +} > + > +bool amdgpu_device_load_pci_state(struct pci_dev *pdev) > +{ > + struct drm_device *dev = pci_get_drvdata(pdev); > + struct amdgpu_device *adev = drm_to_adev(dev); > + int r; > + > + if (!adev->pci_state) > + return false; > + > + r = pci_load_saved_state(pdev, adev->pci_state); I'm a little bit hesitant to pci_load_saved_state() and pci_store_saved_state() being used here, simply because they're currently only used by VFIO, Xen, and nvme. So I don't have a real objection, but just pointing out that apparently you're doing something really special that isn't commonly used and tested, so it's more likely to be broken or incomplete. There's lots of state that the PCI core *can't* save/restore, and pci_load_saved_state() doesn't even handle all the architected PCI state, i.e., we only call pci_add_cap_save_buffer() or pci_add_ext_cap_save_buffer() for a few of the capabilities.