[Public] Hi, Lijo > -----Original Message----- > From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> > Sent: Thursday, July 27, 2023 6:49 PM > To: Liang, Prike <Prike.Liang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx> > Subject: Re: [PATCH] drm/amdgpu: only save and restore GPU device config > at GPU error > > > > On 7/27/2023 3:20 PM, Prike Liang wrote: > > There's need a check on the GPU error state before save and restore > > GPU device config space. > > > > Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > index 358dcc1070c5..5ef3c5c49bee 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > @@ -3946,7 +3946,8 @@ int amdgpu_device_init(struct amdgpu_device > *adev, > > dev_err(adev->dev, "amdgpu_pmu_init failed\n"); > > > > /* Have stored pci confspace at hand for restore in sudden PCI error > */ > > - if (amdgpu_device_cache_pci_state(adev->pdev)) > > + if (adev->pdev->error_state != pci_channel_io_normal && > > + amdgpu_device_cache_pci_state(adev->pdev)) > > We need the clean state to be cached, not the state when there is an error. > This state is later used to restore later, say when a mode-2 reset happens. > > Thanks, > Lijo > But the original code in the amdgpu_device_init () is trying to restore the GPU config space immediately after the GPU config space saved, so that looks this GPU save state restored unconditionally and that's not only for the GPU error case? Meanwhile, as to the mode-2 or some other reset case the reset function will do the save and restore GPU state separately and not use the GPU state saved in amdgpu_device_init() process. In the amdgpu_device_init(), how about restore the GPU state only for the GPU error case? > > pci_restore_state(pdev); > > > > /* if we have > 1 VGA cards, then disable the amdgpu VGA resources > > */