On Mon, Oct 21, 2024 at 10:13 AM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote:
>
>
>
> On 10/19/2024 1:51 AM, Kent Russell wrote:
> > If a 2nd fault comes in before the 1st is handled, the 1st fault will
> > clear out the FAULT STATUS registers before the 2nd fault is handled.
> > Thus we get a lot of zeroes. If status=0, just skip the L2 fault status
> > information, to avoid confusion of why some VM fault status prints in
> > dmesg are all zeroes.
> >
>
> I guess this problem can be avoided if the information is read from IH
> cookie/context rather than from status register.

Is all of this available in the IH cookie?  IIRC, not all of it is.

Alex

>
> Thanks,
> Lijo
>
> > Signed-off-by: Kent Russell <kent.russell@xxxxxxx>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 5 ++++-
> >  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 5 ++++-
> >  drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 5 ++++-
> >  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 6 ++++++
> >  4 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > index 5cf2002fcba8..5fe7a1c74ff1 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > @@ -175,7 +175,10 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
> >                       addr, entry->client_id,
> >                       soc15_ih_clientid_name[entry->client_id]);
> >
> > -             if (!amdgpu_sriov_vf(adev))
> > +             /* Only print L2 fault status if the status register could be read and
> > +              * contains useful information
> > +              */
> > +             if (status != 0)
> >                       hub->vmhub_funcs->print_l2_protection_fault_status(adev,
> >                                                                          status);
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> > index 4df4d73038f8..25a3dee27d81 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> > @@ -144,7 +144,10 @@ static int gmc_v11_0_process_interrupt(struct amdgpu_device *adev,
> >               dev_err(adev->dev, " in page starting at address 0x%016llx from client %d\n",
> >                       addr, entry->client_id);
> >
> > -             if (!amdgpu_sriov_vf(adev))
> > +             /* Only print L2 fault status if the status register could be read and
> > +              * contains useful information
> > +              */
> > +             if (status != 0)
> >                       hub->vmhub_funcs->print_l2_protection_fault_status(adev, status);
> >       }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
> > index e33f9e9058cc..3dee7474c06d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
> > @@ -137,7 +137,10 @@ static int gmc_v12_0_process_interrupt(struct amdgpu_device *adev,
> >               dev_err(adev->dev, " in page starting at address 0x%016llx from client %d\n",
> >                       addr, entry->client_id);
> >
> > -             if (!amdgpu_sriov_vf(adev))
> > +             /* Only print L2 fault status if the status register could be read and
> > +              * contains useful information
> > +              */
> > +             if (status != 0)
> >                       hub->vmhub_funcs->print_l2_protection_fault_status(adev, status);
> >       }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > index 010db0e58650..f43ded8a0aab 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > @@ -672,6 +672,12 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
> >           (amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(9, 4, 2)))
> >               return 0;
> >
> > +     /* Only print L2 fault status if the status register could be read and
> > +      * contains useful information
> > +      */
> > +     if (!status)
> > +             return 0;
> > +
> >       if (!amdgpu_sriov_vf(adev))
> >               WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
> >
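
For anyone following the thread, the race described in the commit message can be modelled in user space. This is only a rough sketch, not driver code: `struct fake_hub`, `fault_status`, and `handle_fault()` are hypothetical names invented for illustration, and the single `uint32_t` merely stands in for the real fault status register. It assumes that handling a fault clears the status, so a second fault handled right after the first sees zero and skips the status print, which is the behaviour the patch is after.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a VM hub; not the amdgpu structure. */
struct fake_hub {
	uint32_t fault_status;	/* models the L2 protection fault status register */
};

/*
 * Hypothetical fault handler. Returns 1 if the status details were
 * printed, 0 if they were skipped because the register read back zero.
 */
static int handle_fault(struct fake_hub *hub, uint64_t addr)
{
	uint32_t status;

	assert(hub != NULL);

	status = hub->fault_status;	/* read the status register */
	hub->fault_status = 0;		/* handling a fault clears it */

	printf("page fault at address 0x%016llx\n", (unsigned long long)addr);

	/* Only print the status if it contains useful information. */
	if (status == 0)
		return 0;

	printf("  fault status: 0x%08x\n", (unsigned int)status);
	return 1;
}
```

Calling `handle_fault()` twice on a hub whose `fault_status` starts non-zero prints the status for the first call only; the second call reads zero and prints just the address line.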