Re: [PATCH] amdgpu: Don't print L2 status if there's nothing to print

Alex Deucher <alexdeucher@xxxxxxxxx> · Mon, 21 Oct 2024 10:36:39 -0400

On Mon, Oct 21, 2024 at 10:13 AM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote:
>
>
>
> On 10/19/2024 1:51 AM, Kent Russell wrote:
> > If a 2nd fault comes in before the 1st is handled, handling the 1st fault
> > clears the FAULT STATUS registers before the 2nd fault is handled, so
> > the 2nd fault's status reads back as all zeroes. If status=0, just skip
> > printing the L2 fault status information, to avoid confusion about why
> > some VM fault status prints in dmesg are all zeroes.
> >
>
> I guess this problem can be avoided if the information is read from the IH
> cookie/context rather than from the status register.

Is all of this available in the IH cookie?  IIRC, not all of it is.

Alex

>
> Thanks,
> Lijo
>
> > Signed-off-by: Kent Russell <kent.russell@xxxxxxx>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 5 ++++-
> >  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 5 ++++-
> >  drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 5 ++++-
> >  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 6 ++++++
> >  4 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > index 5cf2002fcba8..5fe7a1c74ff1 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > @@ -175,7 +175,10 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
> >                       addr, entry->client_id,
> >                       soc15_ih_clientid_name[entry->client_id]);
> >
> > -     if (!amdgpu_sriov_vf(adev))
> > +     /* Only print L2 fault status if the status register could be read and
> > +      * contains useful information
> > +      */
> > +     if (status != 0)
> >               hub->vmhub_funcs->print_l2_protection_fault_status(adev,
> >                                                                  status);
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> > index 4df4d73038f8..25a3dee27d81 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> > @@ -144,7 +144,10 @@ static int gmc_v11_0_process_interrupt(struct amdgpu_device *adev,
> >               dev_err(adev->dev, "  in page starting at address 0x%016llx from client %d\n",
> >                               addr, entry->client_id);
> >
> > -             if (!amdgpu_sriov_vf(adev))
> > +             /* Only print L2 fault status if the status register could be read and
> > +              * contains useful information
> > +              */
> > +             if (status != 0)
> >                       hub->vmhub_funcs->print_l2_protection_fault_status(adev, status);
> >       }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
> > index e33f9e9058cc..3dee7474c06d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
> > @@ -137,7 +137,10 @@ static int gmc_v12_0_process_interrupt(struct amdgpu_device *adev,
> >               dev_err(adev->dev, "  in page starting at address 0x%016llx from client %d\n",
> >                               addr, entry->client_id);
> >
> > -             if (!amdgpu_sriov_vf(adev))
> > +             /* Only print L2 fault status if the status register could be read and
> > +              * contains useful information
> > +              */
> > +             if (status != 0)
> >                       hub->vmhub_funcs->print_l2_protection_fault_status(adev, status);
> >       }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > index 010db0e58650..f43ded8a0aab 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> > @@ -672,6 +672,12 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
> >           (amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(9, 4, 2)))
> >               return 0;
> >
> > +     /* Only print L2 fault status if the status register could be read and
> > +      * contains useful information
> > +      */
> > +     if (!status)
> > +             return 0;
> > +
> >       if (!amdgpu_sriov_vf(adev))
> >               WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
> >