Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



That was the same bug. What we need to do is to prevent VMID invalidation while the SDMA is using it.

The first part was to disallow concurrent VMID flushes, the second part was doing VMID0 flushes through the SDMA block itself.

Both workarounds where added to avoid corruption, that GFXOFF is hanging without this is certainly a completely different issue.

I suspect that we have similar issue as with Vega/Raven where we need to grab a semaphore to avoid the block from being gated while an invalidation is in progress.

Regards,
Christian.


Am 26.09.2019 20:07 schrieb "Deucher, Alexander" <Alexander.Deucher@xxxxxxx>:

Maybe I'm mixing up issues.  The navi10/14 issue that was fixed on navi12 was fixed in amdgpu_ids.c in

commit a2bd77bbde791202267c25478bbcbe71bb4ecdd5
Author: Christian König <christian.koenig@xxxxxxx>
Date:   Thu Feb 7 12:10:29 2019 +0100

    drm/amdgpu: disable concurrent flushes for Navi10 v2
   
    Navi10 have a bug in the SDMA which can theoretically cause memory
    corruption with concurrent VMID flushes
   
    v2: explicitely check Navi10
   
    Signed-off-by: Christian König <christian.koenig@xxxxxxx>
    Reviewed-by: Alex Deucher <alexander.deucher@xxxxxxx>
    Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>

This is a different issue and may apply to all navi parts.  so maybe the patch is fine as is.

Alex


From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Thursday, September 26, 2019 2:02 PM
To: Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>
Cc: Alex Deucher <alexdeucher@xxxxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi
 
Well then you didn't figured out the root cause correctly.

This is to avoid data corruption with the SDMA on Navi 10/14 and should definitely not applied to Navi 12.

The hardware team went through quite some work to avoid this.

Christian.

Am 26.09.2019 19:38 schrieb "Yuan, Xiaojie" <Xiaojie.Yuan@xxxxxxx>:
 Hi Alex / Christian,

When gfxoff is enabled for Navi12, I observed sdma0 hang while launching desktop. When this workaround is applied, the issue fades away.
That's why I included this workaround for Navi12 as well.

BR,
Xiaojie

From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Thursday, September 26, 2019 10:20 PM
To: Alex Deucher <alexdeucher@xxxxxxxxx>
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi
 


Am 26.09.2019 15:51 schrieb Alex Deucher <alexdeucher@xxxxxxxxx>:
On Thu, Sep 26, 2019 at 9:47 AM Koenig, Christian
<Christian.Koenig@xxxxxxx> wrote:
>
> Am 26.09.19 um 15:40 schrieb Alex Deucher:
> > On Thu, Sep 26, 2019 at 8:29 AM Christian König
> > <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
> >> Stop, wait a second guys!
> >>
> >> This will disable the workaround for Navi10 and that is certainly not correct:
> >>
> >> !(adev->asic_type >= CHIP_NAVI10 && adev->asic_type <= CHIP_NAVI12)
> >>
> > Actually, I think it's correct. When I merged the baco patch, I
> > accidentally dropped the navi checks.  E.g.,
> > @@ -245,8 +245,9 @@ static void gmc_v10_0_flush_gpu_tlb(struct
> > amdgpu_device *adev,
> >          mutex_lock(&adev->mman.gtt_window_lock);
> >
> >          gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_MMHUB, 0);
> > -       if (!adev->mman.buffer_funcs_enabled || !adev->ib_pool_ready ||
> > -           adev->asic_type != CHIP_NAVI10) {
> > +       if (!adev->mman.buffer_funcs_enabled ||
> > +           !adev->ib_pool_ready ||
> > +           adev->in_gpu_reset) {
> >                  gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB, 0);
> >                  mutex_unlock(&adev->mman.gtt_window_lock);
> >                  return;
> > I think it should have been
> > adev->asic_type != CHIP_NAVI10 && adev->asic_type != CHIP_NAVI14 &&
> > adev->asic_type != CHIP_NAVI12
>
> My last status is that Navi12 is not supposed to need that workaround,
> that's why we checked Navi10 and later Navi14 separately.
>
> It's possible that I miss-read the !(adev->asic_type >= CHIP_NAVI10 &&
> adev->asic_type <= CHIP_NAVI12) check here, but either way that looks to
> complicated to me.
>
> We should rather mention every affected asic type separately here.

Good point.  navi12 should be dropped from the check.  How about the following?

I would rather test explicitly for Navi 10 and 14, cause we can't be sure if there won't be another variant in the future.

Christian.


diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 241a4e57cf4a..280bbd7ca8a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -292,7 +292,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct
amdgpu_device *adev, uint32_t vmid,

        if (!adev->mman.buffer_funcs_enabled ||
            !adev->ib_pool_ready ||
-           adev->in_gpu_reset) {
+           adev->in_gpu_reset ||
+           (adev->asic_type == CHIP_NAVI12)) {
                gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB_0, 0);
                mutex_unlock(&adev->mman.gtt_window_lock);
                return;

Alex

>
> Regards,
> Christian.
>
> >
> > Alex
> >
> >> Christian.
> >>
> >>
> >> Am 26.09.19 um 14:26 schrieb Deucher, Alexander:
> >>
> >> Please add a patch description.  With that fixed:
> >> Reviewed-by: Alex Deucher <alexander.deucher@xxxxxxx>
> >> ________________________________
> >> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> on behalf of Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>
> >> Sent: Thursday, September 26, 2019 4:09 AM
> >> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
> >> Cc: alexdeucher@xxxxxxxxx <alexdeucher@xxxxxxxxx>
> >> Subject: Re: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi
> >>
> >> Hi Alex,
> >>
> >> This patch is to add the asic_type check which is missing after drm-next branch rebase.
> >>
> >> BR,
> >> Xiaojie
> >> ________________________________
> >> From: Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>
> >> Sent: Thursday, September 26, 2019 4:08 PM
> >> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
> >> Cc: alexdeucher@xxxxxxxxx <alexdeucher@xxxxxxxxx>; Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>
> >> Subject: [PATCH] drm/amdgpu/gmc10: apply the 'invalidation from sdma' workaround for navi
> >>
> >> Fixes: 767acabdac81 ("drm/amd/powerplay: add baco smu reset function for smu11")
> >> Signed-off-by: Xiaojie Yuan <xiaojie.yuan@xxxxxxx>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> index cb3f61873baa..dc2e68e019eb 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >> @@ -292,6 +292,7 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
> >>
> >>           if (!adev->mman.buffer_funcs_enabled ||
> >>               !adev->ib_pool_ready ||
> >> +           !(adev->asic_type >= CHIP_NAVI10 && adev->asic_type <= CHIP_NAVI12) ||
> >>               adev->in_gpu_reset) {
> >>                   gmc_v10_0_flush_vm_hub(adev, vmid, AMDGPU_GFXHUB_0, 0);
> >>                   mutex_unlock(&adev->mman.gtt_window_lock);
> >> --
> >> 2.20.1
> >>
> >>
> >> _______________________________________________
> >> amd-gfx mailing list
> >> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> >>
> >>
>


_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux