Re: [PATCH v3 2/2] drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush


 



On Thu, Feb 20, 2025 at 5:08 AM Christian König
<christian.koenig@xxxxxxx> wrote:
>
> Am 20.02.25 um 06:41 schrieb Lazar, Lijo:
> > On 2/19/2025 2:35 PM, jesse.zhang@xxxxxxx wrote:
> >> From: "Jesse.zhang@xxxxxxx" <jesse.zhang@xxxxxxx>
> >>
> >> - Modify the VM invalidation engine allocation logic to handle SDMA page rings.
> >>   SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of
> >>   allocating a separate engine. This change ensures efficient resource management and
> >>   avoids the issue of insufficient VM invalidation engines.
> >>
> >> - Add synchronization for GPU TLB flush operations in gmc_v9_0.c.
> >>   Use spin_lock and spin_unlock to ensure thread safety and prevent race conditions
> >>   during TLB flush operations. This improves the stability and reliability of the driver,
> >>   especially in multi-threaded environments.
> >>
> >> V3: replace the SDMA ring check with a function `amdgpu_sdma_is_shared_inv_eng`
> >>  that checks whether a ring is an SDMA ring sharing a VM invalidation engine
> >>
> >> Suggested-by: Lijo Lazar <lijo.lazar@xxxxxxx>
> >> Signed-off-by: Jesse Zhang <jesse.zhang@xxxxxxx>
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c  |  7 +++++++
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 18 ++++++++++++++++++
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h |  1 +
> >>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    |  2 ++
> >>  4 files changed, 28 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> >> index cb914ce82eb5..8ccc3fb34940 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> >> @@ -601,8 +601,15 @@ int amdgpu_gmc_allocate_vm_inv_eng(struct amdgpu_device *adev)
> >>                      return -EINVAL;
> >>              }
> >>
> >> +    if(amdgpu_sdma_is_shared_inv_eng(adev, ring)) {
> >> +            /* Do not allocate a separate VM invalidation engine for SDMA page rings.
> >> +             * Shared VM invalid engine with sdma gfx ring.
> >> +             */
> >> +            ring->vm_inv_eng = inv_eng - 1;
> > This kind of logic has an implicit assumption that SDMA IP does
> > something like
> >
> > for each inst
> >       init sdma ring
> >       init page ring
> >
> > If the IP does something like init page ring/init sdma ring or init sdma
> > ring of all instances followed by init page ring of all instances, this
> > doesn't work.
> >
> > The other issue is that this is not readable. There is no clear way to
> > know what this code is really doing. That is why it's better to express
> > the logic explicitly, so that it stays maintainable in the future.
>
> What guarantees that the SDMA gfx ring isn't interrupted by the paging ring while doing an invalidation?
>
> In other words, as far as I can see it is perfectly possible that the SDMA gfx ring grabs the semaphore, is interrupted by the SDMA paging ring, which then in turn also waits for the semaphore.
>
> This would mean a deadlock inside the SDMA.
>
> As far as I can see what you try to do here is a no-go.

SDMA is single threaded.  Only one queue will execute at any given
time, so a single inv eng suffices for the entire engine (assuming
the use of the inv eng is limited to one packet).

Alex

>
> Regards,
> Christian.
>
> >
> > Thanks,
> > Lijo
> >
> >> +    } else {
> >>              ring->vm_inv_eng = inv_eng - 1;
> >>              vm_inv_engs[vmhub] &= ~(1 << ring->vm_inv_eng);
> >> +    }
> >>
> >>              dev_info(adev->dev, "ring %s uses VM inv eng %u on hub %u\n",
> >>                       ring->name, ring->vm_inv_eng, ring->vm_hub);
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> >> index 8de214a8ba6d..159ebd9ee62f 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> >> @@ -503,6 +503,24 @@ void amdgpu_sdma_sysfs_reset_mask_fini(struct amdgpu_device *adev)
> >>      }
> >>  }
> >>
> >> +/**
> >> +* amdgpu_sdma_is_shared_inv_eng - Check if a ring is an SDMA ring that shares a VM invalidation engine
> >> +* @adev: Pointer to the AMDGPU device structure
> >> +* @ring: Pointer to the ring structure to check
> >> +*
> >> +* This function checks if the given ring is an SDMA ring that shares a VM invalidation engine.
> >> +* It returns true if the ring is such an SDMA ring, false otherwise.
> >> +*/
> >> +bool amdgpu_sdma_is_shared_inv_eng(struct amdgpu_device *adev, struct amdgpu_ring* ring)
> >> +{
> >> +    int i = ring->me;
> >> +
> >> +    if (!adev->sdma.has_page_queue || i >= adev->sdma.num_instances)
> >> +            return false;
> >> +
> >> +    return (ring == &adev->sdma.instance[i].ring);
> >> +}
> >> +
> >>  /**
> >>   * amdgpu_sdma_register_on_reset_callbacks - Register SDMA reset callbacks
> >>   * @funcs: Pointer to the callback structure containing pre_reset and post_reset functions
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> >> index 7effc2673466..da3ec6655be7 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> >> @@ -194,4 +194,5 @@ int amdgpu_sdma_ras_sw_init(struct amdgpu_device *adev);
> >>  void amdgpu_debugfs_sdma_sched_mask_init(struct amdgpu_device *adev);
> >>  int amdgpu_sdma_sysfs_reset_mask_init(struct amdgpu_device *adev);
> >>  void amdgpu_sdma_sysfs_reset_mask_fini(struct amdgpu_device *adev);
> >> +bool amdgpu_sdma_is_shared_inv_eng(struct amdgpu_device *adev, struct amdgpu_ring* ring);
> >>  #endif
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> >> index 2aa87fdf715f..2599da8677da 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> >> @@ -1000,6 +1000,7 @@ static uint64_t gmc_v9_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring,
> >>       * to WA the Issue
> >>       */
> >>
> >> +    spin_lock(&adev->gmc.invalidate_lock);
> >>      /* TODO: It needs to continue working on debugging with semaphore for GFXHUB as well. */
> >>      if (use_semaphore)
> >>              /* a read return value of 1 means semaphore acuqire */
> >> @@ -1030,6 +1031,7 @@ static uint64_t gmc_v9_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring,
> >>              amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_sem +
> >>                                    hub->eng_distance * eng, 0);
> >>
> >> +    spin_unlock(&adev->gmc.invalidate_lock);
> >>      return pd_addr;
> >>  }
> >>
>



