On 2/19/2025 1:30 PM, jesse.zhang@xxxxxxx wrote:
> From: "Jesse.zhang@xxxxxxx" <jesse.zhang@xxxxxxx>
>
> - Modify the VM invalidation engine allocation logic to handle SDMA page rings.
>   SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of
>   allocating a separate engine. This change ensures efficient resource management and
>   avoids the issue of insufficient VM invalidation engines.
>
> - Add synchronization for GPU TLB flush operations in gmc_v9_0.c.
>   Use spin_lock and spin_unlock to ensure thread safety and prevent race conditions
>   during TLB flush operations. This improves the stability and reliability of the driver,
>   especially in multi-threaded environments.
>
> replace the sdma ring check with a function `amdgpu_sdma_is_page_queue`
> to check if a ring is an SDMA page queue.(Lijo)
>
> Suggested-by: Lijo Lazar <lijo.lazar@xxxxxxx>
> Signed-off-by: Jesse Zhang <jesse.zhang@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c  |  7 +++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 18 ++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h |  1 +
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    |  2 ++
>  4 files changed, 28 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index cb914ce82eb5..da719ec6c6c7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -601,8 +601,15 @@ int amdgpu_gmc_allocate_vm_inv_eng(struct amdgpu_device *adev)
>  			return -EINVAL;
>  		}
>
> +		if(amdgpu_sdma_is_page_queue(adev, ring)) {

Sorry, didn't mean to exclude the ring type check.

BTW, there is another problem. If the previous ring is a regular sdma ring,

	vm_inv_engs[vmhub] &= ~(1 << ring->vm_inv_eng);

this step would have modified the bitmap, and the invalidation engine in the next loop iteration is not the same.

What you may want to do is -

After allocating the sdma ring's invalidation engine, assign the same inv engine to the page ring corresponding to the sdma instance.

	ring->vm_inv_eng = inv_eng - 1;
	if (ring->type == sdma) {
		page_ring = amdgpu_sdma_get_page_ring(adev, ring->me); => returns &adev->sdma.instance[i].page
		if (page_ring)
			page_ring->vm_inv_eng = inv_eng - 1;
	}
	vm_inv_engs[vmhub] &= ~(1 << ring->vm_inv_eng);

Then skip any page rings in the generic loop.

	if (ring->type == sdma && amdgpu_sdma_is_page_queue(adev, ring))
		continue;

(A minimal sketch of the resulting loop is appended after the quoted patch below.)

Thanks,
Lijo

> +		/* Do not allocate a separate VM invalidation engine for SDMA page rings.
> +		 * Shared VM invalid engine with sdma gfx ring.
> +		 */
> +		ring->vm_inv_eng = inv_eng - 1;
> +	} else {
>  		ring->vm_inv_eng = inv_eng - 1;
>  		vm_inv_engs[vmhub] &= ~(1 << ring->vm_inv_eng);
> +	}
>
>  		dev_info(adev->dev, "ring %s uses VM inv eng %u on hub %u\n",
>  			 ring->name, ring->vm_inv_eng, ring->vm_hub);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> index 8de214a8ba6d..96df544feb67 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> @@ -503,6 +503,24 @@ void amdgpu_sdma_sysfs_reset_mask_fini(struct amdgpu_device *adev)
>  	}
>  }
>
> +/**
> +* amdgpu_sdma_is_page_queue - Check if a ring is an SDMA page queue
> +* @adev: Pointer to the AMDGPU device structure
> +* @ring: Pointer to the ring structure to check
> +*
> +* This function checks if the given ring is an SDMA page queue.
> +* It returns true if the ring is an SDMA page queue, false otherwise.
> +*/
> +bool amdgpu_sdma_is_page_queue(struct amdgpu_device *adev, struct amdgpu_ring* ring)
> +{
> +	int i = ring->me;
> +
> +	if (!adev->sdma.has_page_queue || i >= adev->sdma.num_instances)
> +		return false;
> +
> +	return (ring == &adev->sdma.instance[i].page);
> +}
> +
>  /**
>   * amdgpu_sdma_register_on_reset_callbacks - Register SDMA reset callbacks
>   * @funcs: Pointer to the callback structure containing pre_reset and post_reset functions
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> index 7effc2673466..c2df9c3ab882 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
> @@ -194,4 +194,5 @@ int amdgpu_sdma_ras_sw_init(struct amdgpu_device *adev);
>  void amdgpu_debugfs_sdma_sched_mask_init(struct amdgpu_device *adev);
>  int amdgpu_sdma_sysfs_reset_mask_init(struct amdgpu_device *adev);
>  void amdgpu_sdma_sysfs_reset_mask_fini(struct amdgpu_device *adev);
> +bool amdgpu_sdma_is_page_queue(struct amdgpu_device *adev, struct amdgpu_ring* ring);
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 2aa87fdf715f..2599da8677da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -1000,6 +1000,7 @@ static uint64_t gmc_v9_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring,
>  	 * to WA the Issue
>  	 */
>
> +	spin_lock(&adev->gmc.invalidate_lock);
>  	/* TODO: It needs to continue working on debugging with semaphore for GFXHUB as well. */
>  	if (use_semaphore)
>  		/* a read return value of 1 means semaphore acuqire */
> @@ -1030,6 +1031,7 @@ static uint64_t gmc_v9_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring,
>  		amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_sem +
>  				      hub->eng_distance * eng, 0);
>
> +	spin_unlock(&adev->gmc.invalidate_lock);
>  	return pd_addr;
>  }
>
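To make the suggestion above concrete, here is a minimal sketch of how the allocation loop in amdgpu_gmc_allocate_vm_inv_eng() could look with it applied. This is an illustration only, not the final patch: amdgpu_sdma_get_page_ring() is the helper proposed in the review (assumed to return &adev->sdma.instance[i].page, or NULL when there is no page queue), the ring-type check is assumed to be ring->funcs->type == AMDGPU_RING_TYPE_SDMA, and the surrounding loop variables (vm_inv_engs, vmhub, inv_eng) are taken from the existing function.

	/* Hypothetical helper as proposed above: return the page ring of an
	 * SDMA instance, or NULL if the ASIC has no page queue.
	 */
	static struct amdgpu_ring *
	amdgpu_sdma_get_page_ring(struct amdgpu_device *adev, u32 instance)
	{
		if (!adev->sdma.has_page_queue ||
		    instance >= adev->sdma.num_instances)
			return NULL;

		return &adev->sdma.instance[instance].page;
	}

	/* Inside the per-ring loop of amdgpu_gmc_allocate_vm_inv_eng(): */

		/* Page rings get their engine together with the SDMA gfx ring
		 * of the same instance below, so skip them here.
		 */
		if (ring->funcs->type == AMDGPU_RING_TYPE_SDMA &&
		    amdgpu_sdma_is_page_queue(adev, ring))
			continue;

		inv_eng = ffs(vm_inv_engs[vmhub]);
		if (!inv_eng) {
			dev_err(adev->dev, "no VM inv eng for ring %s\n",
				ring->name);
			return -EINVAL;
		}

		ring->vm_inv_eng = inv_eng - 1;

		/* Share the freshly allocated engine with the page ring of
		 * this SDMA instance instead of allocating a second one.
		 */
		if (ring->funcs->type == AMDGPU_RING_TYPE_SDMA) {
			struct amdgpu_ring *page_ring =
				amdgpu_sdma_get_page_ring(adev, ring->me);

			if (page_ring)
				page_ring->vm_inv_eng = inv_eng - 1;
		}

		/* Clear the bit exactly once, shared or not. */
		vm_inv_engs[vmhub] &= ~(1 << ring->vm_inv_eng);

Compared to the hunk in the quoted patch, this keeps the vm_inv_engs bitmap consistent: the engine bit is cleared exactly once per allocation, and page rings never reach ffs() after the bitmap was already modified for their gfx ring.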