Re: [PATCH v2] drm/amdgpu: always use legacy tlb flush on cyan_skilfish

Lang Yu <Lang.Yu@xxxxxxx> · Fri, 15 Sep 2023 16:53:45 +0800

On 09/14/ , Felix Kuehling wrote:
> On 2023-09-14 10:02, Christian König wrote:

Do we still need to use legacy flush to emulate heavyweight flush
if we don't use SVM? And can I push this now?

Regards,
Lang

> > 
> > Am 14.09.23 um 15:59 schrieb Felix Kuehling:
> > > 
> > > On 2023-09-14 9:39, Christian König wrote:
> > > > Is a single legacy flush sufficient to emulate an heavyweight
> > > > flush as well?
> > > > 
> > > > On previous generations we needed to issue at least two legacy
> > > > flushes for this.
> > > I assume you are referring to the Vega20 XGMI workaround. That is a
> > > very different issue. Because PTEs would be cached in L2, we had to
> > > always use a heavy-weight flush that would also flush the L2 cache
> > > as well, and follow that with another legacy flush to deal with race
> > > conditions where stale PTEs could be re-fetched from L2 before the
> > > L2 flush was complete.
> > 
> > No, we also have another (badly documented) workaround which issues a
> > legacy flush before each heavy weight on some hw generations. See the my
> > TLB flush cleanup patches.
> > 
> > > 
> > > A heavy-weight flush guarantees that there are no more possible
> > > memory accesses using the old PTEs. With physically addressed caches
> > > on GFXv9 that includes a cache flush because the address translation
> > > happened before putting data into the cache. I think the address
> > > translation and cache architecture works differently on GFXv10. So
> > > maybe the cache-flush is not required here.
> > > 
> > > But even then a legacy flush probably allows for in-flight memory
> > > accesses with old physical addresses to complete after the TLB
> > > flush. So there is a small risk of memory corruption that was
> > > assumed to not be accessed by the GPU any more. Or when using IOMMU
> > > device isolation it would result in IOMMU faults if the DMA mappings
> > > are invalidated slightly too early.
> > 
> > Mhm, that's quite bad. Any idea how to avoid that?
> 
> A few ideas
> 
>  * Add an arbitrary delay and hope that it is longer than the FIFOs in
>    the HW
>  * Execute an atomic operation to memory on some GPU engine that could
>    act as a fence, maybe just a RELEASE_MEM on the CP to some writeback
>    location would do the job
>  * If needed, RELEASE_MEM could also perform a cache flush
> 
> Regards,
>   Felix
> 
> 
> > 
> > Regards,
> > Christian.
> > 
> > > 
> > > Regards,
> > >   Felix
> > > 
> > > 
> > > > 
> > > > And please don't push before getting an rb from Felix as well.
> > > > 
> > > > Regards,
> > > > Christian.
> > > > 
> > > > 
> > > > Am 14.09.23 um 11:23 schrieb Lang Yu:
> > > > > cyan_skilfish has problems with other flush types.
> > > > > 
> > > > > v2: fix incorrect ternary conditional operator usage.(Yifan)
> > > > > 
> > > > > Signed-off-by: Lang Yu <Lang.Yu@xxxxxxx>
> > > > > Cc: <stable@xxxxxxxxxxxxxxx> # v5.15+
> > > > > ---
> > > > >   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 ++++++-
> > > > >   1 file changed, 6 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > > > > b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > > > > index d3da13f4c80e..c6d11047169a 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > > > > @@ -236,7 +236,8 @@ static void
> > > > > gmc_v10_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t
> > > > > vmid,
> > > > >   {
> > > > >       bool use_semaphore =
> > > > > gmc_v10_0_use_invalidate_semaphore(adev, vmhub);
> > > > >       struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
> > > > > -    u32 inv_req =
> > > > > hub->vmhub_funcs->get_invalidate_req(vmid, flush_type);
> > > > > +    u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid,
> > > > > +              (adev->asic_type != CHIP_CYAN_SKILLFISH) ?
> > > > > flush_type : 0);
> > > > >       u32 tmp;
> > > > >       /* Use register 17 for GART */
> > > > >       const unsigned int eng = 17;
> > > > > @@ -331,6 +332,8 @@ static void
> > > > > gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t
> > > > > vmid,
> > > > >         int r;
> > > > >   +    flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH)
> > > > > ? flush_type : 0;
> > > > > +
> > > > >       /* flush hdp cache */
> > > > >       adev->hdp.funcs->flush_hdp(adev, NULL);
> > > > >   @@ -426,6 +429,8 @@ static int
> > > > > gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
> > > > >       struct amdgpu_ring *ring = &adev->gfx.kiq[0].ring;
> > > > >       struct amdgpu_kiq *kiq = &adev->gfx.kiq[0];
> > > > >   +    flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH)
> > > > > ? flush_type : 0;
> > > > > +
> > > > >       if (amdgpu_emu_mode == 0 && ring->sched.ready) {
> > > > >           spin_lock(&adev->gfx.kiq[0].ring_lock);
> > > > >           /* 2 dwords flush + 8 dwords fence */
> > > > 
> >