On 09/14/ , Felix Kuehling wrote: > On 2023-09-14 10:02, Christian König wrote: Do we still need to use legacy flush to emulate heavyweight flush if we don't use SVM? And can I push this now? Regards, Lang > > > > Am 14.09.23 um 15:59 schrieb Felix Kuehling: > > > > > > On 2023-09-14 9:39, Christian König wrote: > > > > Is a single legacy flush sufficient to emulate an heavyweight > > > > flush as well? > > > > > > > > On previous generations we needed to issue at least two legacy > > > > flushes for this. > > > I assume you are referring to the Vega20 XGMI workaround. That is a > > > very different issue. Because PTEs would be cached in L2, we had to > > > always use a heavy-weight flush that would also flush the L2 cache > > > as well, and follow that with another legacy flush to deal with race > > > conditions where stale PTEs could be re-fetched from L2 before the > > > L2 flush was complete. > > > > No, we also have another (badly documented) workaround which issues a > > legacy flush before each heavy weight on some hw generations. See the my > > TLB flush cleanup patches. > > > > > > > > A heavy-weight flush guarantees that there are no more possible > > > memory accesses using the old PTEs. With physically addressed caches > > > on GFXv9 that includes a cache flush because the address translation > > > happened before putting data into the cache. I think the address > > > translation and cache architecture works differently on GFXv10. So > > > maybe the cache-flush is not required here. > > > > > > But even then a legacy flush probably allows for in-flight memory > > > accesses with old physical addresses to complete after the TLB > > > flush. So there is a small risk of memory corruption that was > > > assumed to not be accessed by the GPU any more. Or when using IOMMU > > > device isolation it would result in IOMMU faults if the DMA mappings > > > are invalidated slightly too early. > > > > Mhm, that's quite bad. Any idea how to avoid that? > > A few ideas > > * Add an arbitrary delay and hope that it is longer than the FIFOs in > the HW > * Execute an atomic operation to memory on some GPU engine that could > act as a fence, maybe just a RELEASE_MEM on the CP to some writeback > location would do the job > * If needed, RELEASE_MEM could also perform a cache flush > > Regards, > Felix > > > > > > Regards, > > Christian. > > > > > > > > Regards, > > > Felix > > > > > > > > > > > > > > And please don't push before getting an rb from Felix as well. > > > > > > > > Regards, > > > > Christian. > > > > > > > > > > > > Am 14.09.23 um 11:23 schrieb Lang Yu: > > > > > cyan_skilfish has problems with other flush types. > > > > > > > > > > v2: fix incorrect ternary conditional operator usage.(Yifan) > > > > > > > > > > Signed-off-by: Lang Yu <Lang.Yu@xxxxxxx> > > > > > Cc: <stable@xxxxxxxxxxxxxxx> # v5.15+ > > > > > --- > > > > > drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 7 ++++++- > > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c > > > > > b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c > > > > > index d3da13f4c80e..c6d11047169a 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c > > > > > @@ -236,7 +236,8 @@ static void > > > > > gmc_v10_0_flush_vm_hub(struct amdgpu_device *adev, uint32_t > > > > > vmid, > > > > > { > > > > > bool use_semaphore = > > > > > gmc_v10_0_use_invalidate_semaphore(adev, vmhub); > > > > > struct amdgpu_vmhub *hub = &adev->vmhub[vmhub]; > > > > > - u32 inv_req = > > > > > hub->vmhub_funcs->get_invalidate_req(vmid, flush_type); > > > > > + u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid, > > > > > + (adev->asic_type != CHIP_CYAN_SKILLFISH) ? > > > > > flush_type : 0); > > > > > u32 tmp; > > > > > /* Use register 17 for GART */ > > > > > const unsigned int eng = 17; > > > > > @@ -331,6 +332,8 @@ static void > > > > > gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t > > > > > vmid, > > > > > int r; > > > > > + flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) > > > > > ? flush_type : 0; > > > > > + > > > > > /* flush hdp cache */ > > > > > adev->hdp.funcs->flush_hdp(adev, NULL); > > > > > @@ -426,6 +429,8 @@ static int > > > > > gmc_v10_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev, > > > > > struct amdgpu_ring *ring = &adev->gfx.kiq[0].ring; > > > > > struct amdgpu_kiq *kiq = &adev->gfx.kiq[0]; > > > > > + flush_type = (adev->asic_type != CHIP_CYAN_SKILLFISH) > > > > > ? flush_type : 0; > > > > > + > > > > > if (amdgpu_emu_mode == 0 && ring->sched.ready) { > > > > > spin_lock(&adev->gfx.kiq[0].ring_lock); > > > > > /* 2 dwords flush + 8 dwords fence */ > > > > > >