On 11/6/2024 8:42 PM, Alex Deucher wrote:
> On Wed, Nov 6, 2024 at 1:49 AM Victor Zhao <Victor.Zhao@xxxxxxx> wrote:
>>
>> From: Monk Liu <Monk.Liu@xxxxxxx>
>>
>> Cached GTT buffers are snooped, so coherence between CPU writes and
>> GPU fetches is guaranteed. The original code, however, uses WC +
>> unsnooped memory for the HIQ PQ (ring buffer), which introduces
>> coherency issues: the MEC fetches stale data from the PQ, which leads
>> to an MEC hang.
>
> Can you elaborate on this? I can see CPU reads being slower because
> the memory is uncached, but the ring buffer is mostly writes anyway.
> IIRC, the driver uses USWC for most if not all of the other ring
> buffers managed by the kernel. Why aren't those a problem?

We already do this on other rings:

	mb();
	amdgpu_ring_set_wptr(ring);

I think the solution should be to use a barrier before write pointer
updates rather than relying on PCIe snooping.

Thanks,
Lijo

>
> Alex
>
>>
>> Signed-off-by: Monk Liu <Monk.Liu@xxxxxxx>
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> index 1f1d79ac5e6c..fb087a0ff5bc 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> @@ -779,7 +779,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>>  	if (amdgpu_amdkfd_alloc_gtt_mem(
>>  			kfd->adev, size, &kfd->gtt_mem,
>>  			&kfd->gtt_start_gpu_addr, &kfd->gtt_start_cpu_ptr,
>> -			false, true)) {
>> +			false, false)) {
>>  		dev_err(kfd_device, "Could not allocate %d bytes\n", size);
>>  		goto alloc_gtt_mem_failure;
>>  	}
>> --
>> 2.34.1
>>
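The barrier-before-wptr pattern Lijo refers to can be illustrated with a minimal user-space sketch. This is a C11 analogy, not driver code: struct toy_ring and the toy_ring_* functions are hypothetical and are not the amdgpu_ring API. The producer writes packet dwords into the ring, issues a release fence (standing in for mb()), and only then publishes the new write pointer. A consumer that reads the write pointer with acquire semantics therefore never reads dwords that have not been written yet. One assumption to keep in mind: a C11 fence only orders accesses between CPU threads. The kernel's mb() is a mandatory barrier that is also intended to order accesses as observed by devices, and that is the property that matters for a ring the MEC fetches from WC memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_RING_DWORDS 8u /* hypothetical ring size; must be a power of two */

/* Hypothetical toy ring, not struct amdgpu_ring. */
struct toy_ring {
	uint32_t buf[TOY_RING_DWORDS];
	uint32_t wptr_local;     /* producer-private write index */
	_Atomic uint32_t wptr;   /* write index published to the consumer */
	_Atomic uint32_t rptr;   /* read index published by the consumer */
};

/* Producer: stage one dword. It is not yet visible as "submitted". */
static void toy_ring_write(struct toy_ring *r, uint32_t dw)
{
	uint32_t rptr = atomic_load_explicit(&r->rptr, memory_order_acquire);

	/* Refuse to overwrite dwords the consumer has not read yet. */
	assert(r->wptr_local - rptr < TOY_RING_DWORDS);
	r->buf[r->wptr_local & (TOY_RING_DWORDS - 1u)] = dw;
	r->wptr_local++;
}

/* Producer: order all prior packet writes before the write-pointer
 * update, mirroring "mb(); amdgpu_ring_set_wptr(ring);" (analogy only). */
static void toy_ring_commit(struct toy_ring *r)
{
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->wptr, r->wptr_local, memory_order_relaxed);
}

/* Consumer: read the published wptr first, then only the dwords it covers.
 * Returns the number of dwords copied into out. */
static uint32_t toy_ring_consume(struct toy_ring *r, uint32_t *out, uint32_t max)
{
	uint32_t wptr = atomic_load_explicit(&r->wptr, memory_order_acquire);
	uint32_t rptr = atomic_load_explicit(&r->rptr, memory_order_relaxed);
	uint32_t n = 0;

	while (rptr != wptr && n < max) {
		out[n++] = r->buf[rptr & (TOY_RING_DWORDS - 1u)];
		rptr++;
	}
	atomic_store_explicit(&r->rptr, rptr, memory_order_release);
	return n;
}
```

In this toy model the consumer sees nothing after toy_ring_write() alone; only after toy_ring_commit() publishes the write pointer does it see the staged dwords, in order. Whether a barrier by itself is sufficient for the HIQ PQ on real hardware is the question under discussion, and the sketch does not answer it.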