On Wed, Nov 6, 2024 at 1:49 AM Victor Zhao <Victor.Zhao@xxxxxxx> wrote:
>
> From: Monk Liu <Monk.Liu@xxxxxxx>
>
> As cache GTT buffer is snooped, this way the coherence between CPU write
> and GPU fetch is guaranteed, but original code uses WC + unsnooped for
> HIQ PQ(ring buffer) which introduces coherency issues:
> MEC fetches a stall data from PQ and leads to MEC hang.

Can you elaborate on this?  I can see CPU reads being slower because
the memory is uncached, but the ring buffer is mostly writes anyway.
IIRC, the driver uses USWC for most if not all of the other ring
buffers managed by the kernel.  Why aren't those a problem?

Alex

>
> Signed-off-by: Monk Liu <Monk.Liu@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 1f1d79ac5e6c..fb087a0ff5bc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -779,7 +779,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>         if (amdgpu_amdkfd_alloc_gtt_mem(
>                         kfd->adev, size, &kfd->gtt_mem,
>                         &kfd->gtt_start_gpu_addr, &kfd->gtt_start_cpu_ptr,
> -                       false, true)) {
> +                       false, false)) {
>                 dev_err(kfd_device, "Could not allocate %d bytes\n", size);
>                 goto alloc_gtt_mem_failure;
>         }
> --
> 2.34.1
>