On 07.11.24 at 06:58, Lazar, Lijo wrote:
On 11/6/2024 8:42 PM, Alex Deucher wrote:
On Wed, Nov 6, 2024 at 1:49 AM Victor Zhao <Victor.Zhao@xxxxxxx> wrote:
From: Monk Liu <Monk.Liu@xxxxxxx>
Cached GTT buffers are snooped, so coherence between CPU writes and
GPU fetches is guaranteed. The original code, however, uses WC + unsnooped
for the HIQ PQ (ring buffer), which introduces coherency issues:
MEC fetches stale data from the PQ, which leads to a MEC hang.
Can you elaborate on this? I can see CPU reads being slower because
the memory is uncached, but the ring buffer is mostly writes anyway.
IIRC, the driver uses USWC for most if not all of the other ring
buffers managed by the kernel. Why aren't those a problem?
We have this on other rings -
mb();
amdgpu_ring_set_wptr(ring);
I think the solution should be to use a barrier before write pointer
updates rather than relying on PCIe snooping.
Yeah, completely agree as well. The barrier also takes care of
preventing the compiler from re-ordering writes.
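To illustrate the ordering being discussed, here is a minimal userspace sketch, not the amdgpu/amdkfd code. It assumes a hypothetical `demo_ring` structure and `demo_ring_commit()` helper, and uses C11 `atomic_thread_fence()` as a stand-in for the kernel's `mb()`; the real driver writes the pointer through `amdgpu_ring_set_wptr()` and targets write-combined memory, which this sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring size for the sketch; must be a power of two. */
#define DEMO_RING_ENTRIES 8u

/* Hypothetical stand-in for a kernel-managed ring buffer (e.g. a PQ). */
struct demo_ring {
    uint32_t buf[DEMO_RING_ENTRIES]; /* packet storage read by the consumer */
    _Atomic uint32_t wptr;           /* write pointer the consumer polls */
};

/*
 * Store one packet, then publish it by advancing the write pointer.
 * The fence sits between the two stores so that neither the compiler
 * nor the CPU can make the new wptr visible before the packet data.
 */
static void demo_ring_commit(struct demo_ring *ring, uint32_t packet)
{
    assert(ring != NULL);

    uint32_t w = atomic_load_explicit(&ring->wptr, memory_order_relaxed);

    /* 1. Write the payload into the ring. */
    ring->buf[w & (DEMO_RING_ENTRIES - 1u)] = packet;

    /* 2. Full fence: userspace analogue (assumption) of the kernel's mb(). */
    atomic_thread_fence(memory_order_seq_cst);

    /* 3. Only now advance the write pointer. */
    atomic_store_explicit(&ring->wptr, w + 1u, memory_order_relaxed);
}
```

The point of the sketch is only the order of steps 1 to 3: the payload store, the barrier, then the pointer update. Whether the barrier alone is sufficient for the HIQ PQ case is exactly what this thread is asking the patch author to confirm.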
Regards,
Christian.
Thanks,
Lijo
Alex
Signed-off-by: Monk Liu <Monk.Liu@xxxxxxx>
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 1f1d79ac5e6c..fb087a0ff5bc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -779,7 +779,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
if (amdgpu_amdkfd_alloc_gtt_mem(
kfd->adev, size, &kfd->gtt_mem,
&kfd->gtt_start_gpu_addr, &kfd->gtt_start_cpu_ptr,
- false, true)) {
+ false, false)) {
dev_err(kfd_device, "Could not allocate %d bytes\n", size);
goto alloc_gtt_mem_failure;
}
--
2.34.1