On 2024-12-12 0:30, Zhu Lingshan wrote:
> On 12/12/2024 12:19 PM, Felix Kuehling wrote:
>>
>> On 2024-12-11 22:06, Zhu Lingshan wrote:
>>> kfd_process_wq_release() signals eviction fence by
>>> dma_fence_signal() which warns if dma_fence
>>> is NULL.
>> That's news to me. Looking at the dma_fence_signal implementation on amd-staging-drm-next, it just silently returns -EINVAL if the fence pointer is NULL. I see the same in Linux 6.12.4: https://elixir.bootlin.com/linux/v6.12.4/source/drivers/dma-buf/dma-fence.c#L467
>>
>> Which branch are you on?
> Linus tree, latest master branch, tag v6.13-rc2
> https://github.com/torvalds/linux/blob/master/drivers/dma-buf/dma-fence.c#L467
>
> which is introduced by
> https://github.com/torvalds/linux/commit/967d226eaae8e40636d257bf8ae55d2c5a912f58

Thank you for that pointer. Please add a Fixes tag to point to that upstream commit. With that, the patch is

Reviewed-by: Felix Kuehling <felix.kuehling@xxxxxxx>

>
> Thanks
> Lingshan
>
>>
>> Regards,
>>   Felix
>>
>>> kfd_process->ef is initialized by kfd_process_device_init_vm()
>>> through ioctl. That means the fence is NULL for a newly
>>> created kfd_process, and closing a kfd_process right
>>> after opening it will trigger the warning.
>>>
>>> This commit conditionally signals the eviction fence
>>> in kfd_process_wq_release() only when it is available.
>>>
>>> [ 503.660882] WARNING: CPU: 0 PID: 9 at drivers/dma-buf/dma-fence.c:467 dma_fence_signal+0x74/0xa0
>>> [ 503.782940] Workqueue: kfd_process_wq kfd_process_wq_release [amdgpu]
>>> [ 503.789640] RIP: 0010:dma_fence_signal+0x74/0xa0
>>> [ 503.877620] Call Trace:
>>> [ 503.880066] <TASK>
>>> [ 503.882168] ? __warn+0xcd/0x260
>>> [ 503.885407] ? dma_fence_signal+0x74/0xa0
>>> [ 503.889416] ? report_bug+0x288/0x2d0
>>> [ 503.893089] ? handle_bug+0x53/0xa0
>>> [ 503.896587] ? exc_invalid_op+0x14/0x50
>>> [ 503.900424] ? asm_exc_invalid_op+0x16/0x20
>>> [ 503.904616] ? dma_fence_signal+0x74/0xa0
>>> [ 503.908626] kfd_process_wq_release+0x6b/0x370 [amdgpu]
>>> [ 503.914081] process_one_work+0x654/0x10a0
>>> [ 503.918186] worker_thread+0x6c3/0xe70
>>> [ 503.921943] ? srso_alias_return_thunk+0x5/0xfbef5
>>> [ 503.926735] ? srso_alias_return_thunk+0x5/0xfbef5
>>> [ 503.931527] ? __kthread_parkme+0x82/0x140
>>> [ 503.935631] ? __pfx_worker_thread+0x10/0x10
>>> [ 503.939904] kthread+0x2a8/0x380
>>> [ 503.943132] ? __pfx_kthread+0x10/0x10
>>> [ 503.946882] ret_from_fork+0x2d/0x70
>>> [ 503.950458] ? __pfx_kthread+0x10/0x10
>>> [ 503.954210] ret_from_fork_asm+0x1a/0x30
>>> [ 503.958142] </TASK>
>>> [ 503.960328] ---[ end trace 0000000000000000 ]---
>>>
>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@xxxxxxx>
>>> ---
>>>  drivers/gpu/drm/amd/amdkfd/kfd_process.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index 87cd52cf4ee9..47d36f43ee8c 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -1159,7 +1159,8 @@ static void kfd_process_wq_release(struct work_struct *work)
>>>  	 */
>>>  	synchronize_rcu();
>>>  	ef = rcu_access_pointer(p->ef);
>>> -	dma_fence_signal(ef);
>>> +	if (ef)
>>> +		dma_fence_signal(ef);
>>>
>>>  	kfd_process_remove_sysfs(p);
>>>
>
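For readers without a kernel tree at hand, the effect of the guard can be sketched in plain userspace C. This is only an illustration under stated assumptions: mock_fence, mock_process, mock_fence_signal(), warn_count and both release helpers are hypothetical stand-ins, not kernel APIs, and the mock assumes the v6.13-rc2 behaviour discussed above, where signaling a NULL fence warns and fails with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's types. */
struct mock_fence {
	bool signaled;
};

struct mock_process {
	/* Stays NULL until VM init runs, like p->ef in the thread above. */
	struct mock_fence *ef;
};

/* Counts simulated warnings so a caller can observe them. */
static int warn_count;

/*
 * Assumed behaviour of the signal call on v6.13-rc2:
 * a NULL fence triggers a warning and the call fails with -EINVAL.
 */
static int mock_fence_signal(struct mock_fence *fence)
{
	if (fence == NULL) {
		warn_count++;
		return -EINVAL;
	}
	fence->signaled = true;
	return 0;
}

/* Release path without the guard: warns for a never-initialized process. */
static void mock_release_unguarded(struct mock_process *p)
{
	assert(p != NULL);
	mock_fence_signal(p->ef);
}

/* Release path with the guard from the patch: signal only if a fence exists. */
static void mock_release_guarded(struct mock_process *p)
{
	struct mock_fence *ef;

	assert(p != NULL);
	ef = p->ef;
	if (ef)
		mock_fence_signal(ef);
}
```

Calling mock_release_unguarded() on a process whose ef is still NULL bumps warn_count, which mirrors the open-then-close case in the trace. mock_release_guarded() leaves the counter untouched for that process and still signals a fence that has been set up.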