On 3/20/2024 5:52 PM, Mukul Joshi wrote:
Destroy the high priority workqueue that handles interrupts during KFD
node cleanup.

Signed-off-by: Mukul Joshi <mukul.joshi@xxxxxxx>
---
 drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
index dd3c43c1ad70..9b6b6e882593 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
@@ -104,6 +104,8 @@ void kfd_interrupt_exit(struct kfd_node *node)
 	 */
 	flush_workqueue(node->ih_wq);
 
+	destroy_workqueue(node->ih_wq);
+
Here I think we should cancel the work items that are still in the workqueue rather than flush node->ih_wq. At this point the KFD functions have already been terminated, so there is no way to handle the remaining work items, and the workqueue flush would never finish. I think that is why there are orphaned kernel tasks.
After cancelling the remaining work items, we can call destroy_workqueue().
Regards
Xiaogang
 	kfifo_free(&node->ih_fifo);
 }
-- 
2.35.1
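The ordering suggested in the reply (drop whatever is still pending, then destroy the queue) can be illustrated with a small model. This is a hypothetical userspace sketch on POSIX threads, not kernel code: struct wq, wq_create(), wq_queue(), wq_cancel_pending(), wq_destroy() and the example handler count_work() are invented for illustration. In the kernel, pending work is cancelled per work item (for example with cancel_work_sync() on each struct work_struct), and destroy_workqueue() itself drains the queue before tearing it down. The model mimics that drain in wq_destroy(), but cancels everything pending in one call for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model of a one-thread workqueue; NOT the kernel API. */
struct work { void (*fn)(void); struct work *next; };
struct wq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct work *head, *tail; /* items queued but not yet started */
	bool stopping;
	pthread_t thread;
};

/* Runs items in FIFO order; exits only once stopping is set AND nothing is queued. */
static void *wq_worker(void *p)
{
	struct wq *wq = p;
	pthread_mutex_lock(&wq->lock);
	for (;;) {
		while (!wq->head && !wq->stopping)
			pthread_cond_wait(&wq->cond, &wq->lock);
		if (!wq->head)
			break;
		struct work *w = wq->head;
		if (!(wq->head = w->next))
			wq->tail = NULL;
		pthread_mutex_unlock(&wq->lock);
		w->fn(); /* the handler runs without the lock held */
		free(w);
		pthread_mutex_lock(&wq->lock);
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

struct wq *wq_create(void)
{
	struct wq *wq = calloc(1, sizeof(*wq));
	if (!wq)
		return NULL;
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->cond, NULL);
	if (pthread_create(&wq->thread, NULL, wq_worker, wq) == 0)
		return wq;
	pthread_cond_destroy(&wq->cond);
	pthread_mutex_destroy(&wq->lock);
	free(wq);
	return NULL;
}

bool wq_queue(struct wq *wq, void (*fn)(void))
{
	struct work *w = malloc(sizeof(*w));
	if (!w)
		return false;
	*w = (struct work){ .fn = fn, .next = NULL };
	pthread_mutex_lock(&wq->lock);
	*(wq->tail ? &wq->tail->next : &wq->head) = w; /* append at the tail */
	wq->tail = w;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
	return true;
}

/* Drops every queued-but-unstarted item; returns how many. A running item is untouched. */
int wq_cancel_pending(struct wq *wq)
{
	pthread_mutex_lock(&wq->lock);
	struct work *w = wq->head;
	wq->head = wq->tail = NULL;
	pthread_mutex_unlock(&wq->lock);
	int n = 0;
	for (struct work *next; w; w = next, n++) {
		next = w->next;
		free(w);
	}
	return n;
}

/* Stops the worker (which first drains anything still queued), then frees the queue. */
void wq_destroy(struct wq *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->stopping = true;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
	pthread_join(wq->thread, NULL);
	assert(wq->head == NULL); /* the worker only exits with an empty queue */
	pthread_cond_destroy(&wq->cond);
	pthread_mutex_destroy(&wq->lock);
	free(wq);
}

/* Example handler: counts how many items actually ran. */
atomic_int ran_total;
void count_work(void) { atomic_fetch_add(&ran_total, 1); }
```

In this model a teardown is wq_cancel_pending() followed by wq_destroy(). The cancel step only removes items the worker has not picked up yet, so an item that is already running must still return before wq_destroy() can join the worker, much as cancel_work_sync() waits for an executing instance to finish. Whatever the timing, every queued item is either run or cancelled, never both.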