RE: [PATCH 3/3] drm/amdkfd: Skip update vmid in while update queue

[AMD Official Use Only - AMD Internal Distribution Only]

Yes, I hit the page fault with doorbell_mode=1. The error log is as follows.

[Tue Feb 25 00:12:10 2025 <    0.000002>] kfd_ioctl_create_event:844: amdgpu: Created event (id:0x00000002) (kfd_ioctl_create_event)
[Tue Feb 25 00:12:10 2025 <    0.000020>] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:153 vmid:0 pasid:0)
[Tue Feb 25 00:12:10 2025 <    0.000123>] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00007f40fb3c1000 from IH client 0x1b (UTCL2)
[Tue Feb 25 00:12:10 2025 <    0.000069>] amdgpu 0000:04:00.0: amdgpu:   cookie node_id 1 fault from die AID0.XCD0
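
The fault above reports vmid:0, which matches the sequence the patch description warns about. A minimal C model of that sequence is sketched here. It is not the driver code: every toy_ name is hypothetical, and it assumes that the VMID chosen by the HW scheduler ends up in the same MQD field that update_mqd rewrites, and that q->vmid stays 0 on the driver side when the HWS is in use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct toy_queue { uint32_t vmid; };         /* driver-side value; 0 under HWS */
struct toy_mqd   { uint32_t cp_hqd_vmid; };  /* value the hardware would use */

/* Assumption: models the HW scheduler picking a VMID the driver never sees. */
static void toy_hws_map_queue(struct toy_mqd *m, uint32_t hw_vmid)
{
    assert(hw_vmid != 0);  /* assumption: VMID 0 is not given to user queues */
    m->cp_hqd_vmid = hw_vmid;
}

/* Pre-patch shape: every queue update copies q->vmid into the MQD. */
static void toy_update_mqd_old(struct toy_mqd *m, const struct toy_queue *q)
{
    m->cp_hqd_vmid = q->vmid;  /* writes 0 when the HWS owns the VMID */
}

/* Patched shape: the queue update leaves the VMID field untouched. */
static void toy_update_mqd_patched(struct toy_mqd *m, const struct toy_queue *q)
{
    (void)m;
    (void)q;
}

/* Runs map-then-update and returns the VMID left in the toy MQD. */
static uint32_t toy_vmid_after_update(int patched, uint32_t hw_vmid)
{
    struct toy_queue q = { .vmid = 0 };
    struct toy_mqd m = { .cp_hqd_vmid = 0 };

    toy_hws_map_queue(&m, hw_vmid);
    if (patched)
        toy_update_mqd_patched(&m, &q);
    else
        toy_update_mqd_old(&m, &q);
    return m.cp_hqd_vmid;
}
```

In this model, toy_vmid_after_update(0, 8) leaves the field at 0, while toy_vmid_after_update(1, 8) keeps the value 8 that the toy scheduler assigned.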

Emily Deng
Best Wishes



>-----Original Message-----
>From: Joshi, Mukul <Mukul.Joshi@xxxxxxx>
>Sent: Tuesday, February 25, 2025 10:45 AM
>To: Deng, Emily <Emily.Deng@xxxxxxx>; Deng, Emily <Emily.Deng@xxxxxxx>;
>Kuehling, Felix <Felix.Kuehling@xxxxxxx>
>Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>Subject: RE: [PATCH 3/3] drm/amdkfd: Skip update vmid in while update queue
>
>[AMD Official Use Only - AMD Internal Distribution Only]
>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
>> Deng, Emily
>> Sent: Monday, February 24, 2025 8:05 PM
>> To: Deng, Emily <Emily.Deng@xxxxxxx>; Kuehling, Felix
>> <Felix.Kuehling@xxxxxxx>
>> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> Subject: RE: [PATCH 3/3] drm/amdkfd: Skip update vmid in while update
>> queue
>>
>> [AMD Official Use Only - AMD Internal Distribution Only]
>>
>> Ping......
>>
>> >-----Original Message-----
>> >From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
>> >Deng, Emily
>> >Sent: Monday, February 24, 2025 9:53 AM
>> >To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
>> >Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> >Subject: RE: [PATCH 3/3] drm/amdkfd: Skip update vmid in while update
>> >queue
>> >
>> >[AMD Official Use Only - AMD Internal Distribution Only]
>> >
>> >Hi Felix,
>> >    Could you help review this? Thanks.
>> >
>> >Emily Deng
>> >Best Wishes
>> >
>> >
>> >
>> >>-----Original Message-----
>> >>From: Deng, Emily <Emily.Deng@xxxxxxx>
>> >>Sent: Friday, February 21, 2025 9:44 AM
>> >>To: Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> >>Subject: RE: [PATCH 3/3] drm/amdkfd: Skip update vmid in while
>> >>update queue
>> >>
>> >>[AMD Official Use Only - AMD Internal Distribution Only]
>> >>
>> >>Ping......
>> >>
>> >>Emily Deng
>> >>Best Wishes
>> >>
>> >>
>> >>
>> >>>-----Original Message-----
>> >>>From: Emily Deng <Emily.Deng@xxxxxxx>
>> >>>Sent: Thursday, February 20, 2025 2:25 PM
>> >>>To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> >>>Cc: Deng, Emily <Emily.Deng@xxxxxxx>
>> >>>Subject: [PATCH 3/3] drm/amdkfd: Skip update vmid in while update
>> >>>queue
>> >>>
>> >>>Avoid updating the vmid to 0 during the queue update process, as
>> >>>this may trigger a wptr poll address page fault when a ring
>> >>>doorbell is activated in doorbell_mode=1.
>
>Have you observed this page fault? If so, could you please paste the page
>fault backtrace?
>
>> >>>
>> >>>Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
>> >>>---
>> >>> drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 4 ++--
>> >>> 1 file changed, 2 insertions(+), 2 deletions(-)
>> >>>
>> >>>diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>> >>>b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>> >>>index 6b38967d5631..3028c16264b2 100644
>> >>>--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>> >>>+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>> >>>@@ -219,6 +219,8 @@ static void init_mqd(struct mqd_manager *mm, void **mqd,
>> >>>               m->cp_hqd_wg_state_offset = q->ctl_stack_size;
>> >>>       }
>> >>>
>> >>>+      m->cp_hqd_vmid = q->vmid;
>> >>>+
>
>q->vmid would be 0 at the time init_mqd is called when using the HW
>scheduler (HWS), because it is the HWS that assigns the VMID.
>The driver assigns the VMID only when there is no HWS, which is not a
>production use case.
>
>> >>>       *mqd = m;
>> >>>       if (gart_addr)
>> >>>               *gart_addr = addr;
>> >>>@@ -288,8 +290,6 @@ static void update_mqd(struct mqd_manager *mm, void *mqd,
>> >>>
>> >>>       m->cp_hqd_iq_timer = 0;
>> >>>
>> >>>-      m->cp_hqd_vmid = q->vmid;
>
>Maybe we can just remove this vmid assignment if it is indeed causing a page
>fault.
>But I haven't seen a page fault caused by this before.
>
>Regards,
>Mukul
>
>> >>>-
>> >>>       if (q->format == KFD_QUEUE_FORMAT_AQL) {
>> >>>               m->cp_hqd_pq_control |= CP_HQD_PQ_CONTROL__NO_UPDATE_RPTR_MASK |
>> >>>                               2 << CP_HQD_PQ_CONTROL__SLOT_BASED_WPTR__SHIFT |
>> >>>--
>> >>>2.36.1
>> >>
>




