[AMD Official Use Only - AMD Internal Distribution Only]

Yes, I hit the page fault with doorbell_mode=1. The error log is as follows.

[Tue Feb 25 00:12:10 2025 < 0.000002>] kfd_ioctl_create_event:844: amdgpu: Created event (id:0x00000002) (kfd_ioctl_create_event)
[Tue Feb 25 00:12:10 2025 < 0.000020>] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:153 vmid:0 pasid:0)
[Tue Feb 25 00:12:10 2025 < 0.000123>] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x00007f40fb3c1000 from IH client 0x1b (UTCL2)
[Tue Feb 25 00:12:10 2025 < 0.000069>] amdgpu 0000:04:00.0: amdgpu: cookie node_id 1 fault from die AID0.XCD0

Emily Deng
Best Wishes

>-----Original Message-----
>From: Joshi, Mukul <Mukul.Joshi@xxxxxxx>
>Sent: Tuesday, February 25, 2025 10:45 AM
>To: Deng, Emily <Emily.Deng@xxxxxxx>; Deng, Emily <Emily.Deng@xxxxxxx>;
>Kuehling, Felix <Felix.Kuehling@xxxxxxx>
>Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>Subject: RE: [PATCH 3/3] drm/amdkfd: Skip update vmid in while update queue
>
>[AMD Official Use Only - AMD Internal Distribution Only]
>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
>> Deng, Emily
>> Sent: Monday, February 24, 2025 8:05 PM
>> To: Deng, Emily <Emily.Deng@xxxxxxx>; Kuehling, Felix
>> <Felix.Kuehling@xxxxxxx>
>> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> Subject: RE: [PATCH 3/3] drm/amdkfd: Skip update vmid in while update
>> queue
>>
>> [AMD Official Use Only - AMD Internal Distribution Only]
>>
>> Ping......
>>
>> >-----Original Message-----
>> >From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
>> >Deng, Emily
>> >Sent: Monday, February 24, 2025 9:53 AM
>> >To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
>> >Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> >Subject: RE: [PATCH 3/3] drm/amdkfd: Skip update vmid in while update
>> >queue
>> >
>> >[AMD Official Use Only - AMD Internal Distribution Only]
>> >
>> >Hi Felix,
>> >    Could you help review this? Thanks.
>> >
>> >Emily Deng
>> >Best Wishes
>> >
>> >
>> >
>> >>-----Original Message-----
>> >>From: Deng, Emily <Emily.Deng@xxxxxxx>
>> >>Sent: Friday, February 21, 2025 9:44 AM
>> >>To: Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> >>Subject: RE: [PATCH 3/3] drm/amdkfd: Skip update vmid in while
>> >>update queue
>> >>
>> >>[AMD Official Use Only - AMD Internal Distribution Only]
>> >>
>> >>Ping......
>> >>
>> >>Emily Deng
>> >>Best Wishes
>> >>
>> >>
>> >>
>> >>>-----Original Message-----
>> >>>From: Emily Deng <Emily.Deng@xxxxxxx>
>> >>>Sent: Thursday, February 20, 2025 2:25 PM
>> >>>To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> >>>Cc: Deng, Emily <Emily.Deng@xxxxxxx>
>> >>>Subject: [PATCH 3/3] drm/amdkfd: Skip update vmid in while update
>> >>>queue
>> >>>
>> >>>Avoid updating the vmid to 0 during the queue update process, as
>> >>>this may trigger a wptr poll address page fault when a ring
>> >>>doorbell is activated in
>> >>doorbell_mode=1.
>
>Have you observed this page fault? If you have it, can you please paste the page
>fault backtrace.
>
>> >>>
>> >>>Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
>> >>>---
>> >>> drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 4 ++--
>> >>> 1 file changed, 2 insertions(+), 2 deletions(-)
>> >>>
>> >>>diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>> >>>index 6b38967d5631..3028c16264b2 100644
>> >>>--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>> >>>+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>> >>>@@ -219,6 +219,8 @@ static void init_mqd(struct mqd_manager *mm, void **mqd,
>> >>>                 m->cp_hqd_wg_state_offset = q->ctl_stack_size;
>> >>>         }
>> >>>
>> >>>+        m->cp_hqd_vmid = q->vmid;
>> >>>+
>
>q->vmid would be 0 at the time of calling init_mqd when using the HW scheduler, as it's the
>HWS which assigns the VMID.
>Driver only assigns VMID when there is no HWS, which is not a production use-case.
>
>> >>>         *mqd = m;
>> >>>         if (gart_addr)
>> >>>                 *gart_addr = addr;
>> >>>@@ -288,8 +290,6 @@ static void update_mqd(struct mqd_manager *mm, void *mqd,
>> >>>
>> >>>         m->cp_hqd_iq_timer = 0;
>> >>>
>> >>>-        m->cp_hqd_vmid = q->vmid;
>
>Maybe we can just remove this vmid assignment if this is indeed causing a page
>fault.
>But I haven't seen a page fault because of this before.
>
>Regards,
>Mukul
>
>> >>>-
>> >>>         if (q->format == KFD_QUEUE_FORMAT_AQL) {
>> >>>                 m->cp_hqd_pq_control |= CP_HQD_PQ_CONTROL__NO_UPDATE_RPTR_MASK |
>> >>>                                 2 << CP_HQD_PQ_CONTROL__SLOT_BASED_WPTR__SHIFT |
>> >>>--
>> >>>2.36.1
>> >>
>