[AMD Official Use Only]

> -----Original Message-----
> From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
> Sent: Monday, March 21, 2022 6:47 PM
> To: Zhou1, Tao <Tao.Zhou1@xxxxxxx>
> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, Hawking
> <Hawking.Zhang@xxxxxxx>; Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Yang,
> Stanley <Stanley.Yang@xxxxxxx>; Chai, Thomas <YiPeng.Chai@xxxxxxx>
> Subject: Re: [PATCH] drm/amdkfd: print unmap queue status for RAS poison
> consumption (v2)
>
> Dear Tao,
>
>
> Thank you for the patch.
>
>
> On 21.03.22 at 10:38, Tao Zhou wrote:
> > Print the status out when it passes, and also tell user gpu reset is
> > triggered when we fallback to legacy way.
> >
> > v2: make the message more explicitly.
> >
> > Signed-off-by: Tao Zhou <tao.zhou1@xxxxxxx>
> > ---
> >  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 11 +++++++----
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > index 56902b5bb7b6..32c451f21db7 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > @@ -105,8 +105,6 @@ static void event_interrupt_poison_consumption(struct kfd_dev *dev,
> >  	if (old_poison)
> >  		return;
> >
> > -	pr_warn("RAS poison consumption handling: client id %d\n", client_id);
> > -
> >  	switch (client_id) {
> >  	case SOC15_IH_CLIENTID_SE0SH:
> >  	case SOC15_IH_CLIENTID_SE1SH:
> > @@ -130,10 +128,15 @@ static void event_interrupt_poison_consumption(struct kfd_dev *dev,
> >  	/* resetting queue passes, do page retirement without gpu reset
> >  	 * resetting queue fails, fallback to gpu reset solution
> >  	 */
> > -	if (!ret)
> > +	if (!ret) {
> > +		pr_warn("RAS poison consumption, unmap queue flow succeeds: client id %d\n",
> > +			client_id);
>
> succeeded? As it’s a success message, should it be an informational message?

[Tao] Thanks, I will change it to "succeeded" before pushing. Although the message reports success, poison consumption is not a usual event.

> >  		amdgpu_amdkfd_ras_poison_consumption_handler(dev->adev, false);
> > -	else
> > +	} else {
> > +		pr_warn("RAS poison consumption, fallback to gpu reset flow: client id %d\n",
>
> Fall back.
>
> > +			client_id);
> >  		amdgpu_amdkfd_ras_poison_consumption_handler(dev->adev, true);
>
> Could the log be moved somehow to the handler?

[Tao] It cannot. The unmap queue call isn't made in the handler, and client_id isn't passed to the handler.

> > +	}
> >  }
> >
> >  static bool event_interrupt_isr_v9(struct kfd_dev *dev,
>
> Unrelated to the patch, at least I as user, would wish these warnings to be more
> elaborate, telling me, what the problem is, what effects it has, and what to do
> to fix it.

[Tao] That's difficult. All of those details belong in a document rather than in the dmesg log.

>
>
> Kind regards,
>
> Paul