I'm trying to understand what the end result is after James' and your patches. If I'm reading it correctly, we now initialize IOMMUv2 after kfd_device_init during initialization, but before kfd_device_init during resume from S3. Is that the correct understanding? My concern is, that we may run into related problems again if the root cause and the mechanism of the fix are poorly understood. Do you have an explanation why this different sequence is needed during init and resume? That said, given that your patches are tested and reviewed by James, you can add my Acked-by to both patches. The series is Acked-by: Felix Kuehling <Felix.Kuehling@xxxxxxx> Thanks, Felix Am 2021-10-04 um 5:42 a.m. schrieb Zhang, Yifan: > > [AMD Official Use Only] > > > > Hi Felix, > > > > After sync w/ James, we agree that this patch series could fix both > our problems, and he verified this patch series will not cause > regression of his previous issue. Do you have more comments regarding > this patch series ? Thanks. > > > > BRs, > > Yifan > > > > *From:* Zhu, James <James.Zhu@xxxxxxx> > *Sent:* Wednesday, September 29, 2021 9:19 PM > *To:* Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Zhang, Yifan > <Yifan1.Zhang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > *Subject:* Re: [PATCH 2/2] drm/amdgpu: init iommu after amdkfd device init > > > > [AMD Official Use Only] > > > > H Felix, > > > > Since the previous patch can help on PCO suspend/resume hung issue. > Let me work with YiFan to see if > > there is proper way to cover both cases. > > > > Thanks & Best Regards! > > > > James Zhu > > ------------------------------------------------------------------------ > > *From:*Kuehling, Felix <Felix.Kuehling@xxxxxxx > <mailto:Felix.Kuehling@xxxxxxx>> > *Sent:* Tuesday, September 28, 2021 11:41 AM > *To:* Zhang, Yifan <Yifan1.Zhang@xxxxxxx > <mailto:Yifan1.Zhang@xxxxxxx>>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx> <amd-gfx@xxxxxxxxxxxxxxxxxxxxx > <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>>; Zhu, James <James.Zhu@xxxxxxx > <mailto:James.Zhu@xxxxxxx>> > *Subject:* Re: [PATCH 2/2] drm/amdgpu: init iommu after amdkfd device > init > > > > [+James] > > This basically undoes James's change "drm/amdgpu: move iommu_resume > before ip init/resume". I assume James made his change for a reason. Can > you please discuss the issue with him and determine a solution that > solves both your problem and his? > > If James' patch series was a mistake, I'd prefer to revert his patches, > because his patches complicated the initialization sequence and exposed > the iommu init sequence in amdgpu. > > Thanks, > Felix > > > Am 2021-09-28 um 4:28 a.m. schrieb Yifan Zhang: > > This patch is to fix clinfo failure in Raven/Picasso: > > > > Number of platforms: 1 > > Platform Profile: FULL_PROFILE > > Platform Version: OpenCL 2.2 AMD-APP (3364.0) > > Platform Name: AMD Accelerated Parallel Processing > > Platform Vendor: Advanced Micro Devices, Inc. > > Platform Extensions: cl_khr_icd cl_amd_event_callback > > > > Platform Name: AMD Accelerated Parallel Processing Number of > devices: 0 > > > > Signed-off-by: Yifan Zhang <yifan1.zhang@xxxxxxx > <mailto:yifan1.zhang@xxxxxxx>> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > index 4c8f2f4647c0..89ed9b091386 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > @@ -2393,10 +2393,6 @@ static int amdgpu_device_ip_init(struct > amdgpu_device *adev) > > if (r) > > goto init_failed; > > > > - r = amdgpu_amdkfd_resume_iommu(adev); > > - if (r) > > - goto init_failed; > > - > > r = amdgpu_device_ip_hw_init_phase1(adev); > > if (r) > > goto init_failed; > > @@ -2435,6 +2431,10 @@ static int amdgpu_device_ip_init(struct > amdgpu_device *adev) > > if (!adev->gmc.xgmi.pending_reset) > > amdgpu_amdkfd_device_init(adev); > > > > + r = amdgpu_amdkfd_resume_iommu(adev); > > + if (r) > > + goto init_failed; > > + > > amdgpu_fru_get_product_info(adev); > > > > init_failed: >