Re: [PATCH 2/2] drm/amdgpu: init iommu after amdkfd device init

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Official Use Only]


Hi Flelix,

I have tested YiFan's patch and pco s3 test can pass.

The original s3 issue always occurred on resume, And YiFan's patches
are only related to initialization.

Hi YiFan,

Reviewed-by: James Zhu <James.Zhu@xxxxxxx> for the series
Tested-by: James Zhu <James.Zhu@xxxxxxx> for the series


Thanks & Best Regards!


James Zhu


From: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Sent: Monday, October 4, 2021 11:00 AM
To: Zhang, Yifan <Yifan1.Zhang@xxxxxxx>; Zhu, James <James.Zhu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH 2/2] drm/amdgpu: init iommu after amdkfd device init
 
I'm trying to understand what the end result is after James' and your
patches. If I'm reading it correctly, we now initialize IOMMUv2 after
kfd_device_init during initialization, but before kfd_device_init during
resume from S3. Is  that the correct understanding?

My concern is, that we may run into related problems again if the root
cause and the mechanism of the fix are poorly understood. Do you have an
explanation why this different sequence is needed during init and resume?

That said, given that your patches are tested and reviewed by James, you
can add my Acked-by to both patches. The series is

Acked-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>

Thanks,
  Felix


Am 2021-10-04 um 5:42 a.m. schrieb Zhang, Yifan:
>
> [AMD Official Use Only]
>
>  
>
> Hi Felix,
>
>  
>
> After sync w/ James, we agree that this patch series could fix both
> our problems, and he verified this patch series will not cause
> regression of his previous issue. Do you have more comments regarding
> this patch series ? Thanks.
>
>  
>
> BRs,
>
> Yifan
>
>  
>
> *From:* Zhu, James <James.Zhu@xxxxxxx>
> *Sent:* Wednesday, September 29, 2021 9:19 PM
> *To:* Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Zhang, Yifan
> <Yifan1.Zhang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> *Subject:* Re: [PATCH 2/2] drm/amdgpu: init iommu after amdkfd device init
>
>  
>
> [AMD Official Use Only]
>
>  
>
> H Felix,
>
>  
>
> Since the previous patch can help on PCO suspend/resume hung issue.
> Let me work with YiFan to see if
>
> there is proper way to cover both cases.
>
>  
>
> Thanks & Best Regards!
>
>  
>
> James Zhu
>
> ------------------------------------------------------------------------
>
> *From:*Kuehling, Felix <Felix.Kuehling@xxxxxxx
> <mailto:Felix.Kuehling@xxxxxxx>>
> *Sent:* Tuesday, September 28, 2021 11:41 AM
> *To:* Zhang, Yifan <Yifan1.Zhang@xxxxxxx
> <mailto:Yifan1.Zhang@xxxxxxx>>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx> <amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> <mailto:amd-gfx@xxxxxxxxxxxxxxxxxxxxx>>; Zhu, James <James.Zhu@xxxxxxx
> <mailto:James.Zhu@xxxxxxx>>
> *Subject:* Re: [PATCH 2/2] drm/amdgpu: init iommu after amdkfd device
> init
>
>  
>
> [+James]
>
> This basically undoes James's change "drm/amdgpu: move iommu_resume
> before ip init/resume". I assume James made his change for a reason. Can
> you please discuss the issue with him and determine a solution that
> solves both your problem and his?
>
> If James' patch series was a mistake, I'd prefer to revert his patches,
> because his patches complicated the initialization sequence and exposed
> the iommu init sequence in amdgpu.
>
> Thanks,
>   Felix
>
>
> Am 2021-09-28 um 4:28 a.m. schrieb Yifan Zhang:
> > This patch is to fix clinfo failure in Raven/Picasso:
> >
> > Number of platforms: 1
> >   Platform Profile: FULL_PROFILE
> >   Platform Version: OpenCL 2.2 AMD-APP (3364.0)
> >   Platform Name: AMD Accelerated Parallel Processing
> >   Platform Vendor: Advanced Micro Devices, Inc.
> >   Platform Extensions: cl_khr_icd cl_amd_event_callback
> >
> >   Platform Name: AMD Accelerated Parallel Processing Number of
> devices: 0
> >
> > Signed-off-by: Yifan Zhang <yifan1.zhang@xxxxxxx
> <mailto:yifan1.zhang@xxxxxxx>>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 4c8f2f4647c0..89ed9b091386 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2393,10 +2393,6 @@ static int amdgpu_device_ip_init(struct
> amdgpu_device *adev)
> >        if (r)
> >                goto init_failed;
> > 
> > -     r = amdgpu_amdkfd_resume_iommu(adev);
> > -     if (r)
> > -             goto init_failed;
> > -
> >        r = amdgpu_device_ip_hw_init_phase1(adev);
> >        if (r)
> >                goto init_failed;
> > @@ -2435,6 +2431,10 @@ static int amdgpu_device_ip_init(struct
> amdgpu_device *adev)
> >        if (!adev->gmc.xgmi.pending_reset)
> >                amdgpu_amdkfd_device_init(adev);
> > 
> > +     r = amdgpu_amdkfd_resume_iommu(adev);
> > +     if (r)
> > +             goto init_failed;
> > +
> >        amdgpu_fru_get_product_info(adev);
> > 
> >  init_failed:
>

[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux