Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu

[AMD Official Use Only]


Hi Alex,

The following two patches were introduced for stable@xxxxxxxxxxxxxxx:
714d9e4 drm/amdgpu: init iommu after amdkfd device init
f02abeb drm/amdgpu: move iommu_resume before ip init/resume
After commit 970eae15600a883e4ad27dd0757b18871cc983ab
(Merge: 27f4432 3906fe9, BackMerge tag 'v5.15-rc7' into drm-next),
they became redundant and overwrote afd1818.

I saw that you just submitted afd1818, "[PATCH] drm/amdkfd: fix boot failure when iommu is disabled in Picasso", to stable@xxxxxxxxxxxxxxx.

I checked that re-applying afd1818 on current drm-next auto-merges and produces the same change as my patch.

I am wondering whether a future BackMerge of stable into drm-next will correct the current breakage.

Given the above, I am not sure what the proper way to fix this breakage is.

Please let me know your final decision in light of all this information.
      

Thanks & Best Regards!


James Zhu


From: Alex Deucher <alexdeucher@xxxxxxxxx>
Sent: Wednesday, November 3, 2021 11:03 AM
To: Zhu, James <James.Zhu@xxxxxxx>
Cc: amd-gfx list <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhang, Yifan <Yifan1.Zhang@xxxxxxx>; James Zhu <jzhums@xxxxxxxxx>; Ken Moffat <zarniwhoop@xxxxxxxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
 
Reverting 714d9e4 and f02abeb results in the diff below, which changes more than your patch does. Is that correct, or should I just use your patch?

Alex

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e56bc925afcf..70540712ff2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2360,6 +2360,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
        if (r)
                goto init_failed;
 
+       r = amdgpu_amdkfd_resume_iommu(adev);
+       if (r)
+               goto init_failed;
+
        r = amdgpu_device_ip_hw_init_phase1(adev);
        if (r)
                goto init_failed;
@@ -2398,10 +2402,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
        if (!adev->gmc.xgmi.pending_reset)
                amdgpu_amdkfd_device_init(adev);
 
-       r = amdgpu_amdkfd_resume_iommu(adev);
-       if (r)
-               goto init_failed;
-
        amdgpu_fru_get_product_info(adev);
 
 init_failed:
@@ -3119,10 +3119,6 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
 {
        int r;
 
-       r = amdgpu_amdkfd_resume_iommu(adev);
-       if (r)
-               return r;
-
        r = amdgpu_device_ip_resume_phase1(adev);
        if (r)
                return r;
@@ -4595,10 +4591,6 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
                                dev_warn(tmp_adev->dev, "asic atom init failed!");
                        } else {
                                dev_info(tmp_adev->dev, "GPU reset succeeded, trying to resume\n");
-                               r = amdgpu_amdkfd_resume_iommu(tmp_adev);
-                               if (r)
-                                       goto out;
-
                                r = amdgpu_device_ip_resume_phase1(tmp_adev);
                                if (r)
                                        goto out;


On Wed, Nov 3, 2021 at 10:50 AM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:


On Wed, Nov 3, 2021 at 10:34 AM Zhu, James <James.Zhu@xxxxxxx> wrote:

[AMD Official Use Only]


Hi Alex,

I finally figured out the root cause of this breakage.

Linux 5.14.15 + afd1818 fixes the issue.

I'll do that for stable.
 
Linux 5.15-rc7 re-applied "init iommu after amdkfd device init" and "move iommu_resume before ip init/resume", which overwrote afd1818 and caused the issue again.

714d9e4 drm/amdgpu: init iommu after amdkfd device init

f02abeb drm/amdgpu: move iommu_resume before ip init/resume

afd1818 drm/amdkfd: fix boot failure when iommu is disabled in Picasso.

286826d drm/amdgpu: init iommu after amdkfd device init

9cec53c drm/amdgpu: move iommu_resume before ip init/resume


So, do we just discard this patch and revert 714d9e4 and f02abeb?


I'll do that for 5.15+

Thanks for sorting this out.

Alex
 


Thanks & Best Regards!


James Zhu


From: Alex Deucher <alexdeucher@xxxxxxxxx>
Sent: Tuesday, November 2, 2021 10:01 PM
To: Zhu, James <James.Zhu@xxxxxxx>
Cc: amd-gfx list <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhang, Yifan <Yifan1.Zhang@xxxxxxx>; James Zhu <jzhums@xxxxxxxxx>; Ken Moffat <zarniwhoop@xxxxxxxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
 
On Tue, Nov 2, 2021 at 9:34 PM James Zhu <James.Zhu@xxxxxxx> wrote:
>
> Remove duplicated kfd_resume_iommu which already runs
> in amdgpu_amdkfd_device_init.
>
> Signed-off-by: James Zhu <James.Zhu@xxxxxxx>

Once you get confirmation, please add:
Bug: https://nam11.safelinks.protection.outlook.com/?url=
Bug: https://nam11.safelinks.protection.outlook.com/?url=

Acked-by: Alex Deucher <alexander.deucher@xxxxxxx>


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e56bc925afcf..f77823ce7ae8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2398,10 +2398,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>         if (!adev->gmc.xgmi.pending_reset)
>                 amdgpu_amdkfd_device_init(adev);
>
> -       r = amdgpu_amdkfd_resume_iommu(adev);
> -       if (r)
> -               goto init_failed;
> -
>         amdgpu_fru_get_product_info(adev);
>
>  init_failed:
> --
> 2.25.1
>
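
For readers following the thread, the duplication described in the original patch can be pictured with a small toy model. This is only a sketch under assumptions, not kernel code: every `toy_*` name is a hypothetical stand-in, and the model assumes (as the patch description states) that the IOMMU resume step already runs inside the KFD device init path. It counts how often the resume hook fires with and without the extra call that the patch removes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a device; NOT struct amdgpu_device. */
struct toy_dev {
	int iommu_resume_calls; /* number of times the resume hook ran */
};

/* Hypothetical stand-in for the IOMMU resume step. */
static int toy_resume_iommu(struct toy_dev *dev)
{
	dev->iommu_resume_calls++;
	return 0;
}

/*
 * Hypothetical stand-in for KFD device init. Assumption taken from the
 * patch description: the IOMMU resume already happens in here.
 */
static void toy_kfd_device_init(struct toy_dev *dev)
{
	toy_resume_iommu(dev);
}

/*
 * Hypothetical stand-in for the IP init path. With extra_call set, it
 * repeats the resume after KFD init, as the pre-patch code did.
 */
static int toy_ip_init(struct toy_dev *dev, bool extra_call)
{
	int r;

	toy_kfd_device_init(dev);

	if (extra_call) {
		r = toy_resume_iommu(dev);
		if (r)
			return r;
	}

	return 0;
}

/* Compare the two variants: the extra call makes the hook run twice. */
static void toy_selfcheck(void)
{
	struct toy_dev before = { 0 };
	struct toy_dev after = { 0 };

	assert(toy_ip_init(&before, true) == 0);
	assert(toy_ip_init(&after, false) == 0);

	assert(before.iommu_resume_calls == 2);
	assert(after.iommu_resume_calls == 1);
}
```

Calling `toy_selfcheck()` from a small `main` exercises both variants. The model says nothing about which of the reverts discussed above is the right fix; it only illustrates why the removed call is redundant under the stated assumption.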
