RE: [PATCH 02/23] drm/amdgpu: do RAS init in NPS mode switch

[AMD Official Use Only - AMD Internal Distribution Only]

The "in NPS mode" check is somewhat confusing. We'd like to differentiate recovery (*after* reset) from regular initialization.

Is it possible to replace the "in NPS mode" check with a more general approach? In regular initialization, mark the RAS interface as available in IP late init; in recovery, set the flag once recovery has completed.
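
One possible reading of this suggestion, as a minimal user-space sketch rather than driver code. The struct and both functions below are hypothetical stand-ins, not amdgpu APIs: `in_reset` models what amdgpu_in_reset() reports, and `error_query_ready` models the state set by amdgpu_ras_set_error_query_ready(). The assumption is that the recovery path itself sets the flag at the end, so late init needs no NPS-specific check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct amdgpu_device; models only two flags. */
struct mock_device {
	bool in_reset;          /* models what amdgpu_in_reset() reports */
	bool error_query_ready; /* models the RAS "interface available" flag */
};

/* Late init: mark RAS ready only during regular initialization. */
static void mock_ip_late_init(struct mock_device *dev)
{
	assert(dev != NULL);
	if (!dev->in_reset)
		dev->error_query_ready = true;
}

/*
 * Recovery (any kind of reset, including one triggered by a mode switch):
 * late init runs while in_reset is set, so it leaves the flag alone;
 * the flag is set only once recovery has completed.
 */
static void mock_recover(struct mock_device *dev)
{
	assert(dev != NULL);
	dev->in_reset = true;
	dev->error_query_ready = false;

	mock_ip_late_init(dev); /* flag stays false here */

	dev->in_reset = false;
	dev->error_query_ready = true; /* recovery is complete */
}
```

With this shape, the caller that drives the reset decides when RAS becomes available again, and the late-init condition stays a plain "not in reset" check.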

Regards,
Hawking

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Tao Zhou
Sent: Friday, November 8, 2024 7:14 PM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Zhou1, Tao <Tao.Zhou1@xxxxxxx>
Subject: [PATCH 02/23] drm/amdgpu: do RAS init in NPS mode switch

NPS mode switch will call gpu reset, but this is different from normal reset.

Signed-off-by: Tao Zhou <tao.zhou1@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    | 11 +++++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d69fcbb28b0e..635f020f8d9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3293,7 +3293,7 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)
                return r;
        }

-       if (!amdgpu_in_reset(adev))
+       if (!amdgpu_in_reset(adev) || amdgpu_in_nps_switch(adev))
                amdgpu_ras_set_error_query_ready(adev, true);

        amdgpu_device_set_cg_state(adev, AMD_CG_STATE_GATE);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index de1a55ae1d78..cbecf2380b51 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1253,7 +1253,8 @@ int amdgpu_ras_bind_aca(struct amdgpu_device *adev, enum amdgpu_ras_block blk,
        struct ras_manager *obj;

        /* in resume phase, no need to create aca fs node */
-       if (adev->in_suspend || amdgpu_in_reset(adev))
+       if (adev->in_suspend ||
+           (amdgpu_in_reset(adev) && !amdgpu_in_nps_switch(adev)))
                return 0;

        obj = get_ras_manager(adev, blk);
@@ -3780,7 +3781,8 @@ int amdgpu_ras_block_late_init(struct amdgpu_device *adev,

        r = amdgpu_ras_feature_enable_on_boot(adev, ras_block, 1);
        if (r) {
-               if (adev->in_suspend || amdgpu_in_reset(adev)) {
+               if (adev->in_suspend ||
+                   (amdgpu_in_reset(adev) && !amdgpu_in_nps_switch(adev))) {
                        /* in resume phase, if fail to enable ras,
                         * clean up all ras fs nodes, and disable ras */
                        goto cleanup;
@@ -3792,7 +3794,8 @@ int amdgpu_ras_block_late_init(struct amdgpu_device *adev,
        amdgpu_persistent_edc_harvesting(adev, ras_block);

        /* in resume phase, no need to create ras fs node */
-       if (adev->in_suspend || amdgpu_in_reset(adev))
+       if (adev->in_suspend ||
+           (amdgpu_in_reset(adev) && !amdgpu_in_nps_switch(adev)))
                return 0;

        ras_obj = container_of(ras_block, struct amdgpu_ras_block_object, ras_comm);
@@ -3922,7 +3925,7 @@ int amdgpu_ras_late_init(struct amdgpu_device *adev)
        amdgpu_ras_event_mgr_init(adev);

        if (amdgpu_ras_aca_is_supported(adev)) {
-               if (amdgpu_in_reset(adev)) {
+               if (amdgpu_in_reset(adev) && !amdgpu_in_nps_switch(adev)) {
                        if (amdgpu_aca_is_enabled(adev))
                                r = amdgpu_aca_reset(adev);
                        else
--
2.34.1




