Re: [PATCH 00/10] Support XGMI reset on init

On 02.09.24 at 09:34, Lijo Lazar wrote:
There are cases where a device needs to be reset before it is fully
initialized. One example is a driver reinstallation with a different version
of PSP TOS. In such a case, if the device supports a reset in which PSP TOS is
unloaded, the driver needs to reset the device first and then load the new
firmware components.

For devices in an XGMI hive, a reset needs to be sent to all devices in the
hive. Thus the driver should first discover the devices that belong to a hive
with PSP support.
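
The discovery step described above can be pictured with a minimal sketch. It
is not code from the series: struct sketch_dev, its fields, and
sketch_mark_hive_for_reset() are hypothetical names, and treating hive_id 0 as
"not in a hive" is an assumption made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device record; not the real struct amdgpu_device. */
struct sketch_dev {
	uint64_t hive_id;  /* assumption: 0 means "not part of any XGMI hive" */
	int needs_reset;   /* set to 1 once the device is selected for reset */
};

/*
 * Mark every device that shares a hive with devs[idx] as needing a reset.
 * A device outside any hive is marked on its own.
 * Returns the number of devices marked.
 */
size_t sketch_mark_hive_for_reset(struct sketch_dev *devs, size_t n, size_t idx)
{
	size_t marked = 0;
	uint64_t hive;

	assert(devs != NULL);
	if (idx >= n)
		return 0;

	hive = devs[idx].hive_id;
	if (hive == 0) {
		/* Standalone device: only this one is reset. */
		devs[idx].needs_reset = 1;
		return 1;
	}

	for (size_t i = 0; i < n; i++) {
		if (devs[i].hive_id == hive) {
			devs[i].needs_reset = 1;
			marked++;
		}
	}
	return marked;
}
```

Selecting by hive id rather than walking every registered device is what keeps
devices outside the hive, or in a second hive, untouched.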

There is an existing delayed reset handler; however, it has the following
limitations:
1) It doesn't discover devices in the hive. Instead, it tries to do an XGMI
reset for all devices registered to the mgpu struct. The mgpu struct may
contain devices other than those that belong to a hive. Also, if there is more
than one hive, it doesn't work.
2) It doesn't take a reset lock, and since this is a delayed reset, that could
result in unwanted hardware accesses during a reset.
3) It doesn't initialize RAS properly (left as TODO).

This series overcomes the above limitations. Instead of marking a pending
reset, it defines init levels that specify how far a device is initialized. In
case of a pending reset, only specific hardware blocks may be initialized.
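
As a rough illustration of the init-level idea, the sketch below maps each
level to a mask of hardware blocks that may be initialized. All names here
(sketch_init_level, sketch_ip_block, sketch_block_enabled) are hypothetical,
and the choice of blocks in the minimal level is an assumption for the
example, not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware block ids, for illustration only. */
enum sketch_ip_block {
	SK_IP_COMMON,
	SK_IP_GMC,
	SK_IP_IH,
	SK_IP_PSP,
	SK_IP_SMC,
	SK_IP_GFX,
	SK_IP_SDMA,
	SK_IP_COUNT
};

/* Hypothetical init levels: full init, or minimal init before a reset. */
enum sketch_init_level {
	SK_INIT_DEFAULT,
	SK_INIT_MINIMAL
};

struct sketch_level_desc {
	enum sketch_init_level level;
	uint32_t block_mask;  /* bit N set: block N may be initialized */
};

#define SK_BIT(b) (1u << (b))

static const struct sketch_level_desc sketch_levels[] = {
	/* Default: every block is initialized. */
	{ SK_INIT_DEFAULT, SK_BIT(SK_IP_COUNT) - 1u },
	/* Minimal: assumed set of blocks needed to issue a reset. */
	{ SK_INIT_MINIMAL, SK_BIT(SK_IP_COMMON) | SK_BIT(SK_IP_GMC) |
			   SK_BIT(SK_IP_IH) | SK_BIT(SK_IP_PSP) |
			   SK_BIT(SK_IP_SMC) },
};

/* Return true if the given block may be initialized at the given level. */
bool sketch_block_enabled(enum sketch_init_level level,
			  enum sketch_ip_block block)
{
	size_t count = sizeof(sketch_levels) / sizeof(sketch_levels[0]);

	assert(block < SK_IP_COUNT);
	for (size_t i = 0; i < count; i++) {
		if (sketch_levels[i].level == level)
			return (sketch_levels[i].block_mask & SK_BIT(block)) != 0;
	}
	return false;  /* unknown level: initialize nothing */
}
```

A per-level mask replaces a single pending-reset boolean with a description of
what to bring up, which leaves room for further levels later.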

Further work (not done in this series) may add fine-grained controls for init
levels, for example skipping features such as DPM enablement, or skipping the
loading of specific firmware that is not required in a minimal init scenario
where the device is going to be reset.

The series adds an API interface to check if a PSP TOS reload is required.
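
One way such a check could look is sketched below. It assumes the decision
reduces to comparing the TOS version already running on the device with the
one the driver ships. The struct, its fields, and sketch_tos_reload_needed()
are hypothetical and do not reflect the interface added by the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PSP state, for illustration only. */
struct sketch_psp {
	uint32_t running_tos_version; /* assumption: 0 means no TOS is loaded */
	uint32_t driver_tos_version;  /* version bundled with this driver */
	bool reset_unloads_tos;       /* device supports a reset that unloads TOS */
};

/*
 * Return true if the device should be reset before init so that the
 * driver's own TOS can be loaded.
 */
bool sketch_tos_reload_needed(const struct sketch_psp *psp)
{
	assert(psp != NULL);

	/* Without a reset that unloads TOS, a reload is not possible. */
	if (!psp->reset_unloads_tos)
		return false;

	/* Nothing is loaded yet, so a normal firmware load is enough. */
	if (psp->running_tos_version == 0)
		return false;

	return psp->running_tos_version != psp->driver_tos_version;
}
```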

At least from the high level that sounds totally sane, but I have no idea where to get time from to review the details.

I need to discuss that with Alex and/or Tim. Maybe I can delegate some more work.

Christian.



Lijo Lazar (10):
   drm/amdgpu: Add init levels
   drm/amdgpu: Use init level for pending_reset flag
   drm/amdgpu: Separate reinitialization after reset
   drm/amdgpu: Add reset on init handler for XGMI
   drm/amdgpu: Add helper to initialize badpage info
   drm/amdgpu: Refactor XGMI reset on init handling
   drm/amdgpu: Drop delayed reset work handler
   drm/amdgpu: Support reset-on-init on select SOCs
   drm/amdgpu: Add interface for TOS reload cases
   drm/amdgpu: Add PSP reload case to reset-on-init

  drivers/gpu/drm/amd/amdgpu/aldebaran.c        |   1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  21 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 245 +++++++++++-------
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  81 ------
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h       |   1 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c       |  13 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h       |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c       |  62 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h       |   4 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c     | 148 +++++++++++
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h     |   4 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c      |  72 ++++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h      |   2 +
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         |  14 +-
  drivers/gpu/drm/amd/amdgpu/psp_v13_0.c        |  25 ++
  drivers/gpu/drm/amd/amdgpu/soc15.c            |   7 +
  .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c    |   3 +-
  17 files changed, 492 insertions(+), 214 deletions(-)
