On 12.08.23 at 19:08, Lazar, Lijo wrote:
On 8/12/2023 1:53 PM, Christian König wrote:
On 11.08.23 at 08:02, Lijo Lazar wrote:
Presently, there are multiple clients of reset like RAS, job timeout, KFD hang detection and the debug method. Instead of each client maintaining a work item, the reset work pool is moved to the reset domain. When a client makes a recovery request, a work item is allocated by the reset domain and queued for execution. For the case of job timeout, each ring has its own TDR queue to which TDR work is scheduled. From there, it's further queued to a reset domain based on the device configuration.
This allows flexibility to have multiple reset domains. For example, when there are partitions, each partition can maintain its own reset domain, and a job timeout on one partition doesn't affect jobs on the other partition (when the jobs don't have any interdependency). The reset logic will select the appropriate reset domain based on the current device configuration.
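The queuing model the series describes could be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual amdgpu code: all the names (`reset_domain`, `reset_domain_queue`, `reset_domain_drain`) and the fixed-size ring buffer are hypothetical stand-ins for the kernel's workqueue machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: each reset domain owns its own queue, so reset
 * work items from different clients (RAS, job timeout, KFD hang
 * detection, debug) serialize within one domain, while separate
 * domains could drain independently. */

#define MAX_PENDING 8

enum reset_client { CLIENT_RAS, CLIENT_JOB_TIMEOUT, CLIENT_KFD, CLIENT_DEBUG };

struct reset_work {
	enum reset_client client;
	bool done;
};

struct reset_domain {
	struct reset_work pending[MAX_PENDING];
	int head, tail;		/* ring indices: FIFO execution order */
};

/* Allocate a work item inside the domain and queue it for execution,
 * mirroring "a work item is allocated by the reset domain and queued". */
static bool reset_domain_queue(struct reset_domain *d, enum reset_client c)
{
	int next = (d->tail + 1) % MAX_PENDING;

	if (next == d->head)
		return false;	/* queue full: request rejected */
	d->pending[d->tail].client = c;
	d->pending[d->tail].done = false;
	d->tail = next;
	return true;
}

/* Drain the queue one item at a time, mimicking a single-threaded
 * workqueue: at most one reset runs per domain at any moment. */
static int reset_domain_drain(struct reset_domain *d)
{
	int executed = 0;

	while (d->head != d->tail) {
		d->pending[d->head].done = true;	/* "perform" the reset */
		d->head = (d->head + 1) % MAX_PENDING;
		executed++;
	}
	return executed;
}
```

The key property being modeled is that serialization is per domain, not global: two such domains could drain concurrently without their work items interleaving within a domain.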
Well, completely NAK to that design.
We intentionally added the workqueue to serialize *all* reset work
and I absolutely don't see any reason to change that.
This is for the case where there are multiple spatial partitions and a
reset is possible by hardware design on one partition without
affecting other partitions on the same device. The partition scenario
can be considered equivalent to a multi-GPU case (not interconnected through XGMI) where each GPU gets its own reset domain and can be reset independently.
Well, this is not even remotely correct. Multiple spatial partitions are
not fully separated, for example they share a common IRQ block.
So you need to be very careful if you want to reset multiple things at
the same time. Because of this we already rejected the idea you are
trying to implement here from the SW side.
BTW, this design doesn't prevent keeping only one reset domain, as in the case of legacy ASICs like Aldebaran. The reset work is always serialized within a domain. This allows having multiple reset domains, or you could also fall back to reset_domain1 -> reset_domain2 for hierarchical resets, if required (though that is not planned now).
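The domain-selection fallback described above might look roughly like this. Again a hypothetical sketch: `device_cfg`, `select_reset_domain`, and the `partitions_independent` flag are illustrative names, not the real amdgpu_reset helpers.

```c
#include <assert.h>

/* Hypothetical sketch of the domain-selection idea: with spatial
 * partitions that the hardware can reset in isolation, a recovery
 * request picks the partition's own domain; otherwise (e.g. legacy
 * ASICs like Aldebaran) everything falls back to one device-wide
 * serialized domain. */

struct reset_domain { int id; };

struct device_cfg {
	int num_partitions;			/* 1 on legacy ASICs */
	int partitions_independent;		/* HW allows isolated reset */
	struct reset_domain device_domain;	/* device-wide fallback */
	struct reset_domain part_domain[4];	/* per-partition domains */
};

static struct reset_domain *select_reset_domain(struct device_cfg *cfg,
						int partition)
{
	if (cfg->num_partitions > 1 && cfg->partitions_independent)
		return &cfg->part_domain[partition];
	/* Fall back to the single serialized device-wide domain. */
	return &cfg->device_domain;
}
```

With a single domain configured, every request resolves to the same queue, which preserves the fully serialized behavior of the existing design.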
Yeah, besides the points noted above, this infrastructure is absolutely not necessary. The reset domain is already what you are trying to add.
In general if you get requirements like this please come to me first.
I'm the owner of the amdgpu component, so all design regarding the
kernel module must go over my desk.
Regards,
Christian.
Thanks,
Lijo
Regards,
Christian.
Lijo Lazar (5):
drm/amdgpu: Add work pool to reset domain
drm/amdgpu: Move to reset_schedule_work
drm/amdgpu: Set flags to cancel all pending resets
drm/amdgpu: Add API to queue and do reset work
drm/amdgpu: Add TDR queue for ring
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 -
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 32 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 -
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 40 +++----
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 16 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 71 ++++++------
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 122 ++++++++++++++++++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 32 +++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 1 -
drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 38 +++----
drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 44 ++++----
drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 33 +++---
15 files changed, 285 insertions(+), 177 deletions(-)