On 11.03.25 at 09:32, Jesse.zhang@xxxxxxx wrote:
> From: "Jesse.zhang@xxxxxxx" <Jesse.zhang@xxxxxxx>
>
> This patch introduces two new callbacks, `stop_queue` and `start_queue`, to the
> `amdgpu_ring_funcs` structure. These callbacks are designed to handle the stopping
> and starting of SDMA queues during engine reset operations. The changes include:
>
> 1. **Addition of Callbacks**:
>    - Added `stop_queue` and `start_queue` function pointers to `amdgpu_ring_funcs`.
>    - These callbacks allow for modular and flexible management of SDMA queues during
>      reset operations.

Why do these need to be per-ring callbacks? Flexibility is usually a bad thing when it isn't needed.

Regards,
Christian.

>
> 2. **Integration with SDMA v4.4.2**:
>    - Implemented `sdma_v4_4_2_stop_queue` and `sdma_v4_4_2_restore_queue` as the
>      respective callback functions for SDMA v4.4.2.
>    - These functions handle the stopping and starting of SDMA queues, ensuring that
>      the scheduler's work queue is properly managed during resets.
>
> 3. **Purpose**:
>    - The new callbacks provide a standardized way to stop and start SDMA queues,
>      which is essential for handling engine resets gracefully.
>    - This change simplifies the reset logic and improves maintainability by
>      centralizing queue management in the `amdgpu_ring_funcs` structure.
>
> 4. **Impact**:
>    - The addition of these callbacks ensures that SDMA queues are properly stopped
>      and started during reset operations, reducing the risk of race conditions and
>      improving the reliability of the reset process.
>    - This change is a prerequisite for future improvements to the SDMA reset logic,
>      including better coordination between the KGD and KFD during resets.
>
> Suggested-by:Jonathan Kim <jonathan.kim@xxxxxxx>
> Signed-off-by: Jesse Zhang <Jesse.Zhang@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index b4fd1e17205e..1c52ff92ea26 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -237,6 +237,8 @@ struct amdgpu_ring_funcs {
>  	void (*patch_ce)(struct amdgpu_ring *ring, unsigned offset);
>  	void (*patch_de)(struct amdgpu_ring *ring, unsigned offset);
>  	int (*reset)(struct amdgpu_ring *ring, unsigned int vmid);
> +	int (*stop_queue)(struct amdgpu_device *adev, uint32_t instance_id);
> +	int (*start_queue)(struct amdgpu_device *adev, uint32_t instance_id);
>  	void (*emit_cleaner_shader)(struct amdgpu_ring *ring);
>  	bool (*is_guilty)(struct amdgpu_ring *ring);
>  };
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> index fd34dc138081..c1f7ccff9c4e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
> @@ -2132,6 +2132,8 @@ static const struct amdgpu_ring_funcs sdma_v4_4_2_ring_funcs = {
>  	.emit_reg_wait = sdma_v4_4_2_ring_emit_reg_wait,
>  	.emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper,
>  	.reset = sdma_v4_4_2_reset_queue,
> +	.stop_queue = sdma_v4_4_2_stop_queue,
> +	.start_queue = sdma_v4_4_2_restore_queue,
>  	.is_guilty = sdma_v4_4_2_ring_is_guilty,
>  };
>
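
For reference, here is a rough user-space sketch of how a reset path might drive the two callbacks from the quoted diff. It is only an illustration under stated assumptions: `mock_device`, `mock_ring_funcs`, `mock_stop_queue`, `mock_start_queue` and `mock_engine_reset` are hypothetical stand-ins, not amdgpu code, and the stop/reset/start ordering is assumed from the patch description rather than taken from the driver. Only the two callback signatures mirror the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_NUM_INSTANCES 4

/* Hypothetical stand-in for struct amdgpu_device; not the kernel type. */
struct mock_device {
	int queue_running[MOCK_NUM_INSTANCES];	/* 1 = running, 0 = stopped */
	int reset_count;			/* engine resets performed */
};

/* Same shape as the two function pointers added by the quoted patch. */
struct mock_ring_funcs {
	int (*stop_queue)(struct mock_device *adev, uint32_t instance_id);
	int (*start_queue)(struct mock_device *adev, uint32_t instance_id);
};

/* Hypothetical callback: mark the queue of one instance as stopped. */
static int mock_stop_queue(struct mock_device *adev, uint32_t instance_id)
{
	if (instance_id >= MOCK_NUM_INSTANCES)
		return -EINVAL;
	adev->queue_running[instance_id] = 0;
	return 0;
}

/* Hypothetical callback: mark the queue of one instance as running. */
static int mock_start_queue(struct mock_device *adev, uint32_t instance_id)
{
	if (instance_id >= MOCK_NUM_INSTANCES)
		return -EINVAL;
	adev->queue_running[instance_id] = 1;
	return 0;
}

/*
 * Assumed reset sequence: stop the queue, reset the engine, start the
 * queue again. Both callbacks are treated as optional, and the first
 * error aborts the sequence.
 */
static int mock_engine_reset(struct mock_device *adev,
			     const struct mock_ring_funcs *funcs,
			     uint32_t instance_id)
{
	int r;

	assert(adev != NULL && funcs != NULL);

	if (funcs->stop_queue) {
		r = funcs->stop_queue(adev, instance_id);
		if (r)
			return r;
	}

	adev->reset_count++;	/* placeholder for the actual engine reset */

	if (funcs->start_queue) {
		r = funcs->start_queue(adev, instance_id);
		if (r)
			return r;
	}

	return 0;
}
```

Note that both callbacks in the quoted diff take `adev` and `instance_id` rather than a `struct amdgpu_ring *`, unlike the neighbouring `reset` and `is_guilty` entries in `amdgpu_ring_funcs`.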