On Wed, Nov 27, 2024 at 2:57 PM Srinivasan Shanmugam
<srinivasan.shanmugam@xxxxxxx> wrote:
>
> This update adds explanations to key functions that manage how the
> Kernel Fusion Driver (KFD) and Kernel Graphics Driver (KGD) share the
> GPU.
>
> amdgpu_gfx_enforce_isolation_wait_for_kfd: Controls the waiting period
> for KFD to ensure it takes turns with KGD in using the GPU. It uses a
> mutex to safely manage shared data, like timing and state, and tracks
> when KFD starts and stops waiting.
>
> amdgpu_gfx_enforce_isolation_ring_begin_use: Ensures KFD has enough time
> to run before new tasks are submitted to the GPU ring. It uses a mutex
> to synchronize access and may adjust the KFD scheduler.
>
> amdgpu_gfx_enforce_isolation_ring_end_use: Handles cleanup and state
> updates when finishing the use of a GPU ring. It may also adjust the KFD
> scheduler, using a mutex to manage shared data access.
>
> Cc: Christian König <christian.koenig@xxxxxxx>
> Cc: Alex Deucher <alexander.deucher@xxxxxxx>
> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 31 +++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index e54f42e3797e..ce9ecd1fe748 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -1940,6 +1940,19 @@ void amdgpu_gfx_enforce_isolation_handler(struct work_struct *work)
>         mutex_unlock(&adev->enforce_isolation_mutex);
>  }
>
> +/**
> + * amdgpu_gfx_enforce_isolation_wait_for_kfd - Manage KFD wait period for process isolation
> + * @adev: amdgpu_device pointer
> + * @idx: Index of the GPU partition
> + *
> + * This function controls how the Kernel Fusion Driver (KFD) waits so that both
> + * the KFD and the Kernel Graphics Driver (KGD) use the GPU one after the other.
> + * It decides if the KFD should pause to let the KGD use the GPU.
> + * A mutex is used to ensure that shared information, like timing and state,
> + * is accessed safely by both drivers.
> + * The function also records when the KFD's wait period starts and ends,
> + * to ensure the time-sharing works correctly.

maybe add something like:
When kernel submissions come in, the jobs are given a time slice and
once that time slice is up, if there are KFD user queues active,
kernel submissions are blocked until KFD has had its time slice. Once
the KFD time slice is up, KFD user queues are preempted and kernel
submissions are unblocked and allowed to run again.

> + */
>  static void
>  amdgpu_gfx_enforce_isolation_wait_for_kfd(struct amdgpu_device *adev,
>                                            u32 idx)
> @@ -1985,6 +1998,15 @@ amdgpu_gfx_enforce_isolation_wait_for_kfd(struct amdgpu_device *adev,
>         msleep(GFX_SLICE_PERIOD_MS);
>  }
>
> +/**
> + * amdgpu_gfx_enforce_isolation_ring_begin_use - Begin use of a ring with enforced isolation
> + * @ring: Pointer to the amdgpu_ring structure
> + *
> + * This function is called when beginning the use of a GPU ring with enforced isolation.
> + * It ensures that the KFD has had sufficient time to run before allowing more work to
> + * be submitted to the ring. The function acquires a mutex to synchronize access and
> + * may control the KFD scheduler to maintain process isolation.

I would say something like:
Ring begin_use helper implementation for gfx which serializes access
to the gfx IP between kernel submission IOCTLs and KFD user queues
when isolation enforcement is enabled. The kernel submission IOCTLs
and KFD user queues each get a time slice when both are active.

> + */
>  void amdgpu_gfx_enforce_isolation_ring_begin_use(struct amdgpu_ring *ring)
>  {
>         struct amdgpu_device *adev = ring->adev;
> @@ -2012,6 +2034,15 @@ void amdgpu_gfx_enforce_isolation_ring_begin_use(struct amdgpu_ring *ring)
>         mutex_unlock(&adev->enforce_isolation_mutex);
>  }
>
> +/**
> + * amdgpu_gfx_enforce_isolation_ring_end_use - End use of a ring with enforced isolation
> + * @ring: Pointer to the amdgpu_ring structure
> + *
> + * This function is called when ending the use of a GPU ring with enforced isolation.
> + * It ensures that any necessary cleanup or state updates are performed, and it may
> + * control the KFD scheduler to maintain process isolation. The function uses a mutex
> + * to synchronize access to shared data.

I'd say something like:
Ring end_use helper implementation for gfx which serializes access to
the gfx IP between kernel submission IOCTLs and KFD user queues when
isolation enforcement is enabled. The kernel submission IOCTLs and
KFD user queues each get a time slice when both are active.

> + */
>  void amdgpu_gfx_enforce_isolation_ring_end_use(struct amdgpu_ring *ring)
>  {
>         struct amdgpu_device *adev = ring->adev;
> --
> 2.34.1
>
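
To make the suggested wording concrete, here is a rough userspace sketch of
the time-slice hand-off described in the comments above. It is only an
illustration and not the amdgpu code: struct isolation_sim, the sim_*
helpers and SLICE_TICKS are made-up names, the slice is counted in abstract
ticks rather than GFX_SLICE_PERIOD_MS, and the mutex that protects the real
shared state is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical slice length in ticks (the driver sleeps GFX_SLICE_PERIOD_MS). */
#define SLICE_TICKS 4

enum owner { OWNER_KERNEL, OWNER_KFD };

/* Hypothetical model state; not a structure from amdgpu. */
struct isolation_sim {
	enum owner owner;   /* who currently has the gfx IP */
	unsigned int used;  /* ticks consumed in the current slice */
	bool kfd_active;    /* are KFD user queues active? */
};

static void sim_init(struct isolation_sim *s, bool kfd_active)
{
	s->owner = OWNER_KERNEL;
	s->used = 0;
	s->kfd_active = kfd_active;
}

/* Advance one tick and return who owned the gfx IP during that tick. */
static enum owner sim_tick(struct isolation_sim *s)
{
	enum owner ran = s->owner;

	assert(s->used < SLICE_TICKS);
	s->used++;
	if (s->used == SLICE_TICKS) {
		s->used = 0;
		if (s->owner == OWNER_KERNEL && s->kfd_active)
			s->owner = OWNER_KFD;    /* kernel submissions are blocked */
		else
			s->owner = OWNER_KERNEL; /* KFD preempted, kernel unblocked */
	}
	return ran;
}

/* Kernel submissions must wait while KFD holds its slice. */
static bool sim_kernel_blocked(const struct isolation_sim *s)
{
	return s->owner == OWNER_KFD;
}
```

With KFD user queues active, the model alternates SLICE_TICKS ticks of
kernel submissions with SLICE_TICKS ticks of KFD. With no KFD queues active,
kernel submissions are never blocked, which matches the "when both are
active" part of the proposed text.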