RE: [PATCH 1/2] drm/amdkfd: Fix GWS queue count

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
> Sent: Thursday, April 14, 2022 11:42 AM
> To: Yat Sin, David <David.YatSin@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH 1/2] drm/amdkfd: Fix GWS queue count
> 
> 
> Am 2022-04-14 um 11:08 schrieb Yat Sin, David:
> >> -----Original Message-----
> >> From: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
> >> Sent: Thursday, April 14, 2022 10:52 AM
> >> To: Yat Sin, David <David.YatSin@xxxxxxx>;
> >> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH 1/2] drm/amdkfd: Fix GWS queue count
> >>
> >>
> >> Am 2022-04-13 um 11:51 schrieb David Yat Sin:
> >>> Queue can be inactive during process termination. This would cause
> >>> dqm->gws_queue_count to not be decremented. There can only be 1
> GWS
> >>> queue per device process so moving the logic out of loop.
> >>>
> >>> Signed-off-by: David Yat Sin <david.yatsin@xxxxxxx>
> >>> ---
> >>>    .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c    | 12 ++++++-
> ----
> >> -
> >>>    1 file changed, 6 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> >>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> >>> index acf4f7975850..7c107b88d944 100644
> >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> >>> @@ -1945,17 +1945,17 @@ static int process_termination_cpsch(struct
> >> device_queue_manager *dqm,
> >>>    		else if (q->properties.type ==
> >> KFD_QUEUE_TYPE_SDMA_XGMI)
> >>>    			deallocate_sdma_queue(dqm, q);
> >>>
> >>> -		if (q->properties.is_active) {
> >>> +		if (q->properties.is_active)
> >>>    			decrement_queue_count(dqm, q->properties.type);
> >>> -			if (q->properties.is_gws) {
> >>> -				dqm->gws_queue_count--;
> >>> -				qpd->mapped_gws_queue = false;
> >>> -			}
> >>> -		}
> >> This looks like the original intention was to decrement the counter
> >> for inactive GWS queues. This makes sense because this counter is
> >> used to determine whether the runlist is oversubscribed. Inactive
> >> queues are not in the runlist, so they should not be counted.
> >>
> >> I see that the counter is updated when queues are activated and
> >> deactivated in update_queue. So IMO this patch is both incorrect and
> >> unnecessary. Did you see an actual problem with the GWS counter during
> process termination?
> >> If so, I'd like to know more to understand the root cause.
> >>
> >> Regards,
> >>     Felix
> > Yes, when using a unit test (for example KFDGWSTest.Semaphore), 1. Put
> > a sleep in the middle of the application (after calling
> > hsaKmtAllocQueueGWS) 2. Run application and kill the application which it
> is in sleep (application never calls queue.Destroy()).
> >
> > Then inside kernel dqm->gws_queue_count keeps incrementing each time
> the experiment is repeated and never goes back to 0. This seems to be
> because the sleep causes the process to be evicted and q-
> >properties.is_active is false so code does not enter that if statement.
> 
> That's weird. Can you find out why it's not getting there? In the test you
> describe, I would expect the queue to be active, so the counter should be
> decremented by the existing code.
> 
> Does the test evict the queues for some reason? is_active gets set to false
> when a queue is evicted. 
Yes, the queue is evicted because of the sleep() call in userspace.

>Looks like we're missing code to update the
> gws_queue_count in evict/restore_process_queues_cpsch (it is present in
> evict/restore_process_queues_nocpsch).

I think this is the actual problem and the increment/decrement should be done there. I did not realize the dqm->gws_queue_count only counts not-evicted queues. I will post new patch with this change instead.

> 
> Maybe the management of this counter should be moved into
> increment/decrement_queue_count, so we don't need to duplicate it in
> many places.
ACK

Regards,
David

> 
> Regards,
>    Felix
> 
> 
> >
> > Regards,
> > David
> >
> >>>    		dqm->total_queue_count--;
> >>>    	}
> >>>
> >>> +	if (qpd->mapped_gws_queue) {
> >>> +		dqm->gws_queue_count--;
> >>> +		qpd->mapped_gws_queue = false;
> >>> +	}
> >>> +
> >>>    	/* Unregister process */
> >>>    	list_for_each_entry_safe(cur, next_dpn, &dqm->queues, list) {
> >>>    		if (qpd == cur->qpd) {




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux