> -----Original Message----- > From: Kuehling, Felix <Felix.Kuehling@xxxxxxx> > Sent: Thursday, April 14, 2022 11:42 AM > To: Yat Sin, David <David.YatSin@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Subject: Re: [PATCH 1/2] drm/amdkfd: Fix GWS queue count > > > Am 2022-04-14 um 11:08 schrieb Yat Sin, David: > >> -----Original Message----- > >> From: Kuehling, Felix <Felix.Kuehling@xxxxxxx> > >> Sent: Thursday, April 14, 2022 10:52 AM > >> To: Yat Sin, David <David.YatSin@xxxxxxx>; > >> amd-gfx@xxxxxxxxxxxxxxxxxxxxx > >> Subject: Re: [PATCH 1/2] drm/amdkfd: Fix GWS queue count > >> > >> > >> Am 2022-04-13 um 11:51 schrieb David Yat Sin: > >>> Queue can be inactive during process termination. This would cause > >>> dqm->gws_queue_count to not be decremented. There can only be 1 > GWS > >>> queue per device process so moving the logic out of loop. > >>> > >>> Signed-off-by: David Yat Sin <david.yatsin@xxxxxxx> > >>> --- > >>> .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 12 ++++++- > ---- > >> - > >>> 1 file changed, 6 insertions(+), 6 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > >>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > >>> index acf4f7975850..7c107b88d944 100644 > >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > >>> @@ -1945,17 +1945,17 @@ static int process_termination_cpsch(struct > >> device_queue_manager *dqm, > >>> else if (q->properties.type == > >> KFD_QUEUE_TYPE_SDMA_XGMI) > >>> deallocate_sdma_queue(dqm, q); > >>> > >>> - if (q->properties.is_active) { > >>> + if (q->properties.is_active) > >>> decrement_queue_count(dqm, q->properties.type); > >>> - if (q->properties.is_gws) { > >>> - dqm->gws_queue_count--; > >>> - qpd->mapped_gws_queue = false; > >>> - } > >>> - } > >> This looks like the original intention was to decrement the counter > >> for inactive GWS queues. This makes sense because this counter is > >> used to determine whether the runlist is oversubscribed. Inactive > >> queues are not in the runlist, so they should not be counted. > >> > >> I see that the counter is updated when queues are activated and > >> deactivated in update_queue. So IMO this patch is both incorrect and > >> unnecessary. Did you see an actual problem with the GWS counter during > process termination? > >> If so, I'd like to know more to understand the root cause. > >> > >> Regards, > >> Felix > > Yes, when using a unit test (for example KFDGWSTest.Semaphore), 1. Put > > a sleep in the middle of the application (after calling > > hsaKmtAllocQueueGWS) 2. Run application and kill the application which it > is in sleep (application never calls queue.Destroy()). > > > > Then inside kernel dqm->gws_queue_count keeps incrementing each time > the experiment is repeated and never goes back to 0. This seems to be > because the sleep causes the process to be evicted and q- > >properties.is_active is false so code does not enter that if statement. > > That's weird. Can you find out why it's not getting there? In the test you > describe, I would expect the queue to be active, so the counter should be > decremented by the existing code. > > Does the test evict the queues for some reason? is_active gets set to false > when a queue is evicted. Yes, the queue is evicted because of the sleep() call in userspace. >Looks like we're missing code to update the > gws_queue_count in evict/restore_process_queues_cpsch (it is present in > evict/restore_process_queues_nocpsch). I think this is the actual problem and the increment/decrement should be done there. I did not realize the dqm->gws_queue_count only counts not-evicted queues. I will post new patch with this change instead. > > Maybe the management of this counter should be moved into > increment/decrement_queue_count, so we don't need to duplicate it in > many places. ACK Regards, David > > Regards, > Felix > > > > > > Regards, > > David > > > >>> dqm->total_queue_count--; > >>> } > >>> > >>> + if (qpd->mapped_gws_queue) { > >>> + dqm->gws_queue_count--; > >>> + qpd->mapped_gws_queue = false; > >>> + } > >>> + > >>> /* Unregister process */ > >>> list_for_each_entry_safe(cur, next_dpn, &dqm->queues, list) { > >>> if (qpd == cur->qpd) {