> -----Original Message----- > From: Kuehling, Felix <Felix.Kuehling@xxxxxxx> > Sent: Thursday, April 14, 2022 10:52 AM > To: Yat Sin, David <David.YatSin@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Subject: Re: [PATCH 1/2] drm/amdkfd: Fix GWS queue count > > > Am 2022-04-13 um 11:51 schrieb David Yat Sin: > > Queue can be inactive during process termination. This would cause > > dqm->gws_queue_count to not be decremented. There can only be 1 GWS > > queue per device process so moving the logic out of loop. > > > > Signed-off-by: David Yat Sin <david.yatsin@xxxxxxx> > > --- > > .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 12 ++++++----- > - > > 1 file changed, 6 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > > b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > > index acf4f7975850..7c107b88d944 100644 > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > > @@ -1945,17 +1945,17 @@ static int process_termination_cpsch(struct > device_queue_manager *dqm, > > else if (q->properties.type == > KFD_QUEUE_TYPE_SDMA_XGMI) > > deallocate_sdma_queue(dqm, q); > > > > - if (q->properties.is_active) { > > + if (q->properties.is_active) > > decrement_queue_count(dqm, q->properties.type); > > - if (q->properties.is_gws) { > > - dqm->gws_queue_count--; > > - qpd->mapped_gws_queue = false; > > - } > > - } > > This looks like the original intention was to decrement the counter for inactive > GWS queues. This makes sense because this counter is used to determine > whether the runlist is oversubscribed. Inactive queues are not in the runlist, > so they should not be counted. > > I see that the counter is updated when queues are activated and deactivated > in update_queue. So IMO this patch is both incorrect and unnecessary. Did > you see an actual problem with the GWS counter during process termination? > If so, I'd like to know more to understand the root cause. > > Regards, > Felix Yes, when using a unit test (for example KFDGWSTest.Semaphore), 1. Put a sleep in the middle of the application (after calling hsaKmtAllocQueueGWS) 2. Run application and kill the application which it is in sleep (application never calls queue.Destroy()). Then inside kernel dqm->gws_queue_count keeps incrementing each time the experiment is repeated and never goes back to 0. This seems to be because the sleep causes the process to be evicted and q->properties.is_active is false so code does not enter that if statement. Regards, David > > > > > > dqm->total_queue_count--; > > } > > > > + if (qpd->mapped_gws_queue) { > > + dqm->gws_queue_count--; > > + qpd->mapped_gws_queue = false; > > + } > > + > > /* Unregister process */ > > list_for_each_entry_safe(cur, next_dpn, &dqm->queues, list) { > > if (qpd == cur->qpd) {