On 08/14/2018 11:26 AM, Christian König wrote:
On 14.08.2018 17:17, Andrey Grodzovsky wrote:
I assume that this is the only code change and no locks are
taken in drm_sched_entity_push_job -

What are you talking about? You surely now take locks in
drm_sched_entity_push_job():
+ spin_lock(&entity->rq_lock);
+ entity->last_user = current->group_leader;
+ if (list_empty(&entity->list))
Oh, so your code in drm_sched_entity_flush still relies on my code
in drm_sched_entity_push_job, OK.
What happens if process A runs drm_sched_entity_push_job
after this code was executed by the (dying) process B while
there are still jobs in the queue (the wait_event terminated
prematurely)? The entity was already removed from the rq,
but bool 'first' in drm_sched_entity_push_job will be false,
so the entity will not be reinserted into the rq entity list
and no wakeup will be triggered for the new job process A is
pushing.
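To make the interleaving concrete - a sketch abridged from the
4.18-era scheduler sources, not quoted verbatim:

    /* Process B (dying), in drm_sched_entity_flush(): */
    wait_event_timeout(sched->job_scheduled,
                       drm_sched_entity_is_idle(entity),
                       timeout);                /* returns early, jobs remain */
    drm_sched_rq_remove_entity(entity->rq, entity); /* entity off the rq */

    /* Process A, afterwards, in drm_sched_entity_push_job(): */
    first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
    if (first) {    /* false - B's leftover jobs mean the queue was not empty */
            drm_sched_rq_add_entity(entity->rq, entity);
            drm_sched_wakeup(entity->rq->sched); /* never reached: no wakeup */
    }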
Thought about this as well, but in this case I would say: Shit
happens!
The dying process did some command submission and because of this
the entity was killed as well when the process died and that is
legitimate.
Another issue below -
Andrey
On 08/14/2018 03:05 AM, Christian König wrote:
I would rather like to avoid taking the lock in the hot path.

How about this:

    /* For killed process disable any more IBs enqueue right now */
    last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
    if ((!last_user || last_user == current->group_leader) &&
        (current->flags & PF_EXITING) &&
        (current->exit_code == SIGKILL)) {
            grab_lock();
            drm_sched_rq_remove_entity(entity->rq, entity);
            if (READ_ONCE(entity->last_user) != NULL)
This condition is true because just now process A did
drm_sched_entity_push_job->WRITE_ONCE(entity->last_user,
current->group_leader); and so the line below executed and the
entity was reinserted into the rq. Let's say also that the
entity's job queue is empty now. For process A bool 'first' will
be true, and hence
drm_sched_entity_push_job->drm_sched_rq_add_entity(entity->rq,
entity) will also take place, causing a double insertion of the
entity queue into the rq list.
Calling drm_sched_rq_add_entity() is harmless, it is protected
against double insertion.
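For reference, the guard looks roughly like this in the 4.18-era
sources (abridged):

    static void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
                                        struct drm_sched_entity *entity)
    {
            if (!list_empty(&entity->list))
                    return; /* already on the rq, a second insert is a no-op */
            spin_lock(&rq->lock);
            list_add_tail(&entity->list, &rq->entities);
            spin_unlock(&rq->lock);
    }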
Missed that one, right...
But thinking more about it, your idea of adding a killed or
finished flag becomes more and more appealing as a way to get
consistent handling here.
Christian.
So to be clear - you would like something like: remove
entity->last_user, add a 'stopped' flag to drm_sched_entity that
is set in drm_sched_entity_flush, check for 'if (entity->stopped)'
in drm_sched_entity_push_job, and when it is true just return some
error back to the user instead of pushing the job?
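As a rough sketch only - the locking shown and the -ENOENT error
code are placeholders for illustration, not a patch:

    /* in drm_sched_entity_flush(), when the owner was killed: */
    spin_lock(&entity->rq_lock);
    entity->stopped = true;             /* new flag in drm_sched_entity */
    drm_sched_rq_remove_entity(entity->rq, entity);
    spin_unlock(&entity->rq_lock);

    /* in drm_sched_entity_push_job(): */
    spin_lock(&entity->rq_lock);
    if (entity->stopped) {
            spin_unlock(&entity->rq_lock);
            return -ENOENT;             /* reject instead of queueing */
    }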
Andrey
                    drm_sched_rq_add_entity(entity->rq, entity);
            drop_lock();
    }
Christian.
On 13.08.2018 18:43, Andrey Grodzovsky wrote:
Attached.
If the general idea in the patch is OK I can think of a test
(and maybe add it to the libdrm amdgpu tests) to actually
simulate this scenario: two forked concurrent processes working
on the same entity's job queue, one dying while the other keeps
pushing to the same queue. For now I only tested it with a
normal boot and running multiple glxgears concurrently - which
doesn't really exercise this code path, since I think each of
them works on its own FD.
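The skeleton of such a test could look like this - push_job() is
a hypothetical stand-in for the real amdgpu CS submission on a
context created before the fork, not a libdrm function:

    #include <signal.h>
    #include <sys/wait.h>
    #include <unistd.h>

    extern void push_job(void); /* hypothetical: submit to the shared entity */

    int main(void)
    {
            /* open the DRM fd and create the context here, before
             * fork(), so both processes push to the same entity */
            pid_t child = fork();

            if (child == 0) {
                    for (int i = 0; i < 16; i++)
                            push_job();
                    raise(SIGKILL); /* die mid-stream, triggering
                                     * drm_sched_entity_flush() */
            }

            for (int i = 0; i < 1024; i++)
                    push_job();     /* keep pushing while the child dies */

            waitpid(child, NULL, 0);
            return 0;
    }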
Andrey
On 08/10/2018 09:27 AM, Christian König wrote:
Crap, yeah indeed that needs
to be protected by some lock.
Going to prepare a patch for that,
Christian.
On 09.08.2018 21:49, Andrey Grodzovsky wrote:
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
But I still have questions about entity->last_user (didn't
notice this before) -

Looks to me like there is a race condition in its current
usage. Let's say process A was preempted after doing
drm_sched_entity_flush->cmpxchg(...); now process B, working on
the same entity (forked), is inside drm_sched_entity_push_job.
It writes its PID to entity->last_user and also executes
drm_sched_rq_add_entity. Now process A runs again and executes
drm_sched_rq_remove_entity, inadvertently removing process B's
entity from its scheduler rq.

Looks to me like instead we should lock entity->last_user
accesses together with additions/removals of the entity to/from
the rq.
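That is, something along these lines in drm_sched_entity_push_job()
- a sketch only, reusing entity->rq_lock (essentially the hunk
quoted at the top of this thread):

    spin_lock(&entity->rq_lock);
    entity->last_user = current->group_leader; /* atomic with rq state */
    if (list_empty(&entity->list))
            drm_sched_rq_add_entity(entity->rq, entity);
    spin_unlock(&entity->rq_lock);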
Andrey
On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started
discussing possible scenarios of processes and
threads.
In any case, this check is redundant. Acked-by:
Nayan Deshmukh <nayan26deshmukh@xxxxxxxxx>
Nayan
Ping.
Any objections to that?
Christian.
On 03.08.2018 13:08, Christian König wrote:
> That is superfluous now.
>
> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> ---
>  drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -----
>  1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 85908c7f913e..65078dd3c82c 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
>  	if (first) {
>  		/* Add the entity to the run queue */
>  		spin_lock(&entity->rq_lock);
> -		if (!entity->rq) {
> -			DRM_ERROR("Trying to push to a killed entity\n");
> -			spin_unlock(&entity->rq_lock);
> -			return;
> -		}
>  		drm_sched_rq_add_entity(entity->rq, entity);
>  		spin_unlock(&entity->rq_lock);
>  		drm_sched_wakeup(entity->rq->sched);
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel