Re: [PATCH] drm/scheduler: Remove entity->rq NULL check


 



On 14.08.2018 at 17:17, Andrey Grodzovsky wrote:

I assume that this is the only code change and no locks are taken in drm_sched_entity_push_job -


What are you talking about? You surely now take locks in drm_sched_entity_push_job():
+    spin_lock(&entity->rq_lock);
+    entity->last_user = current->group_leader;
+    if (list_empty(&entity->list))

What happens if process A runs drm_sched_entity_push_job after this code was executed from the (dying) process B and there are still jobs in the queue (the wait_event terminated prematurely)? The entity has already been removed from the rq, but bool 'first' in drm_sched_entity_push_job will be false, so the entity will not be reinserted into the rq entity list and no wakeup will be triggered for process A pushing a new job.
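The scenario above can be replayed with a small user-space model. This is a sketch under assumptions: struct model_entity, model_kill(), model_push_job() and replay_killed_entity() are hypothetical names, the rq is reduced to a single on_rq flag, and the only scheduler logic kept is the 'first' test (the entity is re-added only when the queue was empty before the push).

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a scheduler entity: a pending-job
 * counter plus a flag saying whether the entity is linked into its rq.
 * None of these names exist in the real drm/scheduler code. */
struct model_entity {
	int queued_jobs;
	bool on_rq;
};

/* Kill path of the dying process: the entity is unlinked from the rq,
 * but jobs can remain queued if the wait for the queue to drain ended
 * early. */
static void model_kill(struct model_entity *e)
{
	e->on_rq = false;
}

/* push_job: 'first' is true only when the queue was empty before this
 * push, and only then is the entity (re)added to the rq. */
static bool model_push_job(struct model_entity *e)
{
	bool first = (e->queued_jobs++ == 0);

	if (first)
		e->on_rq = true; /* stands in for rq_add_entity() + wakeup */
	return first;
}

/* Replays the interleaving described above and reports whether the
 * entity is on a run queue after process A's push. */
static bool replay_killed_entity(void)
{
	struct model_entity e = { .queued_jobs = 0, .on_rq = false };

	model_push_job(&e); /* process B submits: entity goes onto the rq */
	model_kill(&e);     /* B is killed with one job still queued */
	model_push_job(&e); /* process A submits: 'first' is false */

	return e.on_rq;
}
```

With B's job still queued, A's push sees first == false, so in this model the entity stays off the rq and nothing wakes the scheduler for A's job.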


Thought about this as well, but in this case I would say: Shit happens!

The dying process did some command submission, and because of this the entity was killed as well when the process died. That is legitimate.


Another issue below:

Andrey


On 08/14/2018 03:05 AM, Christian König wrote:
I would rather like to avoid taking the lock in the hot path.

How about this:

    /* For a killed process, disable enqueueing any more IBs right now */
    last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
    if ((!last_user || last_user == current->group_leader) &&
        (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
        grab_lock();
        drm_sched_rq_remove_entity(entity->rq, entity);
        if (READ_ONCE(entity->last_user) != NULL)

This condition is true because at exactly this moment process A did drm_sched_entity_push_job->WRITE_ONCE(entity->last_user, current->group_leader); and so the line below is executed and the entity is reinserted into the rq. Let's also say that the entity job queue is empty now. For process A, bool 'first' will be true, and hence drm_sched_entity_push_job->drm_sched_rq_add_entity(entity->rq, entity) will also take place, causing a double insertion of the entity into the rq list.

Calling drm_sched_rq_add_entity() twice is harmless; it is protected against double insertion.
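A minimal sketch of such a guard, assuming drm_sched_rq_add_entity() simply returns early when the entity's own list node is already linked (the list_empty(&entity->list) check in the quoted push_job hunk points the same way). struct list_node and the helpers below are a hypothetical stand-in for the kernel's list_head API, not the real code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (circular, doubly
 * linked). Only what this sketch needs is reimplemented here. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_append(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical rq_add_entity() analogue: an entity whose own node is
 * non-empty is already linked into a run queue, so a second call is a
 * no-op instead of corrupting the list. */
static void rq_add_entity(struct list_node *rq_entities,
			  struct list_node *entity_node)
{
	if (!list_is_empty(entity_node))
		return;
	list_append(entity_node, rq_entities);
}

/* Counts the nodes linked after the list head. */
static size_t list_count(const struct list_node *head)
{
	size_t n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

In this model the guard only works if the removal path re-initialises the node (list_del_init() rather than list_del() in kernel terms); otherwise a removed entity would still look linked and could never be re-added.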

But thinking more about it, your idea of adding a killed or finished flag becomes more and more appealing, to get consistent handling here.
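One possible shape of the killed/finished flag, as a hypothetical user-space sketch: the flag, the field names and the pthread mutex are illustrative and are not the fields or locks of the real struct drm_sched_entity. The point is that the kill path and the push path decide under one lock, so a push after the kill is consistently rejected instead of depending on whether the job queue happened to be empty.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical entity with an explicit "killed" flag. All fields and
 * the lock are illustrative only. */
struct flagged_entity {
	pthread_mutex_t lock;
	bool killed;
	bool on_rq;
	int queued_jobs;
};

/* Kill path: mark the entity dead and unlink it in one critical
 * section, so no later push can see a half-torn-down entity. */
static void flagged_kill(struct flagged_entity *e)
{
	pthread_mutex_lock(&e->lock);
	e->killed = true;
	e->on_rq = false;
	pthread_mutex_unlock(&e->lock);
}

/* Push path: refuse new work on a killed entity instead of queueing a
 * job that no scheduler will ever pick up. Returns whether the job was
 * accepted. */
static bool flagged_push(struct flagged_entity *e)
{
	bool accepted;

	pthread_mutex_lock(&e->lock);
	accepted = !e->killed;
	if (accepted && e->queued_jobs++ == 0)
		e->on_rq = true; /* 'first' job: (re)add to the rq */
	pthread_mutex_unlock(&e->lock);
	return accepted;
}
```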

Christian.


Andrey

            drm_sched_rq_add_entity(entity->rq, entity);
        drop_lock();
    }
 
Christian.
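The cmpxchg() step of the proposal can be illustrated in user space with C11 atomics. Kernel cmpxchg() returns the value that was stored before the call, whereas atomic_compare_exchange_strong() returns a bool, so the hypothetical helper cmpxchg_task() below adapts one to the other. struct task and may_tear_down() are illustrative names; may_tear_down() covers only the last_user half of the condition and leaves out the PF_EXITING/SIGKILL checks.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a task pointer (current->group_leader). */
struct task {
	int pid;
};

/* Kernel-style cmpxchg(): if *ptr == old, store desired; either way
 * return the value that was in *ptr before the call. On failure
 * atomic_compare_exchange_strong() writes the current value back into
 * 'expected'; on success 'expected' still equals 'old'. */
static struct task *cmpxchg_task(_Atomic(struct task *) *ptr,
				 struct task *old, struct task *desired)
{
	struct task *expected = old;

	atomic_compare_exchange_strong(ptr, &expected, desired);
	return expected;
}

/* Mirrors the last_user part of the proposed condition: tear down if
 * this task was the last user (and has now cleared last_user) or if no
 * last user was recorded at all. */
static int may_tear_down(_Atomic(struct task *) *last_user, struct task *me)
{
	struct task *prev = cmpxchg_task(last_user, me, NULL);

	return prev == NULL || prev == me;
}
```

If another task has already recorded itself as last user, the exchange fails, last_user is left untouched and the dying task backs off.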

On 13.08.2018 at 18:43, Andrey Grodzovsky wrote:

Attached. 

If the general idea in the patch is OK, I can think of a test (and maybe add it to the libdrm amdgpu tests) that actually simulates this scenario with 2 forked concurrent processes working on the same entity's job queue, where one is dying while the other keeps pushing to the same queue. For now I have only tested it with a normal boot and running multiple glxgears concurrently, which doesn't really test this code path since I think each of them works on its own FD.

Andrey


On 08/10/2018 09:27 AM, Christian König wrote:
Crap, yeah indeed that needs to be protected by some lock.

Going to prepare a patch for that,
Christian.

On 09.08.2018 at 21:49, Andrey Grodzovsky wrote:

Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>


But I still have questions about entity->last_user (I didn't notice this before):

It looks to me like there is a race condition with its current usage. Let's say process A was preempted after doing drm_sched_entity_flush->cmpxchg(...). Now process B, working on the same entity (forked), is inside drm_sched_entity_push_job; it writes its group leader to entity->last_user and also executes drm_sched_rq_add_entity. Now process A runs again and executes drm_sched_rq_remove_entity, inadvertently removing the entity from its scheduler rq while process B still depends on it.

It looks to me like we should instead protect the entity->last_user accesses and the additions/removals of the entity to/from the rq with one common lock.
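A sketch of that suggestion, under assumptions: struct locked_entity, locked_push() and locked_flush_dying() are hypothetical, a pthread mutex stands in for whatever kernel lock would be used, and the 'first' logic is dropped so that every push (re)adds the entity. Because the last_user check and the removal sit in the same critical section as the push side's write and add, the interleaving described above (cmpxchg, preemption, push, remove) can no longer happen; only whole critical sections can be reordered.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one lock protects both last_user and whether the
 * entity is on the rq. Names are illustrative only. */
struct locked_entity {
	pthread_mutex_t lock;
	const void *last_user; /* stands in for current->group_leader */
	bool on_rq;
};

/* push_job side: record the submitter and (re)add the entity in one
 * critical section. */
static void locked_push(struct locked_entity *e, const void *me)
{
	pthread_mutex_lock(&e->lock);
	e->last_user = me;
	e->on_rq = true;
	pthread_mutex_unlock(&e->lock);
}

/* flush side of a dying process: only remove the entity if this task
 * was still the last user (or none was recorded); the check and the
 * removal can no longer be split by a concurrent push. */
static void locked_flush_dying(struct locked_entity *e, const void *me)
{
	pthread_mutex_lock(&e->lock);
	if (e->last_user == me || e->last_user == NULL) {
		e->last_user = NULL;
		e->on_rq = false;
	}
	pthread_mutex_unlock(&e->lock);
}
```

Whether A's flush runs before or after B's push, in this model the entity ends up on the rq with B recorded as last user.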

Andrey


On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started discussing possible scenarios of processes and threads.

In any case, this check is redundant. Acked-by: Nayan Deshmukh <nayan26deshmukh@xxxxxxxxx>

Nayan

On Mon, Aug 6, 2018 at 7:43 PM Christian König <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
Ping. Any objections to that?

Christian.

On 03.08.2018 at 13:08, Christian König wrote:
> That is superfluous now.
>
> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -----
>   1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 85908c7f913e..65078dd3c82c 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
>       if (first) {
>               /* Add the entity to the run queue */
>               spin_lock(&entity->rq_lock);
> -             if (!entity->rq) {
> -                     DRM_ERROR("Trying to push to a killed entity\n");
> -                     spin_unlock(&entity->rq_lock);
> -                     return;
> -             }
>               drm_sched_rq_add_entity(entity->rq, entity);
>               spin_unlock(&entity->rq_lock);
>               drm_sched_wakeup(entity->rq->sched);






_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel



