Re: [PATCH] drm/scheduler: Remove entity->rq NULL check

Christian König <ckoenig.leichtzumerken@xxxxxxxxx> · Tue, 14 Aug 2018 09:05:19 +0200



    I would rather avoid taking the lock in the hot path.

      
      How about this:

      
          /* For a killed process, disable any further IB enqueue right now */
          last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
          if ((!last_user || last_user == current->group_leader) &&
              (current->flags & PF_EXITING) &&
              (current->exit_code == SIGKILL)) {
                  grab_lock();
                  drm_sched_rq_remove_entity(entity->rq, entity);
                  /* Re-add if another process pushed a job in the meantime */
                  if (READ_ONCE(entity->last_user) != NULL)
                          drm_sched_rq_add_entity(entity->rq, entity);
                  drop_lock();
          }
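
      To make the interleaving easier to reason about, here is a minimal
      user-space sketch that replays it step by step. Everything in it is a
      hypothetical stand-in (struct model_entity, the model_* helpers, a bool
      for rq membership); it is not the scheduler code. It ignores the
      PF_EXITING/SIGKILL checks and does no real locking, and it only models
      the ordering "A does the cmpxchg, B pushes, A removes".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the entity; not the kernel structure. */
struct model_entity {
	const void *last_user;	/* models entity->last_user */
	bool on_rq;		/* models membership in the scheduler rq */
};

/* Models cmpxchg(&entity->last_user, me, NULL); returns the old value. */
static const void *model_cmpxchg_last_user(struct model_entity *e,
					   const void *me)
{
	const void *old = e->last_user;

	if (old == me)
		e->last_user = NULL;
	return old;
}

/* Process B: the push path records itself and adds the entity to the rq. */
static void model_push_job(struct model_entity *e, const void *me)
{
	e->last_user = me;
	e->on_rq = true;
}

/* Process A, first half of the flush: decide whether to remove. */
static bool model_flush_decide(struct model_entity *e, const void *me)
{
	const void *old = model_cmpxchg_last_user(e, me);

	return old == NULL || old == me;
}

/* Process A, second half of the flush, with or without the recheck. */
static void model_flush_remove(struct model_entity *e, bool recheck)
{
	e->on_rq = false;
	/* Recheck: somebody pushed after our cmpxchg, so put it back. */
	if (recheck && e->last_user != NULL)
		e->on_rq = true;
}

/* Replays: A decides, B pushes, A removes. Returns the final rq state. */
static bool model_replay(bool recheck)
{
	int a, b;	/* only their addresses are used, as process identities */
	struct model_entity e = { .last_user = &a, .on_rq = true };
	bool remove = model_flush_decide(&e, &a);

	assert(remove);		/* A was the last user, so it decides to remove */
	model_push_job(&e, &b);	/* B runs while A is "preempted" */
	model_flush_remove(&e, recheck);
	return e.on_rq;
}
```

      In this model, model_replay(false) ends with the entity off the rq even
      though B has just pushed to it, while model_replay(true) ends with the
      entity back on the rq. In real code the remove and the recheck would
      have to sit under the same lock as the add in the push path.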

       
      Christian.

      
      On 13.08.2018 at 18:43, Andrey Grodzovsky wrote:

    
      Attached. 
      If the general idea in the patch is OK, I can think of a test
        (and maybe add it to the libdrm amdgpu tests) to actually simulate
        this scenario: two forked concurrent processes working on the same
        entity's job queue, where one is dying while the other keeps
        pushing to the same queue.
      For now I have only tested it with a normal boot and by running
        multiple glxgears instances concurrently, which doesn't really
        test this code path since I think each of them works on its own FD.

      
      Andrey

      
      On 08/10/2018 09:27 AM, Christian König wrote:

      
        Crap, yeah indeed that needs to be
          protected by some lock.

          
          Going to prepare a patch for that,

          Christian.

          
          On 09.08.2018 at 21:49, Andrey Grodzovsky wrote:

        
          Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
          

          But I still have questions about entity->last_user
            (I didn't notice this before).

          Looks to me like there is a race condition with its current
            usage. Let's say process A was preempted after doing
            drm_sched_entity_flush->cmpxchg(...).
          Now process B, working on the same entity (forked), is inside
            drm_sched_entity_push_job; it writes its PID to
            entity->last_user and also executes drm_sched_rq_add_entity.
          Now process A runs again and executes
            drm_sched_rq_remove_entity, inadvertently removing process B's
            entity from its scheduler rq.
          Looks to me like we should instead take one lock around both
            the entity->last_user accesses and the adds/removals of the
            entity to the rq.
          Andrey

          
          On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:

          
                I forgot about this since we started discussing
                  possible scenarios of processes and threads.

                  
                In any case, this check is redundant.
                Acked-by: Nayan Deshmukh <nayan26deshmukh@xxxxxxxxx>

                
              Nayan

            
              On Mon, Aug 6, 2018 at 7:43 PM Christian König <ckoenig.leichtzumerken@xxxxxxxxx> wrote:

              
              Ping.
                Any objections to that?

                
                Christian.

                
                On 03.08.2018 at 13:08, Christian König wrote:

                > That is superfluous now.
                >
                > Signed-off-by: Christian König <christian.koenig@xxxxxxx>
                > ---
                >   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -----
                >   1 file changed, 5 deletions(-)
                >
                > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
                > index 85908c7f913e..65078dd3c82c 100644
                > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
                > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
                > @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
                >       if (first) {
                >               /* Add the entity to the run queue */
                >               spin_lock(&entity->rq_lock);
                > -             if (!entity->rq) {
                > -                     DRM_ERROR("Trying to push to a killed entity\n");
                > -                     spin_unlock(&entity->rq_lock);
                > -                     return;
                > -             }
                >               drm_sched_rq_add_entity(entity->rq, entity);
                >               spin_unlock(&entity->rq_lock);
                >               drm_sched_wakeup(entity->rq->sched);

                
      _______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel

    