On 02.08.2018 at 16:11, Andrey Grodzovsky wrote:
On 08/02/2018 02:47 AM, Nayan Deshmukh wrote:
On 02.08.2018 at 00:25, Andrey Grodzovsky wrote:
> Thinking again about this change and 53d5f1e "drm/scheduler: move idle
> entities to scheduler with less load v2":
>
> Looks to me like the use case for which fc9a539 "drm/scheduler: add NULL
> pointer check for run queue (v2)" was done will not work anymore.
>
> First of all, in drm_sched_entity_push_job 'if (!entity->rq)' will
> never be true any more since we stopped setting entity->rq to NULL in
> drm_sched_entity_flush.

Good point, going to remove that.

So basically we don't need that if (...) { return; } in
drm_sched_entity_push_job any more?

Yes, exactly.
>
> Second of all, even after we removed the entity from the rq in
> drm_sched_entity_flush to terminate any subsequent submissions to the
> queue, the other thread doing push job can just acquire a run queue
> again from drm_sched_entity_get_free_sched and continue submission.

Hi Christian,

That is actually desired. When another process is now using the entity
to submit jobs, adding it back to the rq is actually the right thing to
do because the entity is still in use.
Yes, no problem if it's another process. But what about another thread
from the same process? Is it a possible use case that 2 threads from the
same process submit to the same entity concurrently? If so, and we
specifically kill one, the other will not stop even if we want it to,
because the current code just makes it acquire a rq for itself.
Well, you can't kill a single thread of a process (you can only
interrupt it); a SIGKILL will always kill the whole process.
I am not aware of the use case, so I might just be rambling, but if the
thread/process that created the entity has called drm_sched_entity_flush
then we shouldn't allow other processes to push jobs to that entity.
Nayan
Christian.
We don't really know who created the entity; the creation could be done
by X itself, with the entity then passed to another process for use.
Check 'drm/scheduler: only kill entity if last user is killed v2': the
idea is that if, by the time you want to kill this entity, another
process (not another thread from your process, i.e.
current->group_leader != last_user in drm_sched_entity_flush) has
already started to use this entity, just let it be.
Yes, exactly that's the idea.
Christian.
Andrey
> so you can't substitute 'if (!entity->rq)' with 'if
> (list_empty(&entity->list))'.
>
> What about adding a 'stopped' flag to drm_sched_entity, to be set in
> drm_sched_entity_flush, and in drm_sched_entity_push_job checking for
> 'if (!entity->stopped)' instead of 'if (!entity->rq)'?
>
> Andrey
>
>
> On 07/30/2018 07:03 AM, Christian König wrote:
>> We removed the redundancy of having an extra scheduler field, so we
>> can't set the rq to NULL any more or otherwise won't know which
>> scheduler to use for the cleanup.
>>
>> Just remove the entity from the scheduling list instead.
>>
>> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
>> ---
>>  drivers/gpu/drm/scheduler/gpu_scheduler.c | 35 +++++++------------------------
>>  1 file changed, 8 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> index f563e4fbb4b6..1b733229201e 100644
>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> @@ -198,21 +198,6 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>>  }
>>  EXPORT_SYMBOL(drm_sched_entity_init);
>>  
>> -/**
>> - * drm_sched_entity_is_initialized - Query if entity is initialized
>> - *
>> - * @sched: Pointer to scheduler instance
>> - * @entity: The pointer to a valid scheduler entity
>> - *
>> - * return true if entity is initialized, false otherwise
>> -*/
>> -static bool drm_sched_entity_is_initialized(struct drm_gpu_scheduler *sched,
>> -					    struct drm_sched_entity *entity)
>> -{
>> -	return entity->rq != NULL &&
>> -		entity->rq->sched == sched;
>> -}
>> -
>>  /**
>>   * drm_sched_entity_is_idle - Check if entity is idle
>>   *
>> @@ -224,7 +209,8 @@ static bool drm_sched_entity_is_idle(struct drm_sched_entity *entity)
>>  {
>>  	rmb();
>>  
>> -	if (!entity->rq || spsc_queue_peek(&entity->job_queue) == NULL)
>> +	if (list_empty(&entity->list) ||
>> +	    spsc_queue_peek(&entity->job_queue) == NULL)
>>  		return true;
>>  
>>  	return false;
>> @@ -279,8 +265,6 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>>  	long ret = timeout;
>>  
>>  	sched = entity->rq->sched;
>> -	if (!drm_sched_entity_is_initialized(sched, entity))
>> -		return ret;
>>  	/**
>>  	 * The client will not queue more IBs during this fini, consume existing
>>  	 * queued IBs or discard them on SIGKILL
>> @@ -299,7 +283,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>>  	last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
>>  	if ((!last_user || last_user == current->group_leader) &&
>>  	    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
>> -		drm_sched_entity_set_rq(entity, NULL);
>> +		drm_sched_rq_remove_entity(entity->rq, entity);
>>  
>>  	return ret;
>>  }
>> @@ -320,7 +304,7 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
>>  	struct drm_gpu_scheduler *sched;
>>  
>>  	sched = entity->rq->sched;
>> -	drm_sched_entity_set_rq(entity, NULL);
>> +	drm_sched_rq_remove_entity(entity->rq, entity);
>>  
>>  	/* Consumption of existing IBs wasn't completed. Forcefully
>>  	 * remove them here.
>> @@ -416,15 +400,12 @@ void drm_sched_entity_set_rq(struct drm_sched_entity *entity,
>>  	if (entity->rq == rq)
>>  		return;
>>  
>> -	spin_lock(&entity->rq_lock);
>> -
>> -	if (entity->rq)
>> -		drm_sched_rq_remove_entity(entity->rq, entity);
>> +	BUG_ON(!rq);
>>  
>> +	spin_lock(&entity->rq_lock);
>> +	drm_sched_rq_remove_entity(entity->rq, entity);
>>  	entity->rq = rq;
>> -	if (rq)
>> -		drm_sched_rq_add_entity(rq, entity);
>> -
>> +	drm_sched_rq_add_entity(rq, entity);
>>  	spin_unlock(&entity->rq_lock);
>>  }
>>  EXPORT_SYMBOL(drm_sched_entity_set_rq);
>