Re: [PATCH 2/2] drm/scheduler: stop setting rq to NULL

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On Fri, Aug 3, 2018 at 7:17 PM Andrey Grodzovsky <Andrey.Grodzovsky@xxxxxxx> wrote:



On 08/03/2018 08:12 AM, Nayan Deshmukh wrote:


On Thu, Aug 2, 2018, 8:09 PM Andrey Grodzovsky <Andrey.Grodzovsky@xxxxxxx> wrote:



On 08/02/2018 02:43 AM, Nayan Deshmukh wrote:
Hi Andrey,

Good catch. I guess since both Christian and I were working on it parallelly we didn't catch this problem.

If it is only causing a problem in push_job then we can handle it something like this:

----------------------------------------------------------------------------------------------------------------------------
diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 05dc6ecd4003..f344ce32f128 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -563,6 +563,11 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
        first = spsc_queue_count(&entity->job_queue) == 0;
        reschedule = idle && first && (entity->num_rq_list > 1);
 
+       if (first && list_empty(&entity->list)) {

first might be false if the other process interrupted by SIGKILL  in middle of  wait_event_killable in drm_sched_entity_flush when there were still item in queue.
I don't see a good way besides adding a 'stopped' flag to drm_sched_enitity.
Sorry I missed this mail. This case might happen but this was also not being handled previously. 

Nayan

The original code before  'drm/scheduler: stop setting rq to NULL' was , because you didn't use the queue's emptiness as a criteria for aborting your next push to queue but rather
checking for entity->rq != NULL.
But (entity->rq != NULL) check was inside the body of if (first) so it was also dependent on the queue's emptiness earlier. Well in any case it doesn't matter now as with Christian's patch we can remove the if condition altogether.  

Nayan

Andrey


Andrey

+               DRM_ERROR("Trying to push to a killed entity\n");
+               return;
+       }
+
        if (reschedule) {
                rq = drm_sched_entity_get_free_sched(entity);
                spin_lock(&entity->rq_lock);
@@ -580,11 +585,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
        if (first) {
                /* Add the entity to the run queue */
                spin_lock(&entity->rq_lock);
-               if (!entity->rq) {
-                       DRM_ERROR("Trying to push to a killed entity\n");
-                       spin_unlock(&entity->rq_lock);
-                       return;
-               }
                drm_sched_rq_add_entity(entity->rq, entity);
                spin_unlock(&entity->rq_lock);
                drm_sched_wakeup(entity->rq->sched);
-----------------------------------------------------------------------------------------------------------------------------


On Thu, Aug 2, 2018 at 3:55 AM Andrey Grodzovsky <Andrey.Grodzovsky@xxxxxxx> wrote:
Thinking again about this change and 53d5f1e drm/scheduler: move idle
entities to scheduler with less load v2

Looks to me like use case for which fc9a539 drm/scheduler: add NULL
pointer check for run queue (v2) was done

will not work anymore.

First of all in drm_sched_entity_push_job, 'if (!entity->rq)' will never
be true any more since we stopped entity->rq to NULL in

drm_sched_entity_flush.

Second of all, even after we removed the entity from rq in
drm_sched_entity_flush to terminate any subsequent submissions

to the queue the other thread doing push job can just acquire again a
run queue

from drm_sched_entity_get_free_sched and continue submission so you
can't substitute ' if (!entity->rq)' to 'if (list_empty(&entity->list))'.

What about adding a 'stopped' flag to drm_sched_entity to be set in
drm_sched_entity_flush and in

But it might be worth adding a stopped flag field if it is required at other places as well. 

Thanks,
Nayan
drm_sched_entity_push_job check for  'if (!entity->stopped)' instead of 
' if (!entity->rq)' ?

Andrey


On 07/30/2018 07:03 AM, Christian König wrote:
> We removed the redundancy of having an extra scheduler field, so we
> can't set the rq to NULL any more or otherwise won't know which
> scheduler to use for the cleanup.
>
> Just remove the entity from the scheduling list instead.
>
> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 35 +++++++------------------------
>   1 file changed, 8 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index f563e4fbb4b6..1b733229201e 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -198,21 +198,6 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>   }
>   EXPORT_SYMBOL(drm_sched_entity_init);
>   
> -/**
> - * drm_sched_entity_is_initialized - Query if entity is initialized
> - *
> - * @sched: Pointer to scheduler instance
> - * @entity: The pointer to a valid scheduler entity
> - *
> - * return true if entity is initialized, false otherwise
> -*/
> -static bool drm_sched_entity_is_initialized(struct drm_gpu_scheduler *sched,
> -                                         struct drm_sched_entity *entity)
> -{
> -     return entity->rq != NULL &&
> -             entity->rq->sched == sched;
> -}
> -
>   /**
>    * drm_sched_entity_is_idle - Check if entity is idle
>    *
> @@ -224,7 +209,8 @@ static bool drm_sched_entity_is_idle(struct drm_sched_entity *entity)
>   {
>       rmb();
>   
> -     if (!entity->rq || spsc_queue_peek(&entity->job_queue) == NULL)
> +     if (list_empty(&entity->list) ||
> +         spsc_queue_peek(&entity->job_queue) == NULL)
>               return true;
>   
>       return false;
> @@ -279,8 +265,6 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>       long ret = timeout;
>   
>       sched = entity->rq->sched;
> -     if (!drm_sched_entity_is_initialized(sched, entity))
> -             return ret;
>       /**
>        * The client will not queue more IBs during this fini, consume existing
>        * queued IBs or discard them on SIGKILL
> @@ -299,7 +283,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>       last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
>       if ((!last_user || last_user == current->group_leader) &&
>           (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
> -             drm_sched_entity_set_rq(entity, NULL);
> +             drm_sched_rq_remove_entity(entity->rq, entity);
>   
>       return ret;
>   }
> @@ -320,7 +304,7 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity)
>       struct drm_gpu_scheduler *sched;
>   
>       sched = entity->rq->sched;
> -     drm_sched_entity_set_rq(entity, NULL);
> +     drm_sched_rq_remove_entity(entity->rq, entity);
>   
>       /* Consumption of existing IBs wasn't completed. Forcefully
>        * remove them here.
> @@ -416,15 +400,12 @@ void drm_sched_entity_set_rq(struct drm_sched_entity *entity,
>       if (entity->rq == rq)
>               return;
>   
> -     spin_lock(&entity->rq_lock);
> -
> -     if (entity->rq)
> -             drm_sched_rq_remove_entity(entity->rq, entity);
> +     BUG_ON(!rq);
>   
> +     spin_lock(&entity->rq_lock);
> +     drm_sched_rq_remove_entity(entity->rq, entity);
>       entity->rq = rq;
> -     if (rq)
> -             drm_sched_rq_add_entity(rq, entity);
> -
> +     drm_sched_rq_add_entity(rq, entity);
>       spin_unlock(&entity->rq_lock);
>   }
>   EXPORT_SYMBOL(drm_sched_entity_set_rq);



_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel

[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux