On 2017-03-16 11:13 AM, Andres Rodriguez wrote:
>
>
> On 2017-03-16 05:15 AM, zhoucm1 wrote:
>>
>>
>> On 2017-03-16 17:10, Christian König wrote:
>>> On 16.03.2017 at 10:00, Chunming Zhou wrote:
>>>> If the high priority run queue is full, processes with low priority
>>>> could be starved. Add a policy for this problem: the high priority
>>>> queue can run ahead of the next priority queue at a ratio of 2:1.
>>>>
>>>> Change-Id: I58f4a6b9cdce8689b18dd8e83dd6e2cf5f99d5fb
>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>>
>>> Well, the idea behind the high priority queues is to actually starve
>>> the low priority queues to a certain extent.
>>>
>>> At least for the kernel queue that is really desired.
>> Yes, agreed, but we certainly don't want the low priority queue to be
>> totally dead, with no chance to run while the high priority queue is
>> always full and busy.
>> Without Andres' changes this wouldn't matter, but now that his changes
>> are upstream we need a scheduling policy.
>>
>> Regards,
>> David Zhou
>
> Hey David,
>
> The desired behavior is actually starvation. This is why allocating a
> high priority context is locked behind root privileges at the moment.
>
> High priority work includes many different features intended for the
> safety of the user wearing the headset. These include:
> - stopping the user from tripping over furniture
> - preventing motion sickness
>
> We cannot compromise these safety features in order to run non-critical
> tasks.
>
> The current approach also has the disadvantage of heavily interleaving
> work. This is going to have two undesired side effects. First, there
> will be a performance overhead from having the queue context switch so
> often.
>
> Second, this approach improves concurrency of work in a ring with mixed
> priority work, but actually decreases overall system concurrency. When a
> ring's HW queue has any high priority work committed, the whole ring
> will be executing at high priority. So interleaving the work will
> extend the time a ring spends in high priority mode. This effectively
> extends the time a ring might spend with CUs reserved, which will have
> a performance impact on other rings.
>
> The best approach here is to make sure we don't map high priority work
> and regular priority work to the same ring. This is why the LRU policy
> for mapping userspace ring ids to kernel ring ids was introduced. This
> policy provides the guarantee as a side effect.

Wanted to correct myself here. The LRU policy doesn't actually provide a
guarantee. It approximates one.

Regards,
Andres

> But further work there could be useful to actually check context
> priorities when doing the ring mapping.
>
> By placing work on different rings, we let the hardware actually
> dispatch work according to wave parameters like VGPR usage, etc. Trying
> to deduce all this information in the GPU scheduler will get very
> complicated.
>
> This is NAK'd by me.
>
> Regards,
> Andres
>
>
>>>
>>> Christian.
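The policy in the patch quoted below boils down to one wait counter per
run queue. Here is a minimal standalone simulation of that selection
loop (plain C; struct rq, select_rq and the jobs field are simplified
stand-ins, not the driver's types):

	/*
	 * Standalone simulation of the 2:1 wait-counter policy below.
	 * All names here are illustrative stand-ins, not the driver's API.
	 */
	#include <stdbool.h>
	#include <stdio.h>

	#define PRIO_MIN 0
	#define PRIO_MAX 3	/* normal, high, kernel */

	struct rq {
		int wait_base;	/* skip threshold for the queue above: pri * 2 */
		int wait;	/* how often the queue above was picked over us */
		int jobs;	/* pending jobs, stand-in for the entity list */
	};

	/* Walk queues from highest to lowest priority, but skip a queue
	 * once its lower neighbour has waited past the skip threshold. */
	static int select_rq(struct rq *rq)
	{
		bool skip;
		int i;

	retry:
		skip = false;
		for (i = PRIO_MAX - 1; i >= PRIO_MIN; i--) {
			if (i > PRIO_MIN && rq[i - 1].wait >= rq[i].wait_base) {
				rq[i - 1].wait = rq[i - 1].wait_base;
				skip = true;
				continue;
			}
			if (rq[i].jobs > 0) {
				if (i > PRIO_MIN)
					rq[i - 1].wait++;
				rq[i].jobs--;
				return i;
			}
		}
		if (skip)
			goto retry;	/* nothing runnable below the skipped queues */
		return -1;
	}

	int main(void)
	{
		struct rq rq[PRIO_MAX];
		int i;

		for (i = PRIO_MIN; i < PRIO_MAX; i++) {
			rq[i].wait_base = i * 2;
			rq[i].wait = rq[i].wait_base;
			rq[i].jobs = 10;	/* every queue saturated */
		}
		for (i = 0; i < 12; i++)
			printf("picked rq %d\n", select_rq(rq));
		return 0;
	}

With every queue saturated, this picks queues in a pattern of roughly
2, 2, 1, 2, 2, 1, ..., which is where the 2:1 ratio in the commit
message comes from.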
>>>
>>>> ---
>>>>   drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 26 +++++++++++++++++++++++---
>>>>   drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  2 ++
>>>>   2 files changed, 25 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>> index 0f439dd..4637b6f 100644
>>>> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
>>>> @@ -35,11 +35,16 @@
>>>>   static void amd_sched_process_job(struct fence *f, struct fence_cb *cb);
>>>>
>>>>   /* Initialize a given run queue struct */
>>>> -static void amd_sched_rq_init(struct amd_sched_rq *rq)
>>>> +static void amd_sched_rq_init(struct amd_gpu_scheduler *sched, enum
>>>> +			      amd_sched_priority pri)
>>>>   {
>>>> +	struct amd_sched_rq *rq = &sched->sched_rq[pri];
>>>> +
>>>>   	spin_lock_init(&rq->lock);
>>>>   	INIT_LIST_HEAD(&rq->entities);
>>>>   	rq->current_entity = NULL;
>>>> +	rq->wait_base = pri * 2;
>>>> +	rq->wait = rq->wait_base;
>>>>   }
>>>>
>>>>   static void amd_sched_rq_add_entity(struct amd_sched_rq *rq,
>>>> @@ -494,17 +499,32 @@ static void amd_sched_wakeup(struct amd_gpu_scheduler *sched)
>>>>   {
>>>>   	struct amd_sched_entity *entity;
>>>>   	int i;
>>>> +	bool skip;
>>>>
>>>>   	if (!amd_sched_ready(sched))
>>>>   		return NULL;
>>>>
>>>> +retry:
>>>> +	skip = false;
>>>>   	/* Kernel run queue has higher priority than normal run queue*/
>>>>   	for (i = AMD_SCHED_PRIORITY_MAX - 1; i >= AMD_SCHED_PRIORITY_MIN; i--) {
>>>> +		if ((i > AMD_SCHED_PRIORITY_MIN) &&
>>>> +		    (sched->sched_rq[i - 1].wait >= sched->sched_rq[i].wait_base)) {
>>>> +			sched->sched_rq[i - 1].wait = sched->sched_rq[i - 1].wait_base;
>>>> +			skip = true;
>>>> +			continue;
>>>> +		}
>>>>   		entity = amd_sched_rq_select_entity(&sched->sched_rq[i]);
>>>> -		if (entity)
>>>> +		if (entity) {
>>>> +			if (i > AMD_SCHED_PRIORITY_MIN)
>>>> +				sched->sched_rq[i - 1].wait++;
>>>>   			break;
>>>> +		}
>>>>   	}
>>>>
>>>> +	if (!entity && skip)
>>>> +		goto retry;
>>>> +
>>>>   	return entity;
>>>>   }
>>>>
>>>> @@ -608,7 +628,7 @@ int amd_sched_init(struct amd_gpu_scheduler *sched,
>>>>   	sched->name = name;
>>>>   	sched->timeout = timeout;
>>>>   	for (i = AMD_SCHED_PRIORITY_MIN; i < AMD_SCHED_PRIORITY_MAX; i++)
>>>> -		amd_sched_rq_init(&sched->sched_rq[i]);
>>>> +		amd_sched_rq_init(sched, i);
>>>>
>>>>   	init_waitqueue_head(&sched->wake_up_worker);
>>>>   	init_waitqueue_head(&sched->job_scheduled);
>>>> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
>>>> index 99f0240..4caed30 100644
>>>> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
>>>> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
>>>> @@ -64,6 +64,8 @@ struct amd_sched_rq {
>>>>   	spinlock_t lock;
>>>>   	struct list_head entities;
>>>>   	struct amd_sched_entity *current_entity;
>>>> +	int wait_base;
>>>> +	int wait;
>>>>   };
>>>>
>>>>   struct amd_sched_fence {
>>>
>>>
>>
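For comparison, a rough sketch of an LRU ring mapping like the one
Andres describes above (ring_lru_init, ring_lru_get and the fixed order
array are hypothetical simplifications, not the amdgpu implementation):

	/*
	 * Rough sketch of an LRU mapping from userspace ring ids to kernel
	 * rings. Names and structure are hypothetical simplifications.
	 */
	#define NUM_RINGS 4

	struct ring_lru {
		int order[NUM_RINGS];	/* kernel ring ids, LRU first */
	};

	static void ring_lru_init(struct ring_lru *lru)
	{
		int i;

		for (i = 0; i < NUM_RINGS; i++)
			lru->order[i] = i;
	}

	/* Hand out the least recently used ring and mark it most recently
	 * used, so consecutive new mappings tend to land on different
	 * rings. */
	static int ring_lru_get(struct ring_lru *lru)
	{
		int ring = lru->order[0];
		int i;

		for (i = 0; i < NUM_RINGS - 1; i++)
			lru->order[i] = lru->order[i + 1];
		lru->order[NUM_RINGS - 1] = ring;
		return ring;
	}

This also shows why the policy only approximates the separation Andres
wants: once more contexts are active than there are rings, the least
recently used ring may already carry work of a different priority.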