Re: [PATCH 3/5] drm/i915/execlists: Direct submit onto idle engines

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Quoting Tvrtko Ursulin (2018-05-10 18:26:31)
> 
> On 10/05/2018 17:25, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-05-10 17:09:14)
> >>
> >> On 09/05/2018 15:27, Chris Wilson wrote:
> >>> Bypass using the tasklet to submit the first request to HW, as the
> >>> tasklet may be deferred unto ksoftirqd and at a minimum will add in
> >>> excess of 10us (and maybe tens of milliseconds) to our execution
> >>> latency. This latency reduction is most notable when execution flows
> >>> between engines.
> >>>
> >>> v2: Beware handling preemption completion from the direct submit path as
> >>> well.
> >>> v3: Make the abuse clear and track our extra state inside i915_tasklet.
> >>>
> >>> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_tasklet.h         | 24 +++++++
> >>>    drivers/gpu/drm/i915/intel_guc_submission.c | 10 ++-
> >>>    drivers/gpu/drm/i915/intel_lrc.c            | 71 +++++++++++++++++----
> >>>    3 files changed, 89 insertions(+), 16 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_tasklet.h b/drivers/gpu/drm/i915/i915_tasklet.h
> >>> index 42b002b88edb..99e2fa2241ba 100644
> >>> --- a/drivers/gpu/drm/i915/i915_tasklet.h
> >>> +++ b/drivers/gpu/drm/i915/i915_tasklet.h
> >>> @@ -8,8 +8,11 @@
> >>>    #define _I915_TASKLET_H_
> >>>    
> >>>    #include <linux/atomic.h>
> >>> +#include <linux/bitops.h>
> >>>    #include <linux/interrupt.h>
> >>>    
> >>> +#include "i915_gem.h"
> >>> +
> >>>    /**
> >>>     * struct i915_tasklet - wrapper around tasklet_struct
> >>>     *
> >>> @@ -19,6 +22,8 @@
> >>>     */
> >>>    struct i915_tasklet {
> >>>        struct tasklet_struct base;
> >>> +     unsigned long flags;
> >>> +#define I915_TASKLET_DIRECT_SUBMIT BIT(0)
> >>
> >> I would suggest a more generic name for the bit since i915_tasklet is
> >> generic-ish. For instance simply I915_TASKLET_DIRECT would signify the
> >> callback has been invoked directly and not (necessarily) from softirq
> >> context. Then it is for each user to know what that means for them
> >> specifically.
> > 
> > Problem is we have two direct invocations, only one is special. It
> > really wants to be something like I915_TASKLET_ENGINE_IS_LOCKED - you can
> > see why I didn't propose that.
> 
> TBC...
> 
> >>> -static void __submit_queue(struct intel_engine_cs *engine, int prio)
> >>> +static void __wakeup_queue(struct intel_engine_cs *engine, int prio)
> >>>    {
> >>>        engine->execlists.queue_priority = prio;
> >>> +}
> >>
> >> Why is this called wakeup? Plans to add something in it later?
> > 
> > Yes. It's called wakeup because it's setting the value that the dequeue
> > wakes up at. First name was kick_queue, but it doesn't kick either.
> > 
> > The later side-effect involves controlling timers.
> > 
> > __restart_queue()?
> 
> __update_queue_priority? :)

It doesn't just update the priority...

Now a choice between restart_queue and update_queue.

> >>> +static void __schedule_queue(struct intel_engine_cs *engine)
> >>> +{
> >>>        i915_tasklet_schedule(&engine->execlists.tasklet);
> >>>    }
> >>>    
> >>> +static bool __direct_submit(struct intel_engine_execlists *const execlists)
> >>> +{
> >>> +     struct i915_tasklet * const t = &execlists->tasklet;
> >>> +
> >>> +     if (!tasklet_trylock(&t->base))
> >>> +             return false;
> >>> +
> >>> +     t->flags |= I915_TASKLET_DIRECT_SUBMIT;
> >>> +     i915_tasklet_run(t);
> >>> +     t->flags &= ~I915_TASKLET_DIRECT_SUBMIT;
> >>> +
> >>> +     tasklet_unlock(&t->base);
> >>
> >> Feels like this whole sequence belongs to i915_tasklet since it touches
> >> the internals. Maybe i915_tasklet_try_run, or i915_tasklet_run_or_schedule?
> > 
> > Keep reading the series and you'll see just why this is so special and
> > confined to execlists.
> 
> ... TBC here.
> 
> Having peeked ahead, it feels a bit not generic enough as it is, a bit 
> too hacky.
> 
> Would it work to pass context together with the invocation. Like:
> 
> i915_tasklet_try(..., I915_TASKLET_SUBMIT_IDLE);
> i915_tasklet_try(..., I915_TASKLET_SUBMIT_IRQ);
> 
> i915_tasklet.flags field namespace would then be owned by the caller 
> completely. And the tasklet func itself would have more context on what 
> to do.

That doesn't apply very well to the use case either. It's not the
tasklet being called from irq/process that's significant but whether we
are calling it with the engine/data locked.

I keep wanting to use LOCKED, but that has no meaning to the tasklet,
and tasklet_trylock means something entirely different.
 
> Following form that, i915_tasklet_run_or_schedule(.., flags).
> 
> bool i915_taskle_try(tasklet, flags)
> {
>         if (!trylock)
>                 return false;
> 
>         t->flags |= flags;
>         i915_tasklet_run(...);
>         t->flags &= ~flags;
> 
>         tasklet_unlock(...);
> 
>         return true;
> }
> 
> 
> void i915_tasklet_run_or_schedule(..., flags)
> {
>         if (!i915_tasklet_try(..., flags))
>                 i915_tasklet_schedule(...);
> }
> 
> ?
> 
> Leaves a question of a tasklet_is_enabled check in your tasklet_try, 
> which I don't quite get since that check wasn't there before. So why it 
> is needed?

Concurrent reset can happen at irq time, but we know cannot happen from
the submit path.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux