Re: [Intel-gfx] [PATCH 07/27] drm/i915/guc: Don't call switch_to_kernel_context with GuC submission

Matthew Brost <matthew.brost@xxxxxxxxx> · Mon, 13 Sep 2021 22:02:08 -0700

On Mon, Sep 13, 2021 at 03:38:44PM -0700, John Harrison wrote:
> On 9/13/2021 09:54, Matthew Brost wrote:
> 
>     On Thu, Sep 09, 2021 at 03:51:27PM -0700, John Harrison wrote:
> 
>         On 8/20/2021 15:44, Matthew Brost wrote:
> 
>             Calling switch_to_kernel_context isn't needed if the engine PM reference
>             is taken while all contexts are pinned. By not calling
>             switch_to_kernel_context we save on issuing a request to the engine.
> 
>         I thought the intention of the switch_to_kernel was to ensure that the GPU
>         is not touching any user context and is basically idle. That is not a valid
>         assumption with an external scheduler such as GuC. So why is the description
>         above only mentioning PM references? What is the connection between the PM
>         ref and the switch_to_kernel?
> 
>         Also, the comment in the code does not mention anything about PM references,
>         it just says 'not necessary with GuC' but no explanation at all.
> 
> 
>     Yea, this needs to be explained better. How about this?
> 
>     Calling switch_to_kernel_context isn't needed if the engine PM reference
>     is taken while all user contexts have scheduling enabled. Once scheduling
>     is disabled on all user contexts the GuC is guaranteed to not touch any
>     user context state, which is effectively the same as pointing to a kernel
>     context.
> 
>     Matt
> 
> I'm still not seeing how the PM reference is involved?
> 

We shouldn't trap into the GT PM park code while a user context has
scheduling enabled, as the GT PM park code may have side effects we don't
want to execute if a user context still has scheduling enabled. I guess that
isn't explained very well.

> Also, IMHO the focus is wrong in the above text. The fundamental requirement is
> to ensure the hardware is idle. Execlist achieves this by switching to a safe
> context. GuC achieves it by disabling scheduling. Indeed, switching to a 'safe'
> context really has no effect with GuC submission. So 'effectively the same as
> pointing to a kernel context' is an incorrect description. I would go with
> something like:
> 
>     "This is execlist specific behaviour intended to ensure the GPU is idle by
>     switching to a known 'safe' context. With GuC submission, the same idle
>     guarantee is achieved by other means (disabling scheduling). Further,
>     switching to a 'safe' context has no effect with GuC submission as the
>     scheduler can just switch back again.
>     FIXME: Move this backend scheduler specific behaviour into the scheduler
>     backend."
>

That is worded better. Will pull into the next rev.
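
For anyone following along, the ordering of the early returns after this
patch can be modelled outside the kernel. This is only a rough standalone
sketch, not the driver code: struct toy_engine and
toy_switch_to_kernel_context() are hypothetical stand-ins, the two flags
merely model intel_engine_uses_guc() and intel_gt_is_wedged(), and the
other checks in the real function are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct intel_engine_cs; not the real i915 type. */
struct toy_engine {
	bool uses_guc;                /* models intel_engine_uses_guc(engine) */
	bool gt_wedged;               /* models intel_gt_is_wedged(engine->gt) */
	unsigned int requests_issued; /* kernel-context requests in this model */
};

/*
 * Returns true when the engine can be parked right away, false when a
 * kernel-context request had to be issued first (modelled by a counter).
 */
static bool toy_switch_to_kernel_context(struct toy_engine *engine)
{
	/*
	 * Execlists-specific idle guarantee; with GuC submission the same
	 * guarantee comes from disabling scheduling, so no request is issued.
	 */
	if (engine->uses_guc)
		return true;

	/* GPU is pointing to the void, as good as in the kernel context. */
	if (engine->gt_wedged)
		return true;

	/* Execlists path: assume a request to the kernel context is needed. */
	engine->requests_issued++;
	return false;
}
```

The only point of the model is that the GuC check comes first and costs
no request; it says nothing about how scheduling disable itself works.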

Matt

> 
> John.
> 
> 
> 
> 
> 
>             v2:
>               (Daniel Vetter)
>                - Add FIXME comment about pushing switch_to_kernel_context to backend
> 
>             Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
>             Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
>             ---
>               drivers/gpu/drm/i915/gt/intel_engine_pm.c | 9 +++++++++
>               1 file changed, 9 insertions(+)
> 
>             diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>             index 1f07ac4e0672..11fee66daf60 100644
>             --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>             +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>             @@ -162,6 +162,15 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
>                     unsigned long flags;
>                     bool result = true;
>             +       /*
>             +        * No need to switch_to_kernel_context if GuC submission
>             +        *
>             +        * FIXME: This execlists specific backend behavior in generic code, this
> 
>         "This execlists" -> "This is execlist"
> 
>         "this should be" -> "it should be"
> 
>         John.
> 
> 
>             +        * should be pushed to the backend.
>             +        */
>             +       if (intel_engine_uses_guc(engine))
>             +               return true;
>             +
>                     /* GPU is pointing to the void, as good as in the kernel context. */
>                     if (intel_gt_is_wedged(engine->gt))
>                             return true;
> 
> 