Re: [PATCH v1] drm/i915/gen11: Preempt-to-idle support in execlists.

"Lis, Tomasz" <tomasz.lis@xxxxxxxxx> · Fri, 30 Mar 2018 17:42:02 +0200

On 2018-03-29 00:28, Chris Wilson wrote:
Quoting Lis, Tomasz (2018-03-28 17:06:58)

On 2018-03-28 01:27, Chris Wilson wrote:
Quoting Tomasz Lis (2018-03-27 16:17:59)
The patch adds support of preempt-to-idle requesting by setting a proper
bit within Execlist Control Register, and receiving preemption result from
Context Status Buffer.

Preemption in previous gens required a special batch buffer to be executed,
so the Command Streamer never preempted to idle directly. In Icelake it is
possible, as there is a hardware mechanism to inform the kernel about
status of the preemption request.

This patch does not cover using the new preemption mechanism when GuC is
active.

Bspec: 18922
Signed-off-by: Tomasz Lis <tomasz.lis@xxxxxxxxx>
---
   drivers/gpu/drm/i915/i915_drv.h          |  2 ++
   drivers/gpu/drm/i915/i915_pci.c          |  3 ++-
   drivers/gpu/drm/i915/intel_device_info.h |  1 +
   drivers/gpu/drm/i915/intel_lrc.c         | 45 +++++++++++++++++++++++++++-----
   drivers/gpu/drm/i915/intel_lrc.h         |  1 +
   5 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 800230b..c32580b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2514,6 +2514,8 @@ intel_info(const struct drm_i915_private *dev_priv)
                  ((dev_priv)->info.has_logical_ring_elsq)
   #define HAS_LOGICAL_RING_PREEMPTION(dev_priv) \
                  ((dev_priv)->info.has_logical_ring_preemption)
+#define HAS_HW_PREEMPT_TO_IDLE(dev_priv) \
+               ((dev_priv)->info.has_hw_preempt_to_idle)
   
   #define HAS_EXECLISTS(dev_priv) HAS_LOGICAL_RING_CONTEXTS(dev_priv)
   
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 4364922..66b6700 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -595,7 +595,8 @@ static const struct intel_device_info intel_cannonlake_info = {
          GEN(11), \
          .ddb_size = 2048, \
          .has_csr = 0, \
-       .has_logical_ring_elsq = 1
+       .has_logical_ring_elsq = 1, \
+       .has_hw_preempt_to_idle = 1
   
   static const struct intel_device_info intel_icelake_11_info = {
          GEN11_FEATURES,
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 933e316..4eb97b5 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -98,6 +98,7 @@ enum intel_platform {
          func(has_logical_ring_contexts); \
          func(has_logical_ring_elsq); \
          func(has_logical_ring_preemption); \
+       func(has_hw_preempt_to_idle); \
          func(has_overlay); \
          func(has_pooled_eu); \
          func(has_psr); \
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ba7f783..1a22de4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -153,6 +153,7 @@
   #define GEN8_CTX_STATUS_ACTIVE_IDLE    (1 << 3)
   #define GEN8_CTX_STATUS_COMPLETE       (1 << 4)
   #define GEN8_CTX_STATUS_LITE_RESTORE   (1 << 15)
+#define GEN11_CTX_STATUS_PREEMPT_IDLE  (1 << 29)
   
   #define GEN8_CTX_STATUS_COMPLETED_MASK \
           (GEN8_CTX_STATUS_COMPLETE | GEN8_CTX_STATUS_PREEMPTED)
@@ -183,7 +184,9 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
                                  const struct i915_request *last,
                                  int prio)
   {
-       return engine->i915->preempt_context && prio > max(rq_prio(last), 0);
+       return (engine->i915->preempt_context ||
+               HAS_HW_PREEMPT_TO_IDLE(engine->i915)) &&
Well, you haven't actually disabled allocating the preempt_context so...
Yes.. I had mixed feelings about changing needs_preempt_context() now,
as that would mean adding a temporary condition on GuC until the GuC
preemption is merged.
I will add the conditions and disable the allocation in v2 of the patch.
But at any rate, making this an engine->flag would eliminate one pointer
dance.
Could be an interesting idea for a separate patch.
To land first ;)
:)
Sure, I can do that.
+                prio > max(rq_prio(last), 0);
   }
   
   /**
@@ -535,6 +538,25 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
          execlists_set_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT);
   }
   
+static void gen11_preempt_to_idle(struct intel_engine_cs *engine)
+{
+       struct intel_engine_execlists *execlists = &engine->execlists;
+
+       GEM_TRACE("%s\n", engine->name);
+
+       /*
+        * hardware which HAS_HW_PREEMPT_TO_IDLE(), always also
+        * HAS_LOGICAL_RING_ELSQ(), so we can assume ctrl_reg is set
+        */
+       GEM_BUG_ON(execlists->ctrl_reg != NULL);
+
+       /* trigger preemption to idle */
+       writel(EL_CTRL_PREEMPT_TO_IDLE, execlists->ctrl_reg);
Future plans? Because just inserting the branch into the setter of
inject_preempt_context() resolves a lot of conflicts with other work.
My arguments for separate function are:
- better code readability
- keeping the symmetry between execlist and GuC flow - GuC preemption
patches will introduce separate function as well
- only 4 lines of the function would be common
- the name inject_preempt_context() wouldn't match the new purpose, so
renaming would be needed
- reduced self-documenting code due to two separate preempt methods not
having distinct names

That's all, I don't have any future plans for it. If you want me to
merge the two, let me know.
The problem that I am worrying about is that we will duplicate bunch of
other code, the actual ELS[PQ] write is the smaller portion. Plus we
already have the branch on something much more pleasant.
I see. I don't know any details there, so I'm not able to weigh that.
Just let me know whether this possible duplication outweights the 
arguments I provided, and I will merge these functions.
I'm not overly attached to my solution.

@@ -962,10 +987,13 @@ static void execlists_submission_tasklet(unsigned long data)
                                    status, buf[2*head + 1],
                                    execlists->active);
   
-                       if (status & (GEN8_CTX_STATUS_IDLE_ACTIVE |
-                                     GEN8_CTX_STATUS_PREEMPTED))
+                       /* Check if switched to active or preempted to active */
+                       if ((status & (GEN8_CTX_STATUS_IDLE_ACTIVE |
+                                       GEN8_CTX_STATUS_PREEMPTED)) &&
+                           !(status & GEN11_CTX_STATUS_PREEMPT_IDLE))
Setting HWACK here is harmless as it gets cleared again. Unless, there
is some oddity in the code flow.
I will check if lack of the change affects test results.
Personally, I would keep this change, even if only for allowing simple
definition of what HWACK flag means.
The simple definition is the opposite one, imo. We set the flag after we
get the corresponding response from HW; any preemption or activate event
must follow the most recent ELSP write. So that will include the
preemption event following the preempt-idle write.

Then on deciding that the HW is idle, we apply the complication such
that execlists->active == 0. (That rule is what breaks the pattern.)
-Chris
Ok, I will remove this unnecessary condition.
I tested this and lack of it doesn't seem to affect the results.
(I'll be out next week; expect v2 when I'm back)
-Tomasz

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx