On 7/12/2017 4:41 PM, Daniele Ceraolo Spurio wrote:
On 12/07/17 15:58, Chris Wilson wrote:
The engine provides also provides mirror of the CSB write pointer in the
HWSP, but not of our read pointer. To take advantage of this we need to
remember where we read up to on the last interrupt and continue off from
there. This poses a problem following a reset, as we don't know where
the hw will start writing from, and due to the use of power contexts we
cannot perform that query during the reset itself. So we continue the
current modus operandi of delaying the first read of the context-status
read/write pointers until after the first interrupt. With this we should
now have eliminated all uncached mmio reads in handling the
context-status interrupt, though we still have the uncached mmio writes
for submitting new work, and many uncached mmio reads in the global
interrupt handler itself. Still a step in the right direction towards
reducing our resubmit latency, although it appears lost in the noise!
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Michel Thierry <michel.thierry@xxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
---
drivers/gpu/drm/i915/intel_lrc.c | 20 +++++++++++++++-----
drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e413465a552b..db750abb905e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -562,9 +562,15 @@ static void intel_lrc_irq_handler(unsigned long data)
* is set and we do a new loop.
*/
__clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
- head = readl(csb_mmio);
- tail = GEN8_CSB_WRITE_PTR(head);
- head = GEN8_CSB_READ_PTR(head);
+ if (unlikely(engine->csb_head == -1)) { /* following a reset */
+ head = readl(csb_mmio);
+ tail = GEN8_CSB_WRITE_PTR(head);
+ head = GEN8_CSB_READ_PTR(head);
+ engine->csb_head = head;
+ } else {
+ head = engine->csb_head;
+ tail = buf[0xf];
In CNL the tail moves to offset 0x2f of the HWSP (i.e. buf[0x1f]), might
be worth considering it immediately since CNL support is being merged.
-Daniele
Also right now it is implicitly doing csb_head_dword - csb_start_dword
(0x1f - 0x10). I would vote to have them as defines.
And my only other comment, wouldn't the same changes should apply to the
i915_engine_info debugfs? For example: https://hastebin.com/tahurezeri
-Michel
+ }
while (head != tail) {
struct drm_i915_gem_request *rq;
unsigned int status;
@@ -618,8 +624,11 @@ static void intel_lrc_irq_handler(unsigned long data)
!(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
}
- writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
- csb_mmio);
+ if (head != engine->csb_head) {
+ engine->csb_head = head;
+ writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
+ csb_mmio);
+ }
}
if (execlists_elsp_ready(engine))
@@ -1246,6 +1255,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
/* After a GPU reset, we may have requests to replay */
clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+ engine->csb_head = -1;
submit = false;
for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d33c93444c0d..56751413e40c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -391,6 +391,7 @@ struct intel_engine_cs {
struct rb_root execlist_queue;
struct rb_node *execlist_first;
unsigned int fw_domains;
+ unsigned int csb_head;
/* Contexts are pinned whilst they are active on the GPU. The last
* context executed remains active whilst the GPU is idle - the
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx