Re: [RFC] drm/i915/tgl: Advanced preparser support for GPU relocs

Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx> · Fri, 23 Aug 2019 10:01:03 -0700

On 8/23/19 9:31 AM, Chris Wilson wrote:
Quoting Daniele Ceraolo Spurio (2019-08-23 16:56:54)

On 8/23/19 8:52 AM, Chris Wilson wrote:
Quoting Daniele Ceraolo Spurio (2019-08-23 16:39:14)

On 8/23/19 8:28 AM, Chris Wilson wrote:
Quoting Chris Wilson (2019-08-23 16:10:48)
Quoting Daniele Ceraolo Spurio (2019-08-23 16:05:45)

On 8/23/19 7:26 AM, Chris Wilson wrote:
Quoting Chris Wilson (2019-08-23 08:27:25)
Quoting Daniele Ceraolo Spurio (2019-08-23 03:09:09)
TGL has an improved CS pre-parser that can now pre-fetch commands across
batch boundaries. This improves performances when lots of small batches
are used, but has an impact on self-modifying code. If we want to modify
the content of a batch from another ring/batch, we need to either
guarantee that the memory location is updated before the pre-parser gets
to it or we need to turn the pre-parser off around the modification.
In i915, we use self-modifying code only for GPU relocations.

The pre-parser fetches across memory synchronization commands as well,
so the only way to guarantee that the writes land before the parser gets
to it is to have more instructions between the sync and the destination
than the parser FIFO depth, which is not an optimal solution.

Well, our ABI is that memory is coherent before the breadcrumb of *each*
batch. That is a fundamental requirement for our signaling to userspace.
Please tell me that there is a context flag to turn this off, or we else
we need to emit 32x flushes or whatever it takes.

Are you referring to the specific case where we have a request modifying
an object that is then used as a batch in the next request? Because
coherency of objects that are not executed as batches is not impacted.

"Fetches across memory sync" sounds like a major ABI break. The batches
are a hard serialisation barrier, with memory coherency guaranteed prior
to the signaling at the end of one batch and clear caches guaranteed at
the start of the next.

We have relocs, oa and sseu all using self-modifying code. I expect we
will have PTE modifications and much more done via the GPU in the near
future. All rely on the CS_STALL doing exactly what it says on the tin.
-Chris

I guess the easiest solution is then to keep the parser off outside of
user batches. We can default to off and then restore what the user has
programmed before the BBSTART. It's not a breach of contract if we say
that if you opt-in to the parser then you need to make sure your batches
are not self-modifying, right?

Is it just the MI_ARB_ONOFF bits, and is that still a privileged
command? i.e. can userspace change mode by itself, or it is a
context-param?

It's the ARB_CHECK, not the ARB_ONOFF, so yes, it is not privileged and
userspace can modify it itself. It would've been easier if it was a
context param :)

Does it go across a context switch? That might be an easy solution for
our internal requests (already true for oa/sseu where we use one context
to modify another). I do worry though if there might be leakage
across our flush-invalidate barriers between userspace batches.

The pre-fetching? no, AFAIK that's confined to the context, so moving 
the relocs to another context would work; the status of the parser is 
also ctx save/restored. What leakage case are you worried about? The 
memory synchronization between contexts is unchanged, a context can only 
mess up its own instructions. AFAIU the only thing that's possible is 
that if a batch modifies the next batch in the same context then the CS 
won't see the update.

Daniele

-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx