Re: [RFC] drm/i915/tgl: Advanced preparser support for GPU relocs

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Fri, 23 Aug 2019 17:31:37 +0100



Quoting Daniele Ceraolo Spurio (2019-08-23 16:56:54)
> 
> 
> On 8/23/19 8:52 AM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2019-08-23 16:39:14)
> >>
> >>
> >> On 8/23/19 8:28 AM, Chris Wilson wrote:
> >>> Quoting Chris Wilson (2019-08-23 16:10:48)
> >>>> Quoting Daniele Ceraolo Spurio (2019-08-23 16:05:45)
> >>>>>
> >>>>>
> >>>>> On 8/23/19 7:26 AM, Chris Wilson wrote:
> >>>>>> Quoting Chris Wilson (2019-08-23 08:27:25)
> >>>>>>> Quoting Daniele Ceraolo Spurio (2019-08-23 03:09:09)
> >>>>>>>> TGL has an improved CS pre-parser that can now pre-fetch commands across
> >>>>>>>> batch boundaries. This improves performance when lots of small batches
> >>>>>>>> are used, but has an impact on self-modifying code. If we want to modify
> >>>>>>>> the content of a batch from another ring/batch, we need to either
> >>>>>>>> guarantee that the memory location is updated before the pre-parser gets
> >>>>>>>> to it or we need to turn the pre-parser off around the modification.
> >>>>>>>> In i915, we use self-modifying code only for GPU relocations.
> >>>>>>>>
> >>>>>>>> The pre-parser fetches across memory synchronization commands as well,
> >>>>>>>> so the only way to guarantee that the writes land before the parser gets
> >>>>>>>> to it is to have more instructions between the sync and the destination
> >>>>>>>> than the parser FIFO depth, which is not an optimal solution.
> >>>>>>>
> >>>>>>> Well, our ABI is that memory is coherent before the breadcrumb of *each*
> >>>>>>> batch. That is a fundamental requirement for our signaling to userspace.
> >>>>>>> Please tell me that there is a context flag to turn this off, or else
> >>>>>>> we need to emit 32x flushes or whatever it takes.
> >>>>>>
> >>>>> Are you referring to the specific case where we have a request modifying
> >>>>> an object that is then used as a batch in the next request? Because
> >>>>> coherency of objects that are not executed as batches is not impacted.
> >>>>
> >>>> "Fetches across memory sync" sounds like a major ABI break. The batches
> >>>> are a hard serialisation barrier, with memory coherency guaranteed prior
> >>>> to the signaling at the end of one batch and clear caches guaranteed at
> >>>> the start of the next.
> >>>
> >>> We have relocs, oa and sseu all using self-modifying code. I expect we
> >>> will have PTE modifications and much more done via the GPU in the near
> >>> future. All rely on the CS_STALL doing exactly what it says on the tin.
> >>> -Chris
> >>>
> >>
> >> I guess the easiest solution is then to keep the parser off outside of
> >> user batches. We can default to off and then restore what the user has
> >> programmed before the BBSTART. It's not a breach of contract if we say
> >> that if you opt-in to the parser then you need to make sure your batches
> >> are not self-modifying, right?
> > 
> > Is it just the MI_ARB_ONOFF bits, and is that still a privileged
> > command? i.e. can userspace change mode by itself, or is it a
> > context-param?
> 
> It's the ARB_CHECK, not the ARB_ONOFF, so yes, it is not privileged and 
> userspace can modify it itself. It would've been easier if it was a 
> context param :)
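
For illustration only, a minimal sketch of what bracketing a GPU write with
pre-parser off/on commands might look like. The encoding used here
(MI_ARB_CHECK as opcode 0x05, a mask bit at bit 8 and the disable state in
bit 0) is an assumption and is not stated anywhere in this thread;
emit_guarded_write() and its opaque payload are hypothetical names standing
in for whatever actually performs the relocation store.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed MI command encoding: opcode in bits 28:23, flags in the low bits. */
#define MI_INSTR(opcode, flags) (((uint32_t)(opcode) << 23) | (uint32_t)(flags))
#define MI_ARB_CHECK            MI_INSTR(0x05, 0)
/* Assumed: bit 8 is the write-enable mask for the pre-parser disable bit 0. */
#define PREPARSER_DISABLE_MASK  (1u << 8)

/* Build the ARB_CHECK dword that turns the pre-parser off (true) or on (false). */
static uint32_t preparser_disable(bool state)
{
	return MI_ARB_CHECK | PREPARSER_DISABLE_MASK | (state ? 1u : 0u);
}

/*
 * Hypothetical helper: copy 'n' payload dwords (e.g. the store that patches
 * a relocation) into the command stream 'cs', with the pre-parser disabled
 * before the payload and re-enabled after it.
 * Returns the number of dwords written, which is always n + 2.
 */
static size_t emit_guarded_write(uint32_t *cs, const uint32_t *payload, size_t n)
{
	size_t i = 0;
	size_t k;

	cs[i++] = preparser_disable(true);
	for (k = 0; k < n; k++)
		cs[i++] = payload[k];
	cs[i++] = preparser_disable(false);

	return i;
}
```

Whether such a toggle alone is enough depends on the open questions in this
thread, in particular how the pre-parser state behaves across a context
switch.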

Does it go across a context switch? That might be an easy solution for
our internal requests (already true for oa/sseu where we use one context
to modify another). I do worry though if there might be leakage
across our flush-invalidate barriers between userspace batches.
-Chris