Quoting Daniele Ceraolo Spurio (2019-08-23 16:56:54) > > > On 8/23/19 8:52 AM, Chris Wilson wrote: > > Quoting Daniele Ceraolo Spurio (2019-08-23 16:39:14) > >> > >> > >> On 8/23/19 8:28 AM, Chris Wilson wrote: > >>> Quoting Chris Wilson (2019-08-23 16:10:48) > >>>> Quoting Daniele Ceraolo Spurio (2019-08-23 16:05:45) > >>>>> > >>>>> > >>>>> On 8/23/19 7:26 AM, Chris Wilson wrote: > >>>>>> Quoting Chris Wilson (2019-08-23 08:27:25) > >>>>>>> Quoting Daniele Ceraolo Spurio (2019-08-23 03:09:09) > >>>>>>>> TGL has an improved CS pre-parser that can now pre-fetch commands across > >>>>>>>> batch boundaries. This improves performances when lots of small batches > >>>>>>>> are used, but has an impact on self-modifying code. If we want to modify > >>>>>>>> the content of a batch from another ring/batch, we need to either > >>>>>>>> guarantee that the memory location is updated before the pre-parser gets > >>>>>>>> to it or we need to turn the pre-parser off around the modification. > >>>>>>>> In i915, we use self-modifying code only for GPU relocations. > >>>>>>>> > >>>>>>>> The pre-parser fetches across memory synchronization commands as well, > >>>>>>>> so the only way to guarantee that the writes land before the parser gets > >>>>>>>> to it is to have more instructions between the sync and the destination > >>>>>>>> than the parser FIFO depth, which is not an optimal solution. > >>>>>>> > >>>>>>> Well, our ABI is that memory is coherent before the breadcrumb of *each* > >>>>>>> batch. That is a fundamental requirement for our signaling to userspace. > >>>>>>> Please tell me that there is a context flag to turn this off, or we else > >>>>>>> we need to emit 32x flushes or whatever it takes. > >>>>>> > >>>>> Are you referring to the specific case where we have a request modifying > >>>>> an object that is then used as a batch in the next request? Because > >>>>> coherency of objects that are not executed as batches is not impacted. > >>>> > >>>> "Fetches across memory sync" sounds like a major ABI break. The batches > >>>> are a hard serialisation barrier, with memory coherency guaranteed prior > >>>> to the signaling at the end of one batch and clear caches guaranteed at > >>>> the start of the next. > >>> > >>> We have relocs, oa and sseu all using self-modifying code. I expect we > >>> will have PTE modifications and much more done via the GPU in the near > >>> future. All rely on the CS_STALL doing exactly what it says on the tin. > >>> -Chris > >>> > >> > >> I guess the easiest solution is then to keep the parser off outside of > >> user batches. We can default to off and then restore what the user has > >> programmed before the BBSTART. It's not a breach of contract if we say > >> that if you opt-in to the parser then you need to make sure your batches > >> are not self-modifying, right? > > > > Is it just the MI_ARB_ONOFF bits, and is that still a privileged > > command? i.e. can userspace change mode by itself, or it is a > > context-param? > > It's the ARB_CHECK, not the ARB_ONOFF, so yes, it is not privileged and > userspace can modify it itself. It would've been easier if it was a > context param :) Does it go across a context switch? That might be an easy solution for our internal requests (already true for oa/sseu where we use one context to modify another). I do worry though if there might be leakage across our flush-invalidate barriers between userspace batches. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx