Quoting Daniele Ceraolo Spurio (2019-08-23 16:39:14)
>
>
> On 8/23/19 8:28 AM, Chris Wilson wrote:
> > Quoting Chris Wilson (2019-08-23 16:10:48)
> >> Quoting Daniele Ceraolo Spurio (2019-08-23 16:05:45)
> >>>
> >>>
> >>> On 8/23/19 7:26 AM, Chris Wilson wrote:
> >>>> Quoting Chris Wilson (2019-08-23 08:27:25)
> >>>>> Quoting Daniele Ceraolo Spurio (2019-08-23 03:09:09)
> >>>>>> TGL has an improved CS pre-parser that can now pre-fetch commands across
> >>>>>> batch boundaries. This improves performance when lots of small batches
> >>>>>> are used, but has an impact on self-modifying code. If we want to modify
> >>>>>> the content of a batch from another ring/batch, we need to either
> >>>>>> guarantee that the memory location is updated before the pre-parser gets
> >>>>>> to it or we need to turn the pre-parser off around the modification.
> >>>>>> In i915, we use self-modifying code only for GPU relocations.
> >>>>>>
> >>>>>> The pre-parser fetches across memory synchronization commands as well,
> >>>>>> so the only way to guarantee that the writes land before the parser gets
> >>>>>> to it is to have more instructions between the sync and the destination
> >>>>>> than the parser FIFO depth, which is not an optimal solution.
> >>>>>
> >>>>> Well, our ABI is that memory is coherent before the breadcrumb of *each*
> >>>>> batch. That is a fundamental requirement for our signaling to userspace.
> >>>>> Please tell me that there is a context flag to turn this off, or else
> >>>>> we need to emit 32x flushes or whatever it takes.
> >>>>
> >>> Are you referring to the specific case where we have a request modifying
> >>> an object that is then used as a batch in the next request? Because
> >>> coherency of objects that are not executed as batches is not impacted.
> >>
> >> "Fetches across memory sync" sounds like a major ABI break. The batches
> >> are a hard serialisation barrier, with memory coherency guaranteed prior
> >> to the signaling at the end of one batch and clear caches guaranteed at
> >> the start of the next.
> >
> > We have relocs, oa and sseu all using self-modifying code. I expect we
> > will have PTE modifications and much more done via the GPU in the near
> > future. All rely on the CS_STALL doing exactly what it says on the tin.
> > -Chris
> >
>
> I guess the easiest solution is then to keep the parser off outside of
> user batches. We can default to off and then restore what the user has
> programmed before the BBSTART. It's not a breach of contract if we say
> that if you opt-in to the parser then you need to make sure your batches
> are not self-modifying, right?

Is it just the MI_ARB_ONOFF bits, and is that still a privileged
command? i.e. can userspace change mode by itself, or is it a
context-param?

> BTW the CS_STALL does not guarantee on pre-gen12 gens that
> self-modifying code works within the same batch/ring because the
> pre-parser is already pre-fetching across memory sync points, it just
> stops at the next arb point.

Ok, we still uphold our contract if they can't execute any code in the
window where they would see someone else's data.
-Chris
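
For readers following the thread, the hazard under discussion can be sketched with a toy model. This is not i915 code and does not reflect the real command encodings or the actual hardware FIFO depth: the `run` and `patched_stream` helpers, the tuple command format, and the depth of 4 are all hypothetical, chosen only to show why a write has to land at least a FIFO depth ahead of the pre-parser, or else the pre-parser has to be off around the modification.

```python
from collections import deque


def run(stream, fifo_depth, preparser_on=True):
    """Execute a toy command stream and return the emitted payloads.

    Hypothetical command format (not real GPU commands):
      ("emit", value)            records value in the output
      ("write", index, command)  overwrites memory[index] with command
    """
    memory = list(stream)
    fifo = deque()  # commands already fetched (snapshots of memory)
    fetch = 0       # next memory slot the parser will fetch
    out = []
    # With the pre-parser off, model fetch-then-execute, one command at a time.
    depth = fifo_depth if preparser_on else 1
    while fetch < len(memory) or fifo:
        # Prefetch until the FIFO is full; later writes cannot fix up
        # commands that are already sitting in the FIFO.
        while fetch < len(memory) and len(fifo) < depth:
            fifo.append(memory[fetch])
            fetch += 1
        cmd = fifo.popleft()
        if cmd[0] == "write":
            _, index, new_cmd = cmd
            memory[index] = new_cmd
        else:
            out.append(cmd[1])
    return out


def patched_stream(padding):
    """One self-modifying write, `padding` filler commands, then the target."""
    target = padding + 1
    return (
        [("write", target, ("emit", "new"))]
        + [("emit", "pad")] * padding
        + [("emit", "old")]
    )


if __name__ == "__main__":
    print(run(patched_stream(2), fifo_depth=4))                      # target already prefetched
    print(run(patched_stream(2), fifo_depth=4, preparser_on=False))  # parser off
    print(run(patched_stream(3), fifo_depth=4))                      # target beyond the FIFO
```

In this model the stale command is executed whenever the target sits fewer than `fifo_depth` slots after the write, which is the "more instructions than the parser FIFO depth" condition from the original patch description. The model has no notion of memory sync commands; per the description above, they would not stop the prefetch anyway.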