Quoting Tvrtko Ursulin (2018-12-06 13:12:35)
>
> On 06/12/2018 08:44, Chris Wilson wrote:
> > Braswell is really picky about having our writes posted to memory before
> > we execute or else the GPU may see stale values. A wmb() is insufficient
> > as it only ensures the writes are visible to other cores, we need a full
> > mb() to ensure the writes are in memory and visible to the GPU.
> >
> > The most frequent failure in flushing before execution is that we see
> > stale PTE values and execute the wrong pages.
> >
> > References: 987abd5c62f9 ("drm/i915/execlists: Force write serialisation into context image vs execution")
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 7 ++++++-
> >   1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index de1e9dc6aec0..e6a86fa4502d 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -379,8 +379,13 @@ static u64 execlists_update_context(struct i915_request *rq)
> >        * may not be visible to the HW prior to the completion of the UC
> >        * register write and that we may begin execution from the context
> >        * before its image is complete leading to invalid PD chasing.
> > +      *
> > +      * Furthermore, Braswell, at least, wants a full mb to be sure that
> > +      * the writes are coherent in memory (visible to the GPU) prior to
> > +      * execution, and not just visible to other CPUs (as is the result of
> > +      * wmb).
> >        */
> > -     wmb();
> > +     mb();
> >       return ce->lrc_desc;
> >   }
> >
> >
>
> Too low level for me to really know what happens under the hood, but at
> least I know it can't break anything.

The alternative I'm considering is using a mmio read instead.
However, the improvement in stability from switching to mb() here is
already enough to proceed without necessarily finding the ideal solution.
-Chris