Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage

Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> · Tue, 22 Mar 2022 11:41:53 +0100

On Tue, 2022-03-22 at 11:26 +0100, Thomas Hellström wrote:
> On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
> > 
> > On 21/03/2022 15:15, Thomas Hellström wrote:
> > > On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
> > > > 
> > > > On 21/03/2022 13:40, Thomas Hellström wrote:
> > > > > Hi,
> > > > > 
> > > > > On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
> > > > > > 
> > > > > > On 21/03/2022 12:33, Thomas Hellström wrote:
> > > > > > > On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
> > > > > > > > 
> > > > > > > > On 21/03/2022 11:03, Thomas Hellström wrote:
> > > > > > > > > Hi, Tvrtko.
> > > > > > > > > 
> > > > > > > > > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > > > > > > > > 
> > > > > > > > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > > > > > > > To align with the discussion in [1][2], this
> > > > > > > > > > > patch
> > > > > > > > > > > series
> > > > > > > > > > > drops
> > > > > > > > > > > all
> > > > > > > > > > > usage of
> > > > > > > > > > > wbvind_on_all_cpus within i915 by either
> > > > > > > > > > > replacing
> > > > > > > > > > > the
> > > > > > > > > > > call
> > > > > > > > > > > with certain
> > > > > > > > > > > drm clflush helpers, or reverting to a previous
> > > > > > > > > > > logic.
> > > > > > > > > > 
> > > > > > > > > > AFAIU, complaint from [1] was that it is wrong to
> > > > > > > > > > provide
> > > > > > > > > > non
> > > > > > > > > > x86
> > > > > > > > > > implementations under the wbinvd_on_all_cpus name.
> > > > > > > > > > Instead an
> > > > > > > > > > arch
> > > > > > > > > > agnostic helper which achieves the same effect
> > > > > > > > > > could
> > > > > > > > > > be
> > > > > > > > > > created.
> > > > > > > > > > Does
> > > > > > > > > > Arm have such concept?
> > > > > > > > > 
> > > > > > > > > I also understand Linus' email like we shouldn't leak
> > > > > > > > > incoherent
> > > > > > > > > IO
> > > > > > > > > to
> > > > > > > > > other architectures, meaning any remaining wbinvd()s
> > > > > > > > > should
> > > > > > > > > be
> > > > > > > > > X86
> > > > > > > > > only.
> > > > > > > > 
> > > > > > > > The last part is completely obvious since it is a x86
> > > > > > > > instruction
> > > > > > > > name.
> > > > > > > 
> > > > > > > Yeah, I meant the function implementing wbinvd()
> > > > > > > semantics.
> > > > > > > 
> > > > > > > > 
> > > > > > > > But I think we can't pick a solution until we know how
> > > > > > > > the
> > > > > > > > concept
> > > > > > > > maps
> > > > > > > > to Arm and that will also include seeing how the
> > > > > > > > drm_clflush_sg for
> > > > > > > > Arm
> > > > > > > > would look. Is there a range based solution, or just a
> > > > > > > > big
> > > > > > > > hammer
> > > > > > > > there.
> > > > > > > > If the latter, then it is no good to churn all these
> > > > > > > > reverts
> > > > > > > > but
> > > > > > > > instead
> > > > > > > > an arch agnostic wrapper, with a generic name, would be
> > > > > > > > the
> > > > > > > > way to
> > > > > > > > go.
> > > > > > > 
> > > > > > > But my impression was that ARM would not need the range-
> > > > > > > based
> > > > > > > interface
> > > > > > > either, because ARM is only for discrete and with
> > > > > > > discrete
> > > > > > > we're
> > > > > > > always
> > > > > > > coherent.
> > > > > > 
> > > > > > Not sure what you mean here - what about flushing system
> > > > > > memory
> > > > > > objects
> > > > > > on discrete? Those still need flushing on paths like
> > > > > > suspend
> > > > > > which this
> > > > > > series touches. Am I missing something?
> > > > > 
> > > > > System bos on discrete should always have
> > > > > 
> > > > > I915_BO_CACHE_COHERENT_FOR_READ |
> > > > > I915_BO_CACHE_COHERENT_FOR_WRITE
> > > > > 
> > > > > either by the gpu being fully cache coherent (or us mapping
> > > > > system
> > > > > write-combined). Hence no need for cache clflushes or
> > > > > wbinvd()
> > > > > for
> > > > > incoherent IO.
> > > > 
> > > > Hmm so you are talking about the shmem ttm backend. It ends up
> > > > depending on the result of i915_ttm_cache_level, yes? It cannot
> > > > end
> > > > up with I915_CACHE_NONE from that function?
> > > 
> > > If the object is allocated with allowable placement in either
> > > LMEM
> > > or
> > > SYSTEM, and it ends in system, it gets allocated with
> > > I915_CACHE_NONE,
> > > but then the shmem ttm backend isn't used but TTM's wc pools, and
> > > the
> > > object should *always* be mapped wc. Even in system.
> > 
> > I am not familiar with neither TTM backend or wc pools so maybe a
> > missed 
> > question - if obj->cache_level can be set to none, and 
> > obj->cache_coherency to zero, then during object lifetime helpers
> > which 
> > consult those fields (like i915_gem_cpu_write_needs_clflush, 
> > __start_cpu_write, etc) are giving out incorrect answers? That is,
> > it
> > is 
> > irrelevant that they would say flushes are required, since in
> > actuality 
> > those objects can never ever and from anywhere be mapped other than
> > WC 
> > so flushes aren't actually required?
> 
> If we map other than WC somewhere in these situations, that should be
> a
> bug needing a fix. It might be that some of these helpers that you
> mention might still flag that a clflush is needed, and in that case
> that's an oversight that also needs fixing.

Actually, it seems like most of these has a IS_DGFX() in them, in
particular i915_gem_clflush_object(), but it looks like some sort of
cleanup might be needed here. In particular we might want to introduce
an IS_COHERENT() in case we change the api at some point also for
integrated.

/Thomas