On Tue, 2022-03-22 at 11:26 +0100, Thomas Hellström wrote: > On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote: > > > > On 21/03/2022 15:15, Thomas Hellström wrote: > > > On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote: > > > > > > > > On 21/03/2022 13:40, Thomas Hellström wrote: > > > > > Hi, > > > > > > > > > > On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote: > > > > > > > > > > > > On 21/03/2022 12:33, Thomas Hellström wrote: > > > > > > > On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote: > > > > > > > > > > > > > > > > On 21/03/2022 11:03, Thomas Hellström wrote: > > > > > > > > > Hi, Tvrtko. > > > > > > > > > > > > > > > > > > On 3/21/22 11:27, Tvrtko Ursulin wrote: > > > > > > > > > > > > > > > > > > > > On 19/03/2022 19:42, Michael Cheng wrote: > > > > > > > > > > > To align with the discussion in [1][2], this > > > > > > > > > > > patch > > > > > > > > > > > series > > > > > > > > > > > drops > > > > > > > > > > > all > > > > > > > > > > > usage of > > > > > > > > > > > wbvind_on_all_cpus within i915 by either > > > > > > > > > > > replacing > > > > > > > > > > > the > > > > > > > > > > > call > > > > > > > > > > > with certain > > > > > > > > > > > drm clflush helpers, or reverting to a previous > > > > > > > > > > > logic. > > > > > > > > > > > > > > > > > > > > AFAIU, complaint from [1] was that it is wrong to > > > > > > > > > > provide > > > > > > > > > > non > > > > > > > > > > x86 > > > > > > > > > > implementations under the wbinvd_on_all_cpus name. > > > > > > > > > > Instead an > > > > > > > > > > arch > > > > > > > > > > agnostic helper which achieves the same effect > > > > > > > > > > could > > > > > > > > > > be > > > > > > > > > > created. > > > > > > > > > > Does > > > > > > > > > > Arm have such concept? > > > > > > > > > > > > > > > > > > I also understand Linus' email like we shouldn't leak > > > > > > > > > incoherent > > > > > > > > > IO > > > > > > > > > to > > > > > > > > > other architectures, meaning any remaining wbinvd()s > > > > > > > > > should > > > > > > > > > be > > > > > > > > > X86 > > > > > > > > > only. > > > > > > > > > > > > > > > > The last part is completely obvious since it is a x86 > > > > > > > > instruction > > > > > > > > name. > > > > > > > > > > > > > > Yeah, I meant the function implementing wbinvd() > > > > > > > semantics. > > > > > > > > > > > > > > > > > > > > > > > But I think we can't pick a solution until we know how > > > > > > > > the > > > > > > > > concept > > > > > > > > maps > > > > > > > > to Arm and that will also include seeing how the > > > > > > > > drm_clflush_sg for > > > > > > > > Arm > > > > > > > > would look. Is there a range based solution, or just a > > > > > > > > big > > > > > > > > hammer > > > > > > > > there. > > > > > > > > If the latter, then it is no good to churn all these > > > > > > > > reverts > > > > > > > > but > > > > > > > > instead > > > > > > > > an arch agnostic wrapper, with a generic name, would be > > > > > > > > the > > > > > > > > way to > > > > > > > > go. > > > > > > > > > > > > > > But my impression was that ARM would not need the range- > > > > > > > based > > > > > > > interface > > > > > > > either, because ARM is only for discrete and with > > > > > > > discrete > > > > > > > we're > > > > > > > always > > > > > > > coherent. > > > > > > > > > > > > Not sure what you mean here - what about flushing system > > > > > > memory > > > > > > objects > > > > > > on discrete? Those still need flushing on paths like > > > > > > suspend > > > > > > which this > > > > > > series touches. Am I missing something? > > > > > > > > > > System bos on discrete should always have > > > > > > > > > > I915_BO_CACHE_COHERENT_FOR_READ | > > > > > I915_BO_CACHE_COHERENT_FOR_WRITE > > > > > > > > > > either by the gpu being fully cache coherent (or us mapping > > > > > system > > > > > write-combined). Hence no need for cache clflushes or > > > > > wbinvd() > > > > > for > > > > > incoherent IO. > > > > > > > > Hmm so you are talking about the shmem ttm backend. It ends up > > > > depending on the result of i915_ttm_cache_level, yes? It cannot > > > > end > > > > up with I915_CACHE_NONE from that function? > > > > > > If the object is allocated with allowable placement in either > > > LMEM > > > or > > > SYSTEM, and it ends in system, it gets allocated with > > > I915_CACHE_NONE, > > > but then the shmem ttm backend isn't used but TTM's wc pools, and > > > the > > > object should *always* be mapped wc. Even in system. > > > > I am not familiar with neither TTM backend or wc pools so maybe a > > missed > > question - if obj->cache_level can be set to none, and > > obj->cache_coherency to zero, then during object lifetime helpers > > which > > consult those fields (like i915_gem_cpu_write_needs_clflush, > > __start_cpu_write, etc) are giving out incorrect answers? That is, > > it > > is > > irrelevant that they would say flushes are required, since in > > actuality > > those objects can never ever and from anywhere be mapped other than > > WC > > so flushes aren't actually required? > > If we map other than WC somewhere in these situations, that should be > a > bug needing a fix. It might be that some of these helpers that you > mention might still flag that a clflush is needed, and in that case > that's an oversight that also needs fixing. Actually, it seems like most of these has a IS_DGFX() in them, in particular i915_gem_clflush_object(), but it looks like some sort of cleanup might be needed here. In particular we might want to introduce an IS_COHERENT() in case we change the api at some point also for integrated. /Thomas