Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage

On 21/03/2022 15:15, Thomas Hellström wrote:
On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:

On 21/03/2022 13:40, Thomas Hellström wrote:
Hi,

On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:

On 21/03/2022 12:33, Thomas Hellström wrote:
On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:

On 21/03/2022 11:03, Thomas Hellström wrote:
Hi, Tvrtko.

On 3/21/22 11:27, Tvrtko Ursulin wrote:

On 19/03/2022 19:42, Michael Cheng wrote:
To align with the discussion in [1][2], this patch series drops all
usage of wbinvd_on_all_cpus within i915 by either replacing the call
with certain drm clflush helpers, or reverting to previous logic.

AFAIU, the complaint from [1] was that it is wrong to provide non-x86
implementations under the wbinvd_on_all_cpus name. Instead, an
arch-agnostic helper which achieves the same effect could be created.
Does Arm have such a concept?

I also read Linus' email as saying we shouldn't leak incoherent IO to
other architectures, meaning any remaining wbinvd()s should be x86
only.

The last part is completely obvious since it is an x86 instruction
name.

Yeah, I meant the function implementing wbinvd() semantics.


But I think we can't pick a solution until we know how the concept maps
to Arm, and that will also include seeing how drm_clflush_sg for Arm
would look. Is there a range-based solution, or just a big hammer
there? If the latter, then it is no good to churn all these reverts;
instead, an arch-agnostic wrapper with a generic name would be the way
to go.

But my impression was that ARM would not need the range-based interface
either, because ARM is only for discrete and with discrete we're always
coherent.

Not sure what you mean here - what about flushing system memory objects
on discrete? Those still need flushing on paths like suspend which this
series touches. Am I missing something?

System bos on discrete should always have

I915_BO_CACHE_COHERENT_FOR_READ | I915_BO_CACHE_COHERENT_FOR_WRITE

either by the gpu being fully cache coherent (or us mapping system
write-combined). Hence no need for cache clflushes or wbinvd() for
incoherent IO.

Hmm so you are talking about the shmem ttm backend. It ends up
depending on the result of i915_ttm_cache_level, yes? It cannot end
up with I915_CACHE_NONE from that function?

If the object is allocated with allowable placement in either LMEM or
SYSTEM, and it ends in system, it gets allocated with I915_CACHE_NONE,
but then the shmem ttm backend isn't used but TTM's wc pools, and the
object should *always* be mapped wc. Even in system.

I am not familiar with either the TTM backend or wc pools, so this may be a misplaced question: if obj->cache_level can be set to none, and obj->cache_coherent to zero, then during the object lifetime, are the helpers which consult those fields (like i915_gem_cpu_write_needs_clflush, __start_cpu_write, etc) giving out incorrect answers? That is, is it irrelevant that they would say flushes are required, since in actuality those objects can never, from anywhere, be mapped other than WC, so flushes aren't actually required?
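
The mismatch being asked about can be illustrated with a small userspace model. This is only a sketch, not i915 code: the MODEL_* flag names, their bit values, struct model_bo and both helpers are hypothetical stand-ins for the flags and helpers named in this thread. It assumes, as stated above, that an object which is only ever mapped WC leaves no dirty CPU cache lines behind.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only, not i915 code. The MODEL_* names, the struct and
 * the helpers below are hypothetical; the bit values are assumed.
 */
#define MODEL_COHERENT_FOR_READ  (1u << 0)
#define MODEL_COHERENT_FOR_WRITE (1u << 1)

struct model_bo {
	unsigned int cache_coherent; /* mask of MODEL_COHERENT_* bits */
	bool cpu_map_always_wc;      /* every CPU mapping is write-combined */
};

/* Answer derived from the coherency flags alone. */
static inline bool model_flags_say_flush(const struct model_bo *bo)
{
	return !(bo->cache_coherent & MODEL_COHERENT_FOR_WRITE);
}

/* Answer once the mapping type is also taken into account. */
static inline bool model_flush_actually_needed(const struct model_bo *bo)
{
	if (bo->cpu_map_always_wc)
		return false; /* WC writes leave no dirty cache lines */
	return model_flags_say_flush(bo);
}

/* Example: a WC-only object with no coherency bits set. */
static inline void model_example(void)
{
	struct model_bo bo = { .cache_coherent = 0, .cpu_map_always_wc = true };

	assert(model_flags_say_flush(&bo));        /* flags alone ask for a flush */
	assert(!model_flush_actually_needed(&bo)); /* but none is required */
}
```

In this model the two helpers disagree only for the WC-only object with no coherency bits, which is the case the question is about.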

I also found in i915_drm.h:

          * As caching mode when specifying `I915_MMAP_OFFSET_FIXED`, WC or WB will
          * be used, depending on the object placement on creation. WB will be used
          * when the object can only exist in system memory, WC otherwise.

If what you say is true, that on discrete it is _always_ WC, then
that needs updating as well.

If an object is allocated as system only, then it is mapped WB, and
we're relying on the gpu being cache coherent to avoid clflushes. Same
is actually currently true if the object happens to be accessed by the
cpu while evicted. Might need an update for that.

Hmm okay, I think I actually misunderstood something here. The reason for the difference between an smem+lmem object which happens to be in smem and an smem-only object is eluding me.


That's adhering to Linus'

"And I sincerely hope to the gods that no cache-incoherent i915 mess
ever makes it out of the x86 world. Incoherent IO was always a
historical mistake and should never ever happen again, so we should
not spread that horrific pattern around."

Sure, but I was not talking about IO - just the CPU side access to
CPU side objects.

OK, I was under the impression that clflush()es and wbinvd()s in i915
were only ever used to make data visible to non-snooping GPUs.

Do you mean that there are other uses as well? Agreed, the wb cache
flush on suspend only if the gpu is !I915_BO_CACHE_COHERENT_FOR_READ
looks to not fit this pattern completely.

Don't know, I was first trying to understand the handling of obj->cache_coherent as discussed in the first quote block. Are the flags consistently set, and how will the Arm low-level code look?

Otherwise, for architectures where memory isn't always fully coherent
with the cpu cache, I'd expect them to use the apis in
asm/cacheflush.h, like flush_cache_range() and similar, which are nops
on x86.

Hm, do you know why they are no-ops? Like, why wouldn't they map to clflush?

Regards,

Tvrtko


