On Thu, 17 Jan 2019 at 07:07, Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2019-01-16 at 08:47 +0100, Ard Biesheuvel wrote:
> > > As far as I know on x86 it doesn't, so when you have an un-cached page
> > > you can still access it with a snooping DMA read/write operation and
> > > don't cause trouble.
> > >
> >
> > I think it is the other way around. The question is, on an otherwise
> > cache coherent device, whether the NoSnoop attribute set by the GPU
> > propagates all the way to the bus so that it bypasses the caches.
>
> On powerpc it's ignored, all DMA accesses will be snooped. But that's
> fine regardless of whether the memory was mapped cacheable or not, the
> snooper will simply not find anything if not. I *think* we only do
> cache inject if the line already exists in one of the caches.
>

Others should correct me if I am wrong, but arm64 SoCs often have L3
system caches, and I would expect inbound transactions with writeback
write-allocate (WBWA) attributes to allocate there.

> > On x86, we can tolerate if this is not the case, since uncached memory
> > accesses by the CPU snoop the caches as well.
> >
> > On other architectures, uncached accesses go straight to main memory,
> > so if the device wrote anything to the caches we won't see it.
>
> Well, on all powerpc implementations that I am aware of at least (dunno
> about ARM), they do, but we don't have a problem because I don't think
> the devices can/will write to the caches directly unless a
> corresponding line already exists (but I might be wrong, we need to
> double check all implementations which is tricky).
>
> I am not aware of any powerpc chip implementing NoSnoop.
>

Do you have any history on why this optimization is disabled for power
unless CONFIG_NOT_CACHE_COHERENT is set? That also raises the question
of how any of this is supposed to work with non-cache coherent DMA.
The code appears to always assume cache coherent DMA, and to provide
non-cache coherent mappings as an optimization if
dma_arch_can_wc_memory() returns true. So I wonder if that helper
should take a struct device pointer instead, and return true for
non-cache coherent devices.

> > So to use this optimization, you have to either be 100% sure that
> > NoSnoop is implemented correctly, or have an x86 CPU.
> >
> > > > The old hack of using non-cached mapping to avoid snoop cost in AGP and
> > > > others is just that ... an ugly and horrible hack that should have
> > > > never eventuated, when the search for performance pushes HW people into
> > > > utter insanity :)
> > >
> > > Well I agree that un-cached system memory makes things much more
> > > complicated for a questionable gain.
> > >
> > > But fact is we now have to deal with the mess, so no point in
> > > complaining about it too much :)
> > >
> >
> > Indeed. I wonder if we should just disable it altogether unless CONFIG_X86=y
>
> The question is whether DMA from a device can instantiate cache lines
> in your system. This is a system specific rather than architecture
> specific question I suspect...
>

The ARM architecture permits it, afaict, and write-allocate is a hint
so the implementation is free to ignore it, whether it is set or
cleared.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
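
For illustration only, here is a rough userspace sketch of the idea raised
in this message, that dma_arch_can_wc_memory() could take a device pointer
and return true for non-cache coherent devices. It is not kernel code and
not a proposed patch: struct fake_device, dma_dev_can_wc_memory() and the
arch_allows_wc parameter are all hypothetical names invented for this
sketch, with arch_allows_wc standing in for whatever the existing
argument-less helper returns on a given architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's struct device. Only the one
 * property this sketch cares about is modelled.
 */
struct fake_device {
	bool dma_coherent;	/* true if the device's DMA is cache coherent */
};

/*
 * Hypothetical per-device variant of dma_arch_can_wc_memory().
 *
 * arch_allows_wc is an assumption of this sketch: it represents the
 * answer the existing architecture-wide helper would give.
 */
static bool dma_dev_can_wc_memory(const struct fake_device *dev,
				  bool arch_allows_wc)
{
	/* No device information: fall back to the architecture default. */
	if (dev == NULL)
		return arch_allows_wc;

	/*
	 * DMA from a non-coherent device is assumed not to go through
	 * the CPU caches, so uncached/WC CPU mappings are the
	 * consistent choice for it.
	 */
	if (!dev->dma_coherent)
		return true;

	/* Coherent device: only allowed where the architecture says so. */
	return arch_allows_wc;
}

/* Usage: a non-coherent device on an architecture that otherwise says no. */
static void example_usage(void)
{
	struct fake_device gpu = { .dma_coherent = false };

	assert(dma_dev_can_wc_memory(&gpu, false));
}
```

The point of the extra parameter is that the per-device check only
overrides the architecture default in one direction: a non-coherent device
always gets the uncached/WC path, while a coherent one keeps whatever the
architecture-wide answer is today.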