On 2019-01-21 7:28 p.m., Ard Biesheuvel wrote: > On Mon, 21 Jan 2019 at 19:24, Michel Dänzer <michel@xxxxxxxxxxx> wrote: >> On 2019-01-21 7:20 p.m., Ard Biesheuvel wrote: >>> On Mon, 21 Jan 2019 at 19:04, Michel Dänzer <michel@xxxxxxxxxxx> wrote: >>>> On 2019-01-21 6:59 p.m., Ard Biesheuvel wrote: >>>>> On Mon, 21 Jan 2019 at 18:55, Michel Dänzer <michel@xxxxxxxxxxx> wrote: >>>>>> On 2019-01-21 5:30 p.m., Ard Biesheuvel wrote: >>>>>>> On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote: >>>>>>> >>>>>>>> Until that happens we should just change the driver ifdefs to default >>>>>>>> the hacks to off and only enable them on setups where we 100% >>>>>>>> positively know that they actually work. And document that fact >>>>>>>> in big fat comments. >>>>>>> >>>>>>> Well, as I mentioned in my commit log as well, if we default to off >>>>>>> unless CONFIG_X86, we may break working setups on MIPS and Power where >>>>>>> the device is in fact non-cache coherent, and relies on this >>>>>>> 'optimization' to get things working. >>>>>> >>>>>> FWIW, the amdgpu driver doesn't rely on non-snooped transfers for >>>>>> correct basic operation (the scenario Christian brought up is a very >>>>>> specialized use-case), so that shouldn't be an issue. >>>>> >>>>> The point is that this is only true for x86. >>>>> >>>>> On other architectures, the use of non-cached mappings on the CPU side >>>>> means that you /do/ rely on non-snooped transfers, since if those >>>>> transfers turn out not to snoop inadvertently, the accesses are >>>>> incoherent with the CPU's view of memory. >>>> >>>> The driver generally only uses non-cached mappings if >>>> drm_arch/device_can_wc_memory returns true. >>> >>> Indeed. And so we should take care to only return 'true' from that >>> function if it is guaranteed that non-cached CPU mappings are coherent >>> with the mappings used by the GPU, either because that is always the >>> case (like on x86), or because we know that the platform in question >>> implements NoSnoop correctly throughout the interconnect. >>> >>> What seems to be complicating matters is that in some cases, the >>> device is non-cache coherent to begin with, so regardless of whether >>> the NoSnoop attribute is used or not, those accesses will not snoop in >>> the caches and be coherent with the non-cached mappings used by the >>> CPU. So if we restrict this optimization [on non-X86] to platforms >>> that are known to implement NoSnoop correctly, we may break platforms >>> that are implicitly NoSnoop all the time. >> >> Since the driver generally doesn't rely on non-snooped accesses for >> correctness, that couldn't "break" anything that hasn't always been broken. > > Again, that is only true on x86. > > On other architectures, DMA writes from the device may allocate in the > caches, and be invisible to the CPU when it uses non-cached mappings. Let me try one last time: If drm_arch_can_wc_memory returns false, the driver falls back to the normal mode of operation, using a cacheable CPU mapping and snooped GPU transfers, even if userspace asks (as a performance optimization) for a write-combined CPU mapping and non-snooped GPU transfers via AMDGPU_GEM_CREATE_CPU_GTT_USWC. This normal mode of operation is also used for the ring buffers at the heart of the driver's operation. If there is a platform where this normal mode of operation doesn't work, the driver could never have worked reliably on that platform, since before AMDGPU_GEM_CREATE_CPU_GTT_USWC or drm_arch_can_wc_memory even existed. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel