On Mon, 21 Jan 2019 at 20:04, Michel Dänzer <michel@xxxxxxxxxxx> wrote: > > On 2019-01-21 7:28 p.m., Ard Biesheuvel wrote: > > On Mon, 21 Jan 2019 at 19:24, Michel Dänzer <michel@xxxxxxxxxxx> wrote: > >> On 2019-01-21 7:20 p.m., Ard Biesheuvel wrote: > >>> On Mon, 21 Jan 2019 at 19:04, Michel Dänzer <michel@xxxxxxxxxxx> wrote: > >>>> On 2019-01-21 6:59 p.m., Ard Biesheuvel wrote: > >>>>> On Mon, 21 Jan 2019 at 18:55, Michel Dänzer <michel@xxxxxxxxxxx> wrote: > >>>>>> On 2019-01-21 5:30 p.m., Ard Biesheuvel wrote: > >>>>>>> On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote: > >>>>>>> > >>>>>>>> Until that happens we should just change the driver ifdefs to default > >>>>>>>> the hacks to off and only enable them on setups where we 100% > >>>>>>>> positively know that they actually work. And document that fact > >>>>>>>> in big fat comments. > >>>>>>> > >>>>>>> Well, as I mentioned in my commit log as well, if we default to off > >>>>>>> unless CONFIG_X86, we may break working setups on MIPS and Power where > >>>>>>> the device is in fact non-cache coherent, and relies on this > >>>>>>> 'optimization' to get things working. > >>>>>> > >>>>>> FWIW, the amdgpu driver doesn't rely on non-snooped transfers for > >>>>>> correct basic operation (the scenario Christian brought up is a very > >>>>>> specialized use-case), so that shouldn't be an issue. > >>>>> > >>>>> The point is that this is only true for x86. > >>>>> > >>>>> On other architectures, the use of non-cached mappings on the CPU side > >>>>> means that you /do/ rely on non-snooped transfers, since if those > >>>>> transfers turn out not to snoop inadvertently, the accesses are > >>>>> incoherent with the CPU's view of memory. > >>>> > >>>> The driver generally only uses non-cached mappings if > >>>> drm_arch/device_can_wc_memory returns true. > >>> > >>> Indeed. And so we should take care to only return 'true' from that > >>> function if it is guaranteed that non-cached CPU mappings are coherent > >>> with the mappings used by the GPU, either because that is always the > >>> case (like on x86), or because we know that the platform in question > >>> implements NoSnoop correctly throughout the interconnect. > >>> > >>> What seems to be complicating matters is that in some cases, the > >>> device is non-cache coherent to begin with, so regardless of whether > >>> the NoSnoop attribute is used or not, those accesses will not snoop in > >>> the caches and be coherent with the non-cached mappings used by the > >>> CPU. So if we restrict this optimization [on non-X86] to platforms > >>> that are known to implement NoSnoop correctly, we may break platforms > >>> that are implicitly NoSnoop all the time. > >> > >> Since the driver generally doesn't rely on non-snooped accesses for > >> correctness, that couldn't "break" anything that hasn't always been broken. > > > > Again, that is only true on x86. > > > > On other architectures, DMA writes from the device may allocate in the > > caches, and be invisible to the CPU when it uses non-cached mappings. > > Let me try one last time: > > If drm_arch_can_wc_memory returns false, the driver falls back to the > normal mode of operation, using a cacheable CPU mapping and snooped GPU > transfers, even if userspace asks (as a performance optimization) for a > write-combined CPU mapping and non-snooped GPU transfers via > AMDGPU_GEM_CREATE_CPU_GTT_USWC. Another question: when userspace requests for such a mapping to be created, does this involve pages that are mapped cacheable into the userland process? _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx