On Monday, 2014-05-19 at 11:02 +0200, Thierry Reding wrote:
> On Mon, May 19, 2014 at 04:10:58PM +0900, Alexandre Courbot wrote:
> > Some architectures (e.g. ARM) need the CPU buffers to be explicitely
> > flushed for a memory write to take effect. Not doing so results in
> > synchronization issues, especially after writing to BOs.
> 
> It seems to me that the above is generally true for all architectures,
> not just ARM.
> 
No, on PCI-coherent arches, like x86 and some PowerPCs, the GPU will
snoop the CPU caches and therefore an explicit cache flush is not
required.

> Also: s/explicitely/explicitly/
> 
> > This patch introduces a macro that flushes the caches on ARM and
> > translates to a no-op on other architectures, and uses it when
> > writing to in-memory BOs. It will also be useful for implementations of
> > instmem that access shared memory directly instead of going through
> > PRAMIN.
> 
> Presumably instmem can access shared memory on all architectures, so
> this doesn't seem like a property of the architecture but rather of the
> memory pool backing the instmem.
> 
> In that case I wonder if this shouldn't be moved into an operation that
> is implemented by the backing memory pool and be a noop where the cache
> doesn't need explicit flushing.
> 
> > diff --git a/drivers/gpu/drm/nouveau/core/os.h b/drivers/gpu/drm/nouveau/core/os.h
> > index d0ced94ca54c..274b4460bb03 100644
> > --- a/drivers/gpu/drm/nouveau/core/os.h
> > +++ b/drivers/gpu/drm/nouveau/core/os.h
> > @@ -38,4 +38,21 @@
> >  #endif /* def __BIG_ENDIAN else */
> >  #endif /* !ioread32_native */
> >  
> > +#if defined(__arm__)
> > +
> > +#define nv_cpu_cache_flush_area(va, size)		\
> > +do {							\
> > +	phys_addr_t pa = virt_to_phys(va);		\
> > +	__cpuc_flush_dcache_area(va, size);		\
> > +	outer_flush_range(pa, pa + size);		\
> > +} while (0)
> 
> Couldn't this be a static inline function?
> 
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> [...]
> > index 0886f47e5244..b9c9729c5733 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > @@ -437,8 +437,10 @@ nouveau_bo_wr16(struct nouveau_bo *nvbo, unsigned index, u16 val)
> >  	mem = &mem[index];
> >  	if (is_iomem)
> >  		iowrite16_native(val, (void __force __iomem *)mem);
> > -	else
> > +	else {
> >  		*mem = val;
> > +		nv_cpu_cache_flush_area(mem, 2);
> > +	}
> >  }
> >  
> >  u32
> > @@ -461,8 +463,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val)
> >  	mem = &mem[index];
> >  	if (is_iomem)
> >  		iowrite32_native(val, (void __force __iomem *)mem);
> > -	else
> > +	else {
> >  		*mem = val;
> > +		nv_cpu_cache_flush_area(mem, 4);
> > +	}
> 
> This looks rather like a sledgehammer to me. Effectively this turns nvbo
> into an uncached buffer, with the additional overhead of constantly
> flushing caches. Wouldn't it make more sense to locate the places where
> these are called and flush the cache after all the writes have completed?
> 
I don't think the explicit flushing for those things makes sense. I
think it is a lot more effective to just map the BOs write-combined on
PCI non-coherent arches. This way any writes will be buffered. Reads
will be slow, but I don't think nouveau is reading back a lot from
those buffers.

Using the write-combining buffer doesn't need any additional
synchronization, as it will get flushed on pushbuf kickoff anyway.

Regards,
Lucas
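
P.S.: To make the write-combined idea a bit more concrete, here is a rough
sketch of the direction I mean. Only the TTM caching flags are real; the
helper name and where it would be hooked into nouveau's placement code are
made up purely for illustration:

/*
 * Illustrative sketch only, not part of the patch under review: on
 * arches without coherent PCI, allow only a write-combined CPU mapping
 * for system-memory BOs so writes get buffered instead of cached,
 * rather than flushing the cache after every nouveau_bo_wr*() call.
 */
#include <linux/types.h>
#include <drm/ttm/ttm_placement.h>

static inline uint32_t
nouveau_bo_sysmem_caching(void)
{
#if defined(__arm__)
	/* The GPU cannot snoop the CPU caches here: force write-combined. */
	return TTM_PL_FLAG_WC;
#else
	/* Coherent PCI: any caching attribute is fine. */
	return TTM_PL_MASK_CACHING;
#endif
}

The write-combine buffers are drained implicitly when the pushbuf is kicked
off, so none of the explicit nv_cpu_cache_flush_area() calls would be needed
on that path.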
-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |