On Thu, Apr 07, 2016 at 02:33:34PM -0700, Gregory CLEMENT wrote: > When a L2 cache controller is used in a system that provides hardware > coherency, the entire outer cache operations are useless, and can be > skipped. Moreover, on some systems, it is harmful as it causes > deadlocks between the Marvell coherency mechanism, the Marvell PCIe > controller and the Cortex-A9. > > In the current kernel implementation, the outer cache flush range > operation is triggered by the dma_alloc function. > This operation can be take place during runtime and in some > circumstances may lead to the PCIe/PL310 deadlock on Armada 375/38x > SoCs. > > This patch extends the __dma_clear_buffer() function to receive a > boolean argument related to the coherency of the system. The same > things is done for the calling functions. > > Reported-by: Nadav Haklai <nadavh@xxxxxxxxxxx> > Signed-off-by: Gregory CLEMENT <gregory.clement@xxxxxxxxxxxxxxxxxx> > Cc: <stable@xxxxxxxxxxxxxxx> # v3.16+ > Tested-by: Marcin Wojtas <mw@xxxxxxxxxxxx> > --- > arch/arm/mm/dma-mapping.c | 63 +++++++++++++++++++++++++++++------------------ > 1 file changed, 39 insertions(+), 24 deletions(-) > > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c > index deac58d5f1f7..0231ed295bb2 100644 > --- a/arch/arm/mm/dma-mapping.c > +++ b/arch/arm/mm/dma-mapping.c > @@ -61,7 +61,7 @@ struct arm_dma_free_args { > > struct arm_dma_allocator { > void *(*alloc)(struct arm_dma_alloc_args *args, > - struct page **ret_page); > + struct page **ret_page, bool l2_coherent); > void (*free)(struct arm_dma_free_args *args); > }; > > @@ -274,7 +274,7 @@ static u64 get_coherent_dma_mask(struct device *dev) > return mask; > } > > -static void __dma_clear_buffer(struct page *page, size_t size) > +static void __dma_clear_buffer(struct page *page, size_t size, bool l2_coherent) > { > /* > * Ensure that the allocated pages are zeroed, and that any data > @@ -291,12 +291,14 @@ static void __dma_clear_buffer(struct page *page, size_t size) > page++; > size -= PAGE_SIZE; > } > - outer_flush_range(base, end); > + if (!l2_coherent) > + outer_flush_range(base, end); > } else { > void *ptr = page_address(page); > memset(ptr, 0, size); > dmac_flush_range(ptr, ptr + size); > - outer_flush_range(__pa(ptr), __pa(ptr) + size); > + if (!l2_coherent) > + outer_flush_range(__pa(ptr), __pa(ptr) + size); It is appropriate here to use l2_coherent because you're _only_ disabling the L2 cache flushes, while leaving the L1 cache flushes in place. If you have a fully coherent architecture, then the L1 cache flushes aren't required either. > } > } > > @@ -304,7 +306,8 @@ static void __dma_clear_buffer(struct page *page, size_t size) > * Allocate a DMA buffer for 'dev' of size 'size' using the > * specified gfp mask. Note that 'size' must be page aligned. > */ > -static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gfp) > +static struct page *__dma_alloc_buffer(struct device *dev, size_t size, > + gfp_t gfp, bool l2_coherent) > { > unsigned long order = get_order(size); > struct page *page, *p, *e; > @@ -320,7 +323,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf > for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++) > __free_page(p); > > - __dma_clear_buffer(page, size); > + __dma_clear_buffer(page, size, l2_coherent); So, this is also logical here, and with this in mind the rest of the patch looks fine. However, when I look at patch 2, I start to see problems with this naming, because in patch 2, we start doing things which assume that L1 is also coherent. What this means is that the original "is_coherent" for just patch 1 was wrong, but possibly more correct when patch 2 is included, but using "l2_coherent" is wrong for the opposite reasons. Moreover, we end up creating what seems to be something of a mess - when this flag is set, we end up with some allocators treating this flag as "only L2 coherent" and others treating it as "both L1 and L2 coherent", which is really insane. For this reason, I really don't like these patches - this code is already complex enough that we don't need to be making things more confusing through this. The other thing I'm really not keen on is seeing functions which take multiple bool arguments. Consider: foo(blah, true, true, false); is meaningless unless you have the prototype stuck in your head. Using a set of flags which can be or'd together (eg, like the gfp stuff) is much more preferable as it gives a descriptive nature to what's going on. -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html