Re: [PATCH v7 1/2] ARM: dma-mapping: Don't use outer_flush_range when the L2C is coherent

Russell King - ARM Linux <linux@xxxxxxxxxxxxxxxx> · Thu, 7 Apr 2016 23:09:14 +0100

On Thu, Apr 07, 2016 at 02:33:34PM -0700, Gregory CLEMENT wrote:
> When a L2 cache controller is used in a system that provides hardware
> coherency, the entire outer cache operations are useless, and can be
> skipped.  Moreover, on some systems, it is harmful as it causes
> deadlocks between the Marvell coherency mechanism, the Marvell PCIe
> controller and the Cortex-A9.
> 
> In the current kernel implementation, the outer cache flush range
> operation is triggered by the dma_alloc function.
> This operation can be take place during runtime and in some
> circumstances may lead to the PCIe/PL310 deadlock on Armada 375/38x
> SoCs.
> 
> This patch extends the __dma_clear_buffer() function to receive a
> boolean argument related to the coherency of the system. The same
> things is done for the calling functions.
> 
> Reported-by: Nadav Haklai <nadavh@xxxxxxxxxxx>
> Signed-off-by: Gregory CLEMENT <gregory.clement@xxxxxxxxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # v3.16+
> Tested-by: Marcin Wojtas <mw@xxxxxxxxxxxx>
> ---
>  arch/arm/mm/dma-mapping.c | 63 +++++++++++++++++++++++++++++------------------
>  1 file changed, 39 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index deac58d5f1f7..0231ed295bb2 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -61,7 +61,7 @@ struct arm_dma_free_args {
>  
>  struct arm_dma_allocator {
>  	void *(*alloc)(struct arm_dma_alloc_args *args,
> -		       struct page **ret_page);
> +		       struct page **ret_page, bool l2_coherent);
>  	void (*free)(struct arm_dma_free_args *args);
>  };
>  
> @@ -274,7 +274,7 @@ static u64 get_coherent_dma_mask(struct device *dev)
>  	return mask;
>  }
>  
> -static void __dma_clear_buffer(struct page *page, size_t size)
> +static void __dma_clear_buffer(struct page *page, size_t size, bool l2_coherent)
>  {
>  	/*
>  	 * Ensure that the allocated pages are zeroed, and that any data
> @@ -291,12 +291,14 @@ static void __dma_clear_buffer(struct page *page, size_t size)
>  			page++;
>  			size -= PAGE_SIZE;
>  		}
> -		outer_flush_range(base, end);
> +		if (!l2_coherent)
> +			outer_flush_range(base, end);
>  	} else {
>  		void *ptr = page_address(page);
>  		memset(ptr, 0, size);
>  		dmac_flush_range(ptr, ptr + size);
> -		outer_flush_range(__pa(ptr), __pa(ptr) + size);
> +		if (!l2_coherent)
> +			outer_flush_range(__pa(ptr), __pa(ptr) + size);

It is appropriate here to use l2_coherent because you're _only_ disabling
the L2 cache flushes, while leaving the L1 cache flushes in place.

If you have a fully coherent architecture, then the L1 cache flushes
aren't required either.

>  	}
>  }
>  
> @@ -304,7 +306,8 @@ static void __dma_clear_buffer(struct page *page, size_t size)
>   * Allocate a DMA buffer for 'dev' of size 'size' using the
>   * specified gfp mask.  Note that 'size' must be page aligned.
>   */
> -static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
> +static struct page *__dma_alloc_buffer(struct device *dev, size_t size,
> +				       gfp_t gfp, bool l2_coherent)
>  {
>  	unsigned long order = get_order(size);
>  	struct page *page, *p, *e;
> @@ -320,7 +323,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
>  	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
>  		__free_page(p);
>  
> -	__dma_clear_buffer(page, size);
> +	__dma_clear_buffer(page, size, l2_coherent);

So, this is also logical here, and with this in mind the rest of the
patch looks fine.  However, when I look at patch 2, I start to see
problems with this naming, because in patch 2, we start doing things
which assume that L1 is also coherent.

What this means is that the original "is_coherent" for just patch 1
was wrong, but possibly more correct when patch 2 is included, but
using "l2_coherent" is wrong for the opposite reasons.

Moreover, we end up creating what seems to be something of a mess -
when this flag is set, we end up with some allocators treating this
flag as "only L2 coherent" and others treating it as "both L1 and L2
coherent", which is really insane.  For this reason, I really don't
like these patches - this code is already complex enough that we
don't need to be making things more confusing through this.

The other thing I'm really not keen on is seeing functions which
take multiple bool arguments.  Consider:

	foo(blah, true, true, false);

is meaningless unless you have the prototype stuck in your head.
Using a set of flags which can be or'd together (eg, like the gfp
stuff) is much more preferable as it gives a descriptive nature
to what's going on.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html