On Mon, Jul 29, 2024 at 01:51:25PM +0300, Baruch Siach wrote:
> From: Catalin Marinas <catalin.marinas@xxxxxxx>
>
> Hardware DMA limit might not be a power of 2. When the RAM range
> starts above 0, say at 4GB, a DMA limit of 30 bits should end at 5GB.
> A single high bit cannot encode this limit.
>
> Use a direct phys_addr_t limit address for the DMA zone limit.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@xxxxxxx>
> Signed-off-by: Baruch Siach <baruch@xxxxxxxxxx>

You should add your Co-developed-by line; the patch evolved a bit from
my initial partial diff.

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 9b5ab6818f7f..870fd967c610 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -114,36 +114,28 @@ static void __init arch_reserve_crashkernel(void)
>  			low_size, high);
>  }
>
> -/*
> - * Return the maximum physical address for a zone accessible by the given bits
> - * limit. If DRAM starts above 32-bit, expand the zone to the maximum
> - * available memory, otherwise cap it at 32-bit.
> - */
> -static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> +static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>  {
> -	phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
> -	phys_addr_t phys_start = memblock_start_of_DRAM();
> -
> -	if (phys_start > U32_MAX)
> -		zone_mask = PHYS_ADDR_MAX;
> -	else if (phys_start > zone_mask)
> -		zone_mask = U32_MAX;
> +	/* We have RAM in low 32-bit area, keep DMA zone there */
> +	if (memblock_start_of_DRAM() < U32_MAX)
> +		zone_limit = min(U32_MAX, zone_limit);

Does this matter anymore? Or is it to keep ZONE_DMA below (or equal to)
the ZONE_DMA32 limit?

Anyway, since this patch is about replacing zone_dma_bits with
zone_dma_limit, we should not introduce functional changes. AFAICT, we
need zone_limit to be set to memblock_end_of_DRAM() when phys_start is
above U32_MAX. You can do the functional change in a subsequent patch
once all the other refactoring has been handled.
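Untested, but keeping the old behaviour with the new prototype could
look something like the below (only a sketch; the PHYS_ADDR_MAX/U32_MAX
handling mirrors the code being removed above):

static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
{
	phys_addr_t phys_start = memblock_start_of_DRAM();

	/* DRAM entirely above 32-bit: expand the zone to all memory */
	if (phys_start > U32_MAX)
		zone_limit = PHYS_ADDR_MAX;
	/* DRAM starts below 4GB but above the limit: cap the zone at 32-bit */
	else if (phys_start > zone_limit)
		zone_limit = U32_MAX;

	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
}

That way the only change in this patch is the switch from a bit count
to a phys_addr_t limit.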
>
> -	return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
> +	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>  }

[...]

> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index edbe13d00776..98b7d8015043 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -12,7 +12,7 @@
>  #include <linux/mem_encrypt.h>
>  #include <linux/swiotlb.h>
>
> -extern unsigned int zone_dma_bits;
> +extern phys_addr_t zone_dma_limit;
>
>  /*
>   * Record the mapping of CPU physical to DMA addresses for a given region.
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 3b4be4ca3b08..3dbc0b89d6fb 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -20,7 +20,7 @@
>   * it for entirely different regions. In that case the arch code needs to
>   * override the variable below for dma-direct to work properly.
>   */
> -unsigned int zone_dma_bits __ro_after_init = 24;
> +phys_addr_t zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
>
>  static inline dma_addr_t phys_to_dma_direct(struct device *dev,
>  		phys_addr_t phys)
> @@ -580,7 +580,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
>  	 * part of the check.
>  	 */
>  	if (IS_ENABLED(CONFIG_ZONE_DMA))
> -		min_mask = min_t(u64, min_mask, DMA_BIT_MASK(zone_dma_bits));
> +		min_mask = min_t(u64, min_mask, zone_dma_limit);
>  	return mask >= phys_to_dma_unencrypted(dev, min_mask);
>  }
>
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index d10613eb0f63..410a7b40e496 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -70,7 +70,7 @@ static bool cma_in_zone(gfp_t gfp)
>  	/* CMA can't cross zone boundaries, see cma_activate_area() */
>  	end = cma_get_base(cma) + size - 1;
>  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> -		return end <= DMA_BIT_MASK(zone_dma_bits);
> +		return end <= zone_dma_limit;
>  	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
>  		return end <= DMA_BIT_MASK(32);
>  	return true;

I haven't got to the third patch yet, but with this series we can have
zone_dma_limit above DMA_BIT_MASK(32). The above function could return
false for GFP_DMA32 when 'end' is perfectly valid within ZONE_DMA
(which implies it is safe for GFP_DMA32).

> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index df68d29740a0..dfd83e5ee0b3 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -450,7 +450,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  	if (!remap)
>  		io_tlb_default_mem.can_grow = true;
>  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
> -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
> +		io_tlb_default_mem.phys_limit = zone_dma_limit;
>  	else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
>  		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
>  	else
> @@ -629,7 +629,7 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>  	}
>
>  	gfp &= ~GFP_ZONEMASK;
> -	if (phys_limit <= DMA_BIT_MASK(zone_dma_bits))
> +	if (phys_limit <= zone_dma_limit)
>  		gfp |= __GFP_DMA;
>  	else if (phys_limit <= DMA_BIT_MASK(32))
>  		gfp |= __GFP_DMA32;

I think this has the same issue as dma_direct_optimal_gfp_mask(). If
the requested limit is strictly smaller than DMA_BIT_MASK(32) (but
potentially bigger than zone_dma_limit), we should go for __GFP_DMA
rather than __GFP_DMA32. You should probably fix this in the first
patch as well.

As above, with this series we can end up with zone_dma_limit above
DMA_BIT_MASK(32), and all these checks become confusing. Even the
swiotlb_init_late() above, if called with GFP_DMA32, will set a
phys_limit that may not be accessible at all if the DRAM starts above
4GB. Similarly, cma_in_zone() could return false with GFP_DMA32 in
similar hardware configurations.

So we either introduce a zone_dma32_limit variable and allow a 32-bit
range above the start of DRAM, or sanitise these sites to make sure
passing GFP_DMA32 is safe - i.e. assume we only have ZONE_DMA if
zone_dma_limit is above 32-bit. I prefer the latter without
introducing a zone_dma32_limit.

--
Catalin
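For illustration, the preferred option (sanitising the GFP_DMA32 sites
rather than adding a zone_dma32_limit) might look roughly like the
following in cma_in_zone(); this is a hypothetical sketch, not part of
the posted series:

static bool cma_in_zone(gfp_t gfp)
{
	...
	/* CMA can't cross zone boundaries, see cma_activate_area() */
	end = cma_get_base(cma) + size - 1;
	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
		return end <= zone_dma_limit;
	/*
	 * When DRAM starts above 4GB there is no ZONE_DMA32 and GFP_DMA32
	 * falls back to ZONE_DMA, so anything within zone_dma_limit is
	 * also safe for GFP_DMA32.
	 */
	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
		return end <= max(zone_dma_limit, (phys_addr_t)DMA_BIT_MASK(32));
	return true;
}

The __GFP_DMA32 phys_limit in swiotlb_init_late() would presumably need
the same max() treatment so that the limit stays reachable when DRAM
starts above 4GB.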