On 12.01.22 14:15, Dong Aisheng wrote:
> On an ARMv7 platform with a 32M pageblock (MAX_ORDER 14), we observed a

Did you actually intend to talk about pageblocks here (and below)? I
assume you have to be clearer here that you talk about the maximum
allocation granularity, which is usually bigger than the actual
pageblock size.

> huge number of repeated retries of CMA allocation (1k+) during boot
> when allocating one page for each of 3 MMC instance probes.
>
> This is caused by CMA supporting concurrent allocation since commit
> a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock").
> The pageblock or (MAX_ORDER - 1) block from which we are trying to
> allocate memory may have already been acquired and isolated by others.
> Current cma_alloc() will then retry the next area by a step of
> bitmap_no + mask + 1, which is very likely within the same isolated
> range and will fail again. So when the pageblock or MAX_ORDER block is
> big (e.g. 8192 pages), retrying in such small steps becomes
> meaningless, because the attempts are known to fail a huge number of
> times while the block remains isolated by others, especially when
> allocating only one or two pages.
>
> Instead of looping within the same pageblock and wasting a lot of CPU
> cycles, especially on systems with big pageblocks (e.g. 16M or 32M),
> we try the next MAX_ORDER_NR_PAGES directly.
>
> Doing it this way can greatly mitigate the situation.
>
> Below is the original error log during booting:
> [    2.004804] cma: cma_alloc(cma (ptrval), count 1, align 0)
> [    2.010318] cma: cma_alloc(cma (ptrval), count 1, align 0)
> [    2.010776] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [    2.010785] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [    2.010793] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [    2.010800] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [    2.010807] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [    2.010814] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> .... (+1K retries)
>
> After the fix, the 1200+ retries can be reduced to 0.
> Another test running 8 VPU decoders in parallel shows that 1500+
> retries dropped to ~145.
>
> IOW, this patch can improve CMA allocation speed a lot when there is
> enough CMA memory, by reducing retries significantly.
>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> Cc: Lecopzer Chen <lecopzer.chen@xxxxxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Vlastimil Babka <vbabka@xxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # 5.11+
> Fixes: a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock")
> Signed-off-by: Dong Aisheng <aisheng.dong@xxxxxxx>
> ---
> v1->v2:
>  * change to align with MAX_ORDER_NR_PAGES instead of pageblock_nr_pages
> ---
>  mm/cma.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/cma.c b/mm/cma.c
> index 1c13a729d274..1251f65e2364 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -500,7 +500,9 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>  		trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
>  					   count, align);
>  		/* try again with a bit different memory target */
> -		start = bitmap_no + mask + 1;
> +		start = ALIGN(bitmap_no + mask + 1,
> +			      MAX_ORDER_NR_PAGES >> cma->order_per_bit);

Mind giving the reader a hint in the code why we went for
MAX_ORDER_NR_PAGES? What would happen if the CMA granularity is bigger
than MAX_ORDER_NR_PAGES? I'd assume no harm done, as we'd try aligning
to 0.

-- 
Thanks,

David / dhildenb