[PATCH v3 2/2] mm: cma: try next MAX_ORDER_NR_PAGES during retry

On an ARMv7 platform with 1G memory, when MAX_ORDER_NR_PAGES
is big (e.g. 32M with max_order 14) and the CMA area is relatively small
(e.g. 128M), we observed a huge number of repeated retries (1k+) of CMA
allocation during boot when allocating one page for each of three
MMC instance probes.

This happens because CMA supports concurrent allocation since commit
a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock").
The memory range we are trying to allocate from, which is
MAX_ORDER_NR_PAGES aligned, may have already been acquired and isolated
by another allocation (see alloc_contig_range()).

Currently cma_alloc() retries with the small step of
bitmap_no + mask + 1, which very likely lands within the same isolated
range and fails again. So when MAX_ORDER_NR_PAGES is big (e.g. 32M),
retrying in such small steps is meaningless: the allocation is known to
fail a huge number of times until that memory range is released by its
owner, especially when allocating only one or two pages.

Instead of looping within the same isolated range and wasting a lot of
CPU cycles, especially on systems with a big MAX_ORDER_NR_PAGES
(e.g. 16M or 32M), try the next MAX_ORDER_NR_PAGES range directly.

This greatly mitigates the situation.

Below is the original error log during booting:
[    2.004804] cma: cma_alloc(cma (ptrval), count 1, align 0)
[    2.010318] cma: cma_alloc(cma (ptrval), count 1, align 0)
[    2.010776] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010785] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010793] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010800] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010807] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010814] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
.... (+1K retries)

After the fix, the 1200+ retries are reduced to 0.
Another test with 8 threads calling dma_alloc_coherent() in parallel
shows 1500+ retries dropping to ~145.

IOW this patch can improve the CMA allocation speed a lot when there is
enough CMA memory, by reducing retries significantly.

Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Cc: Lecopzer Chen <lecopzer.chen@xxxxxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
CC: stable@xxxxxxxxxxxxxxx # 5.11+
Fixes: a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock")
Signed-off-by: Dong Aisheng <aisheng.dong@xxxxxxx>
---
v2->v3:
 * Improve commit messages
v1->v2:
 * change to align with MAX_ORDER_NR_PAGES instead of pageblock_nr_pages
---
 mm/cma.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/cma.c b/mm/cma.c
index 46a9fd9f92c4..46bc12fe28b3 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -496,8 +496,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 
 		trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
 					   count, align);
-		/* try again with a bit different memory target */
-		start = bitmap_no + mask + 1;
+		/*
+		 * Try again with a bit different memory target.
+		 * Since memory isolated in alloc_contig_range() is aligned
+		 * with MAX_ORDER_NR_PAGES, instead of retrying in a small
+		 * step within the same isolated range, we try the next
+		 * available memory range directly.
+		 */
+		start = ALIGN(bitmap_no + mask + 1,
+			      MAX_ORDER_NR_PAGES >> cma->order_per_bit);
+
 	}
 
 	trace_cma_alloc_finish(cma->name, pfn, page, count, align);
-- 
2.25.1



