On 2025/1/28 17:58, Barry Song wrote:
On Sat, Jan 25, 2025 at 12:21 AM <yangge1116@xxxxxxx> wrote:
From: yangge <yangge1116@xxxxxxx>
Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
simply reverts to the original method of using the cma_mutex to ensure
that alloc_contig_range() runs sequentially. This change was made to avoid
concurrency allocation failures. However, it can negatively impact
performance when concurrent allocation of CMA memory is required.
Do we have some data?
Yes, I will add it in the next version, thanks.
To address this issue, we could introduce an API for concurrency settings,
allowing users to decide whether their CMA can perform concurrent memory
allocations or not.
Who is the intended user of cma_set_concurrency?
We have some drivers that use cma_set_concurrency(), but they have not
been merged into the mainline yet. The cma_alloc_mem() function in the
mainline also supports concurrent allocation of CMA memory, and with
this patch we can achieve significant performance improvements in
certain scenarios. I will provide performance data in the next version.
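
For illustration, here is a rough sketch of how a driver might use the
new API once its CMA area has been reserved. The driver name and init
hook are hypothetical, not taken from any posted driver:

#include <linux/cma.h>

/* Assume foo_cma was set up earlier via cma_declare_contiguous(). */
static struct cma *foo_cma;

static int foo_driver_init(void)
{
	/*
	 * Opt in to concurrent CMA allocations for this area;
	 * cma_set_concurrency() returns false if the area is NULL.
	 */
	if (!cma_set_concurrency(foo_cma, true))
		return -EINVAL;

	return 0;
}

With the flag left unset, behaviour is unchanged and allocations stay
serialized on cma_mutex.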
I also feel it is somewhat
unsafe since cma->concurr_alloc is not protected by any locks.
Ok, thanks.
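
For discussion, one way the lockless flag could be made less fragile
(just a sketch on top of this patch, not what it currently does) is to
read the flag once per allocation and annotate the accesses with
READ_ONCE()/WRITE_ONCE(). Reading it once also avoids a lock/unlock
mismatch if the flag were flipped between the two checks:

bool cma_set_concurrency(struct cma *cma, bool concurrency)
{
	if (!cma)
		return false;

	WRITE_ONCE(cma->concurr_alloc, concurrency);
	return true;
}

/* ... and in __cma_alloc(): */
	bool concurrent = READ_ONCE(cma->concurr_alloc);

	if (!concurrent)
		mutex_lock(&cma_mutex);
	ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
	if (!concurrent)
		mutex_unlock(&cma_mutex);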
Will a user setting cma->concurr_alloc = 1 encounter the original issue that
commit 60a60e32cf91 was attempting to fix?
Yes. A user who is affected by the issue described in commit
60a60e32cf91 should not set cma->concurr_alloc to 1, and should keep
the default serialized behavior.
Fixes: 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
Signed-off-by: yangge <yangge1116@xxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
---
include/linux/cma.h | 2 ++
mm/cma.c | 22 ++++++++++++++++++++--
mm/cma.h | 1 +
3 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/include/linux/cma.h b/include/linux/cma.h
index d15b64f..2384624 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -53,6 +53,8 @@ extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
extern void cma_reserve_pages_on_error(struct cma *cma);
+extern bool cma_set_concurrency(struct cma *cma, bool concurrency);
+
#ifdef CONFIG_CMA
struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
bool cma_free_folio(struct cma *cma, const struct folio *folio);
diff --git a/mm/cma.c b/mm/cma.c
index de5bc0c..49a7186 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -460,9 +460,17 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
spin_unlock_irq(&cma->lock);
pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
- mutex_lock(&cma_mutex);
+
+ /*
+ * If the user sets the concurr_alloc of CMA to true, concurrent
+ * memory allocation is allowed. If the user sets it to false or
+ * does not set it, concurrent memory allocation is not allowed.
+ */
+ if (!cma->concurr_alloc)
+ mutex_lock(&cma_mutex);
ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
- mutex_unlock(&cma_mutex);
+ if (!cma->concurr_alloc)
+ mutex_unlock(&cma_mutex);
if (ret == 0) {
page = pfn_to_page(pfn);
break;
@@ -610,3 +618,13 @@ int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
return 0;
}
+
+bool cma_set_concurrency(struct cma *cma, bool concurrency)
+{
+ if (!cma)
+ return false;
+
+ cma->concurr_alloc = concurrency;
+
+ return true;
+}
diff --git a/mm/cma.h b/mm/cma.h
index 8485ef8..30f489d 100644
--- a/mm/cma.h
+++ b/mm/cma.h
@@ -16,6 +16,7 @@ struct cma {
unsigned long *bitmap;
unsigned int order_per_bit; /* Order of pages represented by one bit */
spinlock_t lock;
+ bool concurr_alloc;
#ifdef CONFIG_CMA_DEBUGFS
struct hlist_head mem_head;
spinlock_t mem_head_lock;
--
2.7.4
Thanks
Barry