On Sat, Feb 8, 2025 at 9:50 PM Ge Yang <yangge1116@xxxxxxx> wrote:
>
> On 2025/1/28 17:58, Barry Song wrote:
> > On Sat, Jan 25, 2025 at 12:21 AM <yangge1116@xxxxxxx> wrote:
> >>
> >> From: yangge <yangge1116@xxxxxxx>
> >>
> >> Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
> >> simply reverts to the original method of using the cma_mutex to ensure
> >> that alloc_contig_range() runs sequentially. This change was made to avoid
> >> concurrency allocation failures. However, it can negatively impact
> >> performance when concurrent allocation of CMA memory is required.
> >
> > Do we have some data?
>
> Yes, I will add it in the next version, thanks.
>
> >>
> >> To address this issue, we could introduce an API for concurrency settings,
> >> allowing users to decide whether their CMA can perform concurrent memory
> >> allocations or not.
> >
> > Who is the intended user of cma_set_concurrency?
>
> We have some drivers that use cma_set_concurrency(), but they have not
> yet been merged into the mainline. The cma_alloc_mem() function in the
> mainline also supports concurrent allocation of CMA memory. By applying
> this patch, we can also achieve significant performance improvements in
> certain scenarios. I will provide performance data in the next version.
>
> > I also feel it is somewhat
> > unsafe since cma->concurr_alloc is not protected by any locks.
>
> Ok, thanks.
>
> > Will a user setting cma->concurr_alloc = 1 encounter the original issue that
> > commit 60a60e32cf91 was attempting to fix?
>
> Yes, if a user encounters the issue described in commit 60a60e32cf91,
> they will not be able to set cma->concurr_alloc to 1.

That a user hasn't encountered the problem yet doesn't mean they never
will; more likely, it just means the testing time hasn't been long
enough. Is it possible to implement a per-CMA lock or range lock that
simultaneously improves performance and prevents the original issue
that commit 60a60e32cf91 aimed to fix?
I strongly believe that cma->concurr_alloc is not the right approach.
Let's not waste our time on this kind of hack or workaround. Instead,
we should find a proper fix that remains transparent to users.

> >>
> >> Fixes: 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
> >> Signed-off-by: yangge <yangge1116@xxxxxxx>
> >> Cc: <stable@xxxxxxxxxxxxxxx>
> >> ---
> >>  include/linux/cma.h |  2 ++
> >>  mm/cma.c            | 22 ++++++++++++++++++++--
> >>  mm/cma.h            |  1 +
> >>  3 files changed, 23 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/cma.h b/include/linux/cma.h
> >> index d15b64f..2384624 100644
> >> --- a/include/linux/cma.h
> >> +++ b/include/linux/cma.h
> >> @@ -53,6 +53,8 @@ extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
> >>
> >>  extern void cma_reserve_pages_on_error(struct cma *cma);
> >>
> >> +extern bool cma_set_concurrency(struct cma *cma, bool concurrency);
> >> +
> >>  #ifdef CONFIG_CMA
> >>  struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
> >>  bool cma_free_folio(struct cma *cma, const struct folio *folio);
> >> diff --git a/mm/cma.c b/mm/cma.c
> >> index de5bc0c..49a7186 100644
> >> --- a/mm/cma.c
> >> +++ b/mm/cma.c
> >> @@ -460,9 +460,17 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
> >>  		spin_unlock_irq(&cma->lock);
> >>
> >>  		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
> >> -		mutex_lock(&cma_mutex);
> >> +
> >> +		/*
> >> +		 * If the user sets the concurr_alloc of CMA to true, concurrent
> >> +		 * memory allocation is allowed. If the user sets it to false or
> >> +		 * does not set it, concurrent memory allocation is not allowed.
> >> +		 */
> >> +		if (!cma->concurr_alloc)
> >> +			mutex_lock(&cma_mutex);
> >>  		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
> >> -		mutex_unlock(&cma_mutex);
> >> +		if (!cma->concurr_alloc)
> >> +			mutex_unlock(&cma_mutex);
> >>  		if (ret == 0) {
> >>  			page = pfn_to_page(pfn);
> >>  			break;
> >> @@ -610,3 +618,13 @@ int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
> >>
> >>  	return 0;
> >>  }
> >> +
> >> +bool cma_set_concurrency(struct cma *cma, bool concurrency)
> >> +{
> >> +	if (!cma)
> >> +		return false;
> >> +
> >> +	cma->concurr_alloc = concurrency;
> >> +
> >> +	return true;
> >> +}
> >> diff --git a/mm/cma.h b/mm/cma.h
> >> index 8485ef8..30f489d 100644
> >> --- a/mm/cma.h
> >> +++ b/mm/cma.h
> >> @@ -16,6 +16,7 @@ struct cma {
> >>  	unsigned long *bitmap;
> >>  	unsigned int order_per_bit;	/* Order of pages represented by one bit */
> >>  	spinlock_t lock;
> >> +	bool concurr_alloc;
> >>  #ifdef CONFIG_CMA_DEBUGFS
> >>  	struct hlist_head mem_head;
> >>  	spinlock_t mem_head_lock;
> >> --
> >> 2.7.4
> >>
> >>

Thanks
Barry