On Mon, Nov 23, 2020 at 03:15:37PM +0100, David Hildenbrand wrote:
> On 17.11.20 19:19, Minchan Kim wrote:
> > There is a need for special HW to require bulk allocation of
> > high-order pages. For example, 4800 * order-4 pages, which
> > would be minimum, sometimes, it requires more.
> > 
> > To meet the requirement, a option reserves 300M CMA area and
> > requests the whole 300M contiguous memory. However, it doesn't
> > work if even one of those pages in the range is long-term pinned
> > directly or indirectly. The other option is to ask higher-order
> > size (e.g., 2M) than requested order(64K) repeatedly until driver
> > could gather necessary amount of memory. Basically, this approach
> > makes the allocation very slow due to cma_alloc's function
> > slowness and it could be stuck on one of the pageblocks if it
> > encounters unmigratable page.
> > 
> > To solve the issue, this patch introduces cma_alloc_bulk.
> > 
> > 	int cma_alloc_bulk(struct cma *cma, unsigned int align,
> > 		gfp_t gfp_mask, unsigned int order, size_t nr_requests,
> > 		struct page **page_array, size_t *nr_allocated);
> > 
> > Most parameters are same with cma_alloc but it additionally passes
> > vector array to store allocated memory. What's different with cma_alloc
> > is it will skip pageblocks without waiting/stopping if it has unmovable
> > page so that API continues to scan other pageblocks to find requested
> > order page.
> > 
> > cma_alloc_bulk is best effort approach in that it skips some pageblocks
> > if they have unmovable pages unlike cma_alloc. It doesn't need to be
> > perfect from the beginning at the cost of performance. Thus, the API
> > takes gfp_t to support __GFP_NORETRY which is propagated into
> > alloc_contig_page to avoid significat overhead functions to inrecase
> > CMA allocation success ratio(e.g., migration retrial, PCP, LRU draining
> > per pageblock) at the cost of less allocation success ratio.
> > If the caller couldn't allocate enough pages with __GFP_NORETRY, they
> > could call it without __GFP_NORETRY to increase success ratio this time
> > if they are okay to expense the overhead for the success ratio.
> 
> I'm not a friend of connecting __GFP_NORETRY to PCP and LRU draining.

I was also not a fan of passing gfp flags into the cma functions since
it could easily cause misunderstanding, but I saw the talk about
bringing gfp_t back into cma_alloc. Since that seems to have been
dropped, let's use another term.

diff --git a/mm/cma.c b/mm/cma.c
index 7c11ec2dc04c..806280050307 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -505,7 +505,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
  *
  * @cma: contiguous memory region for which the allocation is performed.
  * @align: requested alignment of pages (in PAGE_SIZE order).
- * @gfp_mask: memory allocation flags
+ * @fast: will skip costly operations if it's true.
  * @order: requested page order
  * @nr_requests: the number of 2^order pages requested to be allocated as input,
  * @page_array: page_array pointer to store allocated pages (must have space
@@ -513,10 +513,10 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
  * @nr_allocated: the number of 2^order pages allocated as output
  *
  * This function tries to allocate up to @nr_requests @order pages on specific
- * contiguous memory area. If @gfp_mask has __GFP_NORETRY, it will avoid costly
- * functions to increase allocation success ratio so it will be fast but might
- * return less than requested number of pages. User could retry with
- * !__GFP_NORETRY if it is needed.
+ * contiguous memory area. If @fast is true, it will avoid costly functions
+ * to increase allocation success ratio so it will be faster but might return
+ * less than the requested number of pages. User could retry with
+ * @fast == false if it is needed.
  *
  * Return: it will return 0 only if all pages requested by @nr_requestsed are
  * allocated. Otherwise, it returns negative error code.
@@ -525,7 +525,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
  * how many @order pages are allocated and free those pages when they are not
  * needed.
  */
-int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask,
+int cma_alloc_bulk(struct cma *cma, unsigned int align, bool fast,
 		unsigned int order, size_t nr_requests,
 		struct page **page_array, size_t *nr_allocated)
 {
@@ -538,8 +538,8 @@ int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask,
 	unsigned long start = 0;
 	unsigned long bitmap_maxno, bitmap_no, bitmap_count;
 	struct page *page = NULL;
-	gfp_t gfp = GFP_KERNEL|__GFP_NOWARN|gfp_mask;
-
+	enum alloc_contig_mode mode = fast ? ALLOC_CONTIG_FAST :
+						ALLOC_CONTIG_NORMAL;
 	*nr_allocated = 0;
 	if (!cma || !cma->count || !cma->bitmap || !page_array)
 		return -EINVAL;
@@ -576,7 +576,8 @@ int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask,
 		mutex_unlock(&cma->lock);
 
 		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
-		ret = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_CMA, gfp);
+		ret = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_CMA,
+					GFP_KERNEL|__GFP_NOWARN, mode);
 		if (ret) {
 			cma_clear_bitmap(cma, pfn, nr_pages);
 			if (ret != -EBUSY)

> Also, gfp flags apply mostly to compaction (e.g., how to allocate free
> pages for migration), so this seems a little wrong.
> 
> Can we instead introduce
> 
> enum alloc_contig_mode {
> 	/*
> 	 * Normal mode:
> 	 *
> 	 * Retry page migration 5 times, ... TBD
> 	 *
> 	 */
> 	ALLOC_CONTIG_NORMAL = 0,
> 	/*
> 	 * Fast mode: e.g., used for bulk allocations.
> 	 *
> 	 * Don't retry page migration if it fails, don't drain PCP
> 	 * lists, don't drain LRU.
> 	 */
> 	ALLOC_CONTIG_FAST,
> };

Yeah, the mode is better. Let's have it as a preparation patch.
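
Just to spell out the intended caller flow with the bool flag, something
like the untested sketch below (not part of the patch; the function name,
the align choice and the request sizes are made up for illustration):

#include <linux/cma.h>
#include <linux/mm.h>

/*
 * Untested caller-side sketch: allocate nr_requests 2^order pages,
 * trying the fast path first and falling back to the slow path for
 * whatever is still missing.
 */
static int my_cma_pool_alloc(struct cma *cma, struct page **pages,
			     size_t nr_requests, unsigned int order)
{
	size_t done, nr_allocated = 0;
	size_t i;
	int ret;

	/* First pass: fast mode skips pageblocks that would need draining/retry. */
	ret = cma_alloc_bulk(cma, order, true, order, nr_requests,
			     pages, &nr_allocated);
	done = nr_allocated;
	if (!ret)
		return 0;

	/* Shortfall: retry the remainder with fast == false (slower, thorough). */
	ret = cma_alloc_bulk(cma, order, false, order, nr_requests - done,
			     pages + done, &nr_allocated);
	done += nr_allocated;
	if (!ret)
		return 0;

	/* Still short: give back what we got. cma_release() takes a page count. */
	for (i = 0; i < done; i++)
		cma_release(cma, pages[i], 1 << order);

	return ret;
}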