On Wed, Dec 02, 2020 at 10:14:41AM +0100, David Hildenbrand wrote:
> On 01.12.20 18:51, Minchan Kim wrote:
> > There is a need for special HW to require bulk allocation of
> > high-order pages. For example, 4800 * order-4 pages, which
> > would be the minimum; sometimes it requires more.
> > 
> > To meet the requirement, one option reserves a 300M CMA area and
> > requests the whole 300M as contiguous memory. However, that doesn't
> > work if even one of the pages in the range is long-term pinned,
> > directly or indirectly. The other option is to ask for a higher-order
> 
> My latest knowledge is that pages in the CMA area are never long term
> pinned.
> 
> https://lore.kernel.org/lkml/20201123090129.GD27488@xxxxxxxxxxxxxx/
> 
> "gup already tries to deal with long term pins on CMA regions and migrate
> to a non CMA region. Have a look at __gup_longterm_locked."
> 
> We should rather identify ways how that is still possible and get rid of
> them.
> 
> 
> Now, short-term pinnings and PCP are other issues where
> alloc_contig_range() could be improved (e.g., in contrast to a FAST
> mode, a HARD mode which temporarily disables the PCP, ...).

GUP is just one of the cases that makes CMA allocation fail directly;
there are still a bunch of cases that do so indirectly, where a
short-term pin turns into a long-term one for several reasons (CPU
scheduling, IO scheduling, memory pressure, locking dependencies).
In the end those pins are released, but that is unreliable and slow,
so the idea is to simply skip such a page and continue searching the
other pages in the CMA area, because what we need is just several
high-order pages, not one big 300M chunk.

> 
> > size (e.g., 2M) than the requested order (64K) repeatedly until the driver
> > could gather the necessary amount of memory. Basically, this approach
> > makes the allocation very slow due to cma_alloc's function
> > slowness, and it could get stuck on one of the pageblocks if it
> > encounters an unmigratable page.
> > 
> > To solve the issue, this patch introduces cma_alloc_bulk.
> > 
> > 	int cma_alloc_bulk(struct cma *cma, unsigned int align,
> > 		bool fast, unsigned int order, size_t nr_requests,
> > 		struct page **page_array, size_t *nr_allocated);
> > 
> > Most parameters are the same as cma_alloc, but it additionally takes
> > a vector array to store the allocated memory. What's different from
> > cma_alloc is that it will skip a pageblock without waiting/stopping
> > if the pageblock has an unmovable page, so the API continues to scan
> > other pageblocks to find pages of the requested order.
> > 
> > cma_alloc_bulk is a best-effort approach in that, unlike cma_alloc,
> > it skips pageblocks that have unmovable pages. It doesn't need to be
> > perfect from the beginning, at the cost of performance. Thus, the API
> > takes a "bool fast" parameter, which is propagated into
> > alloc_contig_range to avoid functions with significant overhead that
> > exist to increase the CMA allocation success ratio (e.g., migration
> > retries, PCP and LRU draining per pageblock), at the cost of a lower
> > allocation success ratio. If the caller couldn't allocate enough,
> > they can call it again with "false" to increase the success ratio,
> > if they are okay with paying the overhead for it.
> 
> Just so I understand what the idea is:
> 
> alloc_contig_range() sometimes fails on CMA regions when trying to
> allocate big chunks (e.g., 300M). Instead of tackling that issue, you
> rather allocate plenty of small chunks, and make these small allocations
> fail faster / make the allocations less reliable. Correct?

Please see the explanation above.
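
To make the expected usage concrete, here is a rough sketch of what the
driver side could look like. The numbers come from the changelog; the
variable names, the kvmalloc_array bookkeeping and the error handling are
only illustrative, not part of this patch:

	/*
	 * Illustrative only: gather 4800 order-4 pages (300M with 4K
	 * pages) from a private CMA area, trying the fast mode first
	 * and falling back to the normal mode for whatever is missing.
	 */
	struct page **pages;
	size_t got = 0, total = 0;
	int ret;

	pages = kvmalloc_array(4800, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	/* fast pass: no draining, single migration try, may return less */
	ret = cma_alloc_bulk(cma, 0, true, 4, 4800, pages, &got);
	total = got;

	if (total < 4800) {
		/* slow pass: pay the drain/retry cost only for the rest */
		ret = cma_alloc_bulk(cma, 0, false, 4, 4800 - total,
					pages + total, &got);
		total += got;
	}

	if (total < 4800) {
		/* regardless of ret, release whatever was handed back */
		while (total)
			cma_release(cma, pages[--total], 1 << 4);
		kvfree(pages);
		return -ENOMEM;
	}

With this, the expensive draining and retries are paid only for the part
of the request that the cheap pass could not satisfy.
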
> 
> I don't really have a strong opinion on that. Giving up fast rather than
> trying for longer sounds like a useful thing to have - but I wonder if
> it's strictly necessary for the use case you describe.
> 
> I'd like to hear Michal's opinion on that.
> 
> > 
> > Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> > ---
> >  include/linux/cma.h |   5 ++
> >  include/linux/gfp.h |   2 +
> >  mm/cma.c            | 126 ++++++++++++++++++++++++++++++++++++++++++--
> >  mm/page_alloc.c     |  19 ++---
> >  4 files changed, 140 insertions(+), 12 deletions(-)
> > 
> > diff --git a/include/linux/cma.h b/include/linux/cma.h
> > index 217999c8a762..7375d3131804 100644
> > --- a/include/linux/cma.h
> > +++ b/include/linux/cma.h
> > @@ -46,6 +46,11 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
> >  					struct cma **res_cma);
> >  extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
> >  			      bool no_warn);
> > +
> > +extern int cma_alloc_bulk(struct cma *cma, unsigned int align, bool fast,
> > +			unsigned int order, size_t nr_requests,
> > +			struct page **page_array, size_t *nr_allocated);
> > +
> >  extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count);
> > 
> >  extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data);
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index ad5872699692..75bfb673d75b 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -627,6 +627,8 @@ static inline bool pm_suspended_storage(void)
> >  enum alloc_contig_mode {
> >  	/* try several ways to increase success ratio of memory allocation */
> >  	ALLOC_CONTIG_NORMAL,
> > +	/* avoid costly functions to make the call fast */
> > +	ALLOC_CONTIG_FAST,
> >  };
> > 
> >  /* The below functions must be run on a range from a single zone. */
> > diff --git a/mm/cma.c b/mm/cma.c
> > index 8010c1ba04b0..4459045fa717 100644
> > --- a/mm/cma.c
> > +++ b/mm/cma.c
> > @@ -32,6 +32,7 @@
> >  #include <linux/highmem.h>
> >  #include <linux/io.h>
> >  #include <linux/kmemleak.h>
> > +#include <linux/swap.h>
> >  #include <trace/events/cma.h>
> > 
> >  #include "cma.h"
> > @@ -397,6 +398,14 @@ static void cma_debug_show_areas(struct cma *cma)
> >  static inline void cma_debug_show_areas(struct cma *cma) { }
> >  #endif
> > 
> > +static void reset_page_kasan_tag(struct page *page, int count)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < count; i++)
> > +		page_kasan_tag_reset(page + i);
> > +}
> > +
> >  /**
> >   * cma_alloc() - allocate pages from contiguous area
> >   * @cma:   Contiguous memory region for which the allocation is performed.
> > @@ -414,7 +423,6 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
> >  	unsigned long pfn = -1;
> >  	unsigned long start = 0;
> >  	unsigned long bitmap_maxno, bitmap_no, bitmap_count;
> > -	size_t i;
> >  	struct page *page = NULL;
> >  	int ret = -ENOMEM;
> > 
> > @@ -479,10 +487,8 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
> >  	 * blocks being marked with different tags. Reset the tags to ignore
> >  	 * those page blocks.
> >  	 */
> > -	if (page) {
> > -		for (i = 0; i < count; i++)
> > -			page_kasan_tag_reset(page + i);
> > -	}
> > +	if (page)
> > +		reset_page_kasan_tag(page, count);
> > 
> >  	if (ret && !no_warn) {
> >  		pr_err("%s: alloc failed, req-size: %zu pages, ret: %d\n",
> > @@ -494,6 +500,116 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
> >  	return page;
> >  }
> > 
> > +/*
> > + * cma_alloc_bulk() - allocate high-order pages in bulk from a contiguous
> > + *		area on a best-effort basis. It will usually be used for a
> > + *		private @cma.
> > + *
> > + * @cma:   contiguous memory region for which the allocation is performed.
> > + * @align: requested alignment of pages (in PAGE_SIZE order).
> > + * @fast:  will skip costly operations if it's true.
> > + * @order: requested page order
> > + * @nr_requests: the number of 2^order pages requested to be allocated as input,
> > + * @page_array: page_array pointer to store allocated pages (must have space
> > + *		for at least nr_requests)
> > + * @nr_allocated: the number of 2^order pages allocated as output
> > + *
> > + * This function tries to allocate up to @nr_requests @order pages on the
> > + * specified contiguous memory area. If @fast is true, it will avoid costly
> > + * functions that increase the allocation success ratio, so it will be faster
> > + * but might return fewer pages than requested. The user can retry with
> > + * @fast set to false if needed.
> > + *
> > + * Return: it will return 0 only if all pages requested by @nr_requests are
> > + * allocated. Otherwise, it returns a negative error code.
> > + *
> > + * Note: Regardless of success/failure, the user should check @nr_allocated to
> > + * see how many @order pages were allocated and free those pages when they are
> > + * not needed.
> > + */
> > +int cma_alloc_bulk(struct cma *cma, unsigned int align, bool fast,
> > +		unsigned int order, size_t nr_requests,
> > +		struct page **page_array, size_t *nr_allocated)
> > +{
> > +	int ret = 0;
> > +	size_t i = 0;
> > +	unsigned long nr_pages_needed = nr_requests * (1 << order);
> > +	unsigned long nr_chunk_pages, nr_pages;
> > +	unsigned long mask, offset;
> > +	unsigned long pfn = -1;
> > +	unsigned long start = 0;
> > +	unsigned long bitmap_maxno, bitmap_no, bitmap_count;
> > +	struct page *page = NULL;
> > +	enum alloc_contig_mode mode = fast ? ALLOC_CONTIG_FAST :
> > +						ALLOC_CONTIG_NORMAL;
> > +	*nr_allocated = 0;
> > +	if (!cma || !cma->count || !cma->bitmap || !page_array)
> > +		return -EINVAL;
> > +
> > +	if (!nr_pages_needed)
> > +		return 0;
> > +
> > +	nr_chunk_pages = 1 << max_t(unsigned int, order, pageblock_order);
> > +
> > +	mask = cma_bitmap_aligned_mask(cma, align);
> > +	offset = cma_bitmap_aligned_offset(cma, align);
> > +	bitmap_maxno = cma_bitmap_maxno(cma);
> > +
> > +	lru_add_drain_all();
> > +	drain_all_pages(NULL);
> > +
> > +	while (nr_pages_needed) {
> > +		nr_pages = min(nr_chunk_pages, nr_pages_needed);
> > +
> > +		bitmap_count = cma_bitmap_pages_to_bits(cma, nr_pages);
> > +		mutex_lock(&cma->lock);
> > +		bitmap_no = bitmap_find_next_zero_area_off(cma->bitmap,
> > +				bitmap_maxno, start, bitmap_count, mask,
> > +				offset);
> > +		if (bitmap_no >= bitmap_maxno) {
> > +			mutex_unlock(&cma->lock);
> > +			break;
> > +		}
> > +		bitmap_set(cma->bitmap, bitmap_no, bitmap_count);
> > +		/*
> > +		 * It's safe to drop the lock here. If the migration fails
> > +		 * cma_clear_bitmap will take the lock again and unmark it.
> > +		 */
> > +		mutex_unlock(&cma->lock);
> > +
> > +		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
> > +		ret = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_CMA,
> > +				GFP_KERNEL|__GFP_NOWARN, mode);
> > +		if (ret) {
> > +			cma_clear_bitmap(cma, pfn, nr_pages);
> > +			if (ret != -EBUSY)
> > +				break;
> > +
> > +			/* continue to search next block */
> > +			start = (pfn + nr_pages - cma->base_pfn) >>
> > +					cma->order_per_bit;
> > +			continue;
> > +		}
> > +
> > +		page = pfn_to_page(pfn);
> > +		while (nr_pages) {
> > +			page_array[i++] = page;
> > +			reset_page_kasan_tag(page, 1 << order);
> > +			page += 1 << order;
> > +			nr_pages -= 1 << order;
> > +			nr_pages_needed -= 1 << order;
> > +		}
> > +
> > +		start = bitmap_no + bitmap_count;
> > +	}
> > +
> > +	*nr_allocated = i;
> > +
> > +	if (!ret && nr_pages_needed)
> > +		ret = -EBUSY;
> > +
> > +	return ret;
> > +}
> > +
> >  /**
> >   * cma_release() - release allocated pages
> >   * @cma:   Contiguous memory region for which the allocation is performed.
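
BTW, to put some numbers on the changelog example for this loop (assuming
4K base pages and pageblock_order = 9, which are only my assumptions here):
4800 order-4 pages are 300M, and each iteration above claims one 2M
pageblock, i.e. 32 order-4 chunks, so a full allocation needs 150 pageblocks
to succeed. With the fast mode below, a pageblock that turns out to be busy
costs only a single failed migration pass instead of five retries plus the
per-call draining.
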
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index adfbfd95fbc3..2a1799ff14fc 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -8463,7 +8463,8 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
> > 
> >  /* [start, end) must belong to a single zone. */
> >  static int __alloc_contig_migrate_range(struct compact_control *cc,
> > -					unsigned long start, unsigned long end)
> > +					unsigned long start, unsigned long end,
> > +					unsigned int max_tries)
> >  {
> >  	/* This function is based on compact_zone() from compaction.c. */
> >  	unsigned int nr_reclaimed;
> > @@ -8491,7 +8492,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
> >  				break;
> >  			}
> >  			tries = 0;
> > -		} else if (++tries == 5) {
> > +		} else if (++tries == max_tries) {
> >  			ret = ret < 0 ? ret : -EBUSY;
> >  			break;
> >  		}
> > @@ -8553,6 +8554,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> >  	unsigned long outer_start, outer_end;
> >  	unsigned int order;
> >  	int ret = 0;
> > +	bool fast_mode = mode == ALLOC_CONTIG_FAST;
> > 
> >  	struct compact_control cc = {
> >  		.nr_migratepages = 0,
> > @@ -8595,7 +8597,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> >  	if (ret)
> >  		return ret;
> > 
> > -	drain_all_pages(cc.zone);
> > +	if (!fast_mode)
> > +		drain_all_pages(cc.zone);
> > 
> >  	/*
> >  	 * In case of -EBUSY, we'd like to know which page causes problem.
> > @@ -8607,7 +8610,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> >  	 * allocated. So, if we fall through be sure to clear ret so that
> >  	 * -EBUSY is not accidentally used or returned to caller.
> >  	 */
> > -	ret = __alloc_contig_migrate_range(&cc, start, end);
> > +	ret = __alloc_contig_migrate_range(&cc, start, end, fast_mode ? 1 : 5);
> >  	if (ret && ret != -EBUSY)
> >  		goto done;
> >  	ret =0;
> > @@ -8629,7 +8632,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> >  	 * isolated thus they won't get removed from buddy.
> >  	 */
> > 
> > -	lru_add_drain_all();
> > +	if (!fast_mode)
> > +		lru_add_drain_all();
> > 
> >  	order = 0;
> >  	outer_start = start;
> > @@ -8656,8 +8660,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> > 
> >  	/* Make sure the range is really isolated. */
> >  	if (test_pages_isolated(outer_start, end, 0)) {
> > -		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
> > -			__func__, outer_start, end);
> > +		if (!fast_mode)
> > +			pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
> > +				__func__, outer_start, end);
> >  		ret = -EBUSY;
> >  		goto done;
> >  	}
> > 
> 
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 