On Sun, Aug 16, 2020 at 02:31:22PM +0200, David Hildenbrand wrote:
> On 14.08.20 19:31, Minchan Kim wrote:
> > There is a need from special HW for bulk allocation of
> > high-order pages. For example, 4800 * order-4 pages.
> >
> > To meet the requirement, one option is to use a CMA area, because
> > the page allocator with compaction under memory pressure easily
> > fails to meet the requirement and is too slow for 4800 attempts.
> > However, CMA also has the following drawback:
> >
> > * 4800 order-4 cma_alloc calls are too slow
> >
> > To avoid the slowness, we could try to allocate 300M of contiguous
> > memory at once and then split it into order-4 chunks.
> > The problem with this approach is that the CMA allocation fails if
> > any page in the range can't be migrated out, which happens easily
> > with fs writes under memory pressure.
>
> Why not choose a value in between? Like try to allocate MAX_ORDER - 1
> chunks and split them. That would already heavily reduce the call
> frequency.

I think you meant this:

    alloc_pages(GFP_KERNEL|__GFP_NOWARN, MAX_ORDER - 1)

It would work if the system has lots of non-fragmented free memory.
However, once memory is fragmented, it doesn't work. That's why we
have easily seen even order-4 allocation failures in the field, and
that's why CMA is there. CMA has extra logic to isolate the memory
during allocation/freeing, as well as fragmentation avoidance, so its
pages have less chance of being stolen by others and the success
ratio stays high. That's why I want this API to be used with CMA or
the movable zone.

A usecase: a device sets up an exclusive CMA area when the system
boots. When the device needs 4800 * order-4 pages, it could call this
bulk API against that area so that it is effectively guaranteed to
allocate enough pages, fast.
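For illustration, a rough, untested driver-side sketch of that
usecase. hw_alloc_chunks() and prep_chunk_for_hw() are made-up names
and the way the driver finds its struct cma is just one possibility;
cma_get_base()/cma_get_size() are the existing CMA helpers, and
alloc_pages_bulk() is the API proposed in this series:

    #include <linux/cma.h>
    #include <linux/gfp.h>
    #include <linux/mm.h>

    #define NR_CHUNKS   4800
    #define CHUNK_ORDER 4

    static struct page *chunks[NR_CHUNKS];

    /*
     * 'cma' would be the device's exclusive area, e.g. one set up
     * with cma_declare_contiguous() at boot.
     */
    static int hw_alloc_chunks(struct cma *cma)
    {
            unsigned long start = PHYS_PFN(cma_get_base(cma));
            unsigned long end = start + (cma_get_size(cma) >> PAGE_SHIFT);
            int ret, i;

            ret = alloc_pages_bulk(start, end, MIGRATE_CMA,
                                   GFP_KERNEL | __GFP_NOWARN,
                                   CHUNK_ORDER, NR_CHUNKS, chunks);
            if (ret < 0)
                    return ret;

            /* Best effort: ret may be less than NR_CHUNKS. */
            for (i = 0; i < ret; i++)
                    prep_chunk_for_hw(chunks[i]);   /* hypothetical */

            return ret;
    }

Each page that comes back would later be returned with
__free_pages(page, CHUNK_ORDER), matching the kernel-doc below.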
> I don't see a real need for a completely new range allocator function
> for this special case yet.
>
> > To solve these issues, this patch introduces alloc_pages_bulk.
> >
> >     int alloc_pages_bulk(unsigned long start, unsigned long end,
> >                          unsigned int migratetype, gfp_t gfp_mask,
> >                          unsigned int order, unsigned int nr_elem,
> >                          struct page **pages);
> >
> > It investigates the range [start, end) and migrates movable pages
> > out of it on a best-effort basis (via upcoming patches) to create
> > free pages of the requested order.
> >
> > The allocated pages are returned via the pages parameter. The
> > return value represents how many pages of the requested order we
> > got; it can be less than the nr_elem the user requested.
> >
> > /**
> >  * alloc_pages_bulk() -- tries to allocate high-order pages
> >  * by batch from given range [start, end)
> >  * @start:       start PFN to allocate
> >  * @end:         one-past-the-last PFN to allocate
> >  * @migratetype: migratetype of the underlying pageblocks (either
> >  *               #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks
> >  *               in range must have the same migratetype and it must
> >  *               be either of the two.
> >  * @gfp_mask:    GFP mask to use during compaction
> >  * @order:       page order requested
> >  * @nr_elem:     the number of high-order pages to allocate
> >  * @pages:       page array pointer to store allocated pages (must
> >  *               have space for at least nr_elem elements)
> >  *
> >  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
> >  * aligned. The PFN range must belong to a single zone.
> >  *
> >  * Return: the number of pages allocated on success or a negative
> >  * error code. The allocated pages should be freed using
> >  * __free_pages().
> >  */
> >
> > The test does order-4 * 4800 allocations (i.e., 300MB in total)
> > under a kernel build workload. System RAM size is 1.5GB and the
> > CMA area is 500M.
> >
> > Using CMA to allocate the 300M, all of 10 trials failed, with big
> > latency (up to several seconds).
> >
> > With this alloc_pages_bulk API, 7 of 10 trials succeeded in all
> > 4800 allocations; the remaining 3 allocated 4799, 4789 and 4799.
> > All trials completed within 300ms.
> >
> > This patchset is against next-20200813.
> >
> > Minchan Kim (7):
> >   mm: page_owner: split page by order
> >   mm: introduce split_page_by_order
> >   mm: compaction: deal with upcoming high-order page splitting
> >   mm: factor __alloc_contig_range out
> >   mm: introduce alloc_pages_bulk API
> >   mm: make alloc_pages_bulk best effort
> >   mm/page_isolation: avoid drain_all_pages for alloc_pages_bulk
> >
> >  include/linux/gfp.h            |   5 +
> >  include/linux/mm.h             |   2 +
> >  include/linux/page-isolation.h |   1 +
> >  include/linux/page_owner.h     |  10 +-
> >  mm/compaction.c                |  64 +++++++----
> >  mm/huge_memory.c               |   2 +-
> >  mm/internal.h                  |   5 +-
> >  mm/page_alloc.c                | 198 ++++++++++++++++++++++++++-------
> >  mm/page_isolation.c            |  10 +-
> >  mm/page_owner.c                |   7 +-
> >  10 files changed, 230 insertions(+), 74 deletions(-)
>
> --
> Thanks,
>
> David / dhildenb