On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@xxxxxxxxxx>
>
> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
> pageblocks with different migratetypes. It might unnecessarily convert
> extra pageblocks at the beginning and at the end of the range. Change
> alloc_contig_range() to work at pageblock granularity.
>
> It is done by restoring pageblock types and split >pageblock_order free
> pages after isolating at MAX_ORDER-1 granularity and migrating pages
> away at pageblock granularity. The reason for this process is that
> during isolation, some pages, either free or in-use, might have >pageblock
> sizes and isolating part of them can cause free accounting issues.
> Restoring the migratetypes of the pageblocks not in the interesting
> range later is much easier.

Hi Zi Yan,

Due to time constraints I only glanced over, so some comments below
about stuff that caught my eye:

> +static inline void split_free_page_into_pageblocks(struct page *free_page,
> +                                int order, struct zone *zone)
> +{
> +        unsigned long pfn;
> +
> +        spin_lock(&zone->lock);
> +        del_page_from_free_list(free_page, zone, order);
> +        for (pfn = page_to_pfn(free_page);
> +             pfn < page_to_pfn(free_page) + (1UL << order);

It might make sense to have an end_pfn variable so that it does not have
to be constantly evaluated. Or maybe the compiler is clever enough to
only evaluate it once.

> +             pfn += pageblock_nr_pages) {
> +                int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
> +
> +                __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
> +                                mt, FPI_NONE);
> +        }
> +        spin_unlock(&zone->lock);

It is possible that free_page's order is already pageblock_order, so I would
add a one-liner upfront to catch that case and return, otherwise we do the
delete_from_freelist-and-free_it_again dance. (I put a rough sketch of both
suggestions at the end of this mail.)

> +        /* Save the migratepages of the pageblocks before start and after end */
> +        num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
> +                                + (isolate_end - alloc_end) / pageblock_nr_pages;
> +        saved_mt =
> +                kmalloc_array(num_pageblock_to_save,
> +                              sizeof(unsigned char), GFP_KERNEL);
> +        if (!saved_mt)
> +                return -ENOMEM;
> +
> +        num = save_migratetypes(saved_mt, isolate_start, alloc_start);
> +
> +        num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);

I really hope we can put all this magic within start_isolate_page_range,
and the counterparts in undo_isolate_page_range.

Also, I kinda dislike the &saved_mt thing. I thought about some other
approaches but nothing that wasn't too specific for this case, and I guess
we want that function to be as generic as possible.

> +        /*
> +         * Split free page spanning [alloc_end, isolate_end) and put the
> +         * pageblocks in the right migratetype list
> +         */
> +        for (outer_end = alloc_end; outer_end < isolate_end;) {
> +                unsigned long begin_pfn = outer_end;
> +
> +                order = 0;
> +                while (!PageBuddy(pfn_to_page(outer_end))) {
> +                        if (++order >= MAX_ORDER) {
> +                                outer_end = begin_pfn;
> +                                break;
> +                        }
> +                        outer_end &= ~0UL << order;
> +                }
> +
> +                if (outer_end != begin_pfn) {
> +                        order = buddy_order(pfn_to_page(outer_end));
> +
> +                        /*
> +                         * split the free page has start page and put the pageblocks
> +                         * in the right migratetype list
> +                         */
> +                        VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);

How could this possibly happen?

> +                        {
> +                                struct page *free_page = pfn_to_page(outer_end);
> +
> +                                split_free_page_into_pageblocks(free_page, order, cc.zone);
> +                        }
> +                        outer_end += 1UL << order;
> +                } else
> +                        outer_end = begin_pfn + 1;
>          }

I think there are cases we could optimize for. If the page has already been
split in pageblocks by the outer_start loop, we could skip this outer_end
logic altogether.

E.g: An order-10 page is split in two pageblocks. There's nothing else to
be done, right? We could skip this.
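As for the outer_end loop: completely untested, and "split_end" below is a
made-up variable that the outer_start handling would have to provide (the
pfn one past the last page it already split into pageblocks), but something
along these lines is what I am thinking of:

        /*
         * Sketch only: if the outer_start handling already split its free
         * page into pageblocks up to split_end, that part of the range
         * needs no further splitting; only scan whatever is left of
         * [alloc_end, isolate_end).
         */
        if (split_end < isolate_end) {
                for (outer_end = max(alloc_end, split_end);
                     outer_end < isolate_end;) {
                        /* same body as in the loop above */
                }
        }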
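And going back to split_free_page_into_pageblocks(): completely untested,
but the below is roughly what I meant with the end_pfn variable and the
early return; it only uses the helpers your patch already calls:

static inline void split_free_page_into_pageblocks(struct page *free_page,
                                int order, struct zone *zone)
{
        unsigned long pfn = page_to_pfn(free_page);
        unsigned long end_pfn = pfn + (1UL << order);

        /* Already pageblock-sized, nothing to split */
        if (order == pageblock_order)
                return;

        spin_lock(&zone->lock);
        del_page_from_free_list(free_page, zone, order);
        for (; pfn < end_pfn; pfn += pageblock_nr_pages) {
                int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);

                __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
                                mt, FPI_NONE);
        }
        spin_unlock(&zone->lock);
}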
--
Oscar Salvador
SUSE Labs