On 20 Sep 2023, at 12:04, Johannes Weiner wrote:

> On Wed, Sep 20, 2023 at 09:48:12AM -0400, Johannes Weiner wrote:
>> On Wed, Sep 20, 2023 at 08:07:53AM +0200, Vlastimil Babka wrote:
>>> On 9/20/23 03:38, Zi Yan wrote:
>>>> On 19 Sep 2023, at 20:32, Mike Kravetz wrote:
>>>>
>>>>> On 09/19/23 16:57, Zi Yan wrote:
>>>>>> On 19 Sep 2023, at 14:47, Mike Kravetz wrote:
>>>>>>
>>>>>>> --- a/mm/page_alloc.c
>>>>>>> +++ b/mm/page_alloc.c
>>>>>>> @@ -1651,8 +1651,13 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page,
>>>>>>>  	end = pageblock_end_pfn(pfn) - 1;
>>>>>>>
>>>>>>>  	/* Do not cross zone boundaries */
>>>>>>> +#if 0
>>>>>>>  	if (!zone_spans_pfn(zone, start))
>>>>>>>  		start = zone->zone_start_pfn;
>>>>>>> +#else
>>>>>>> +	if (!zone_spans_pfn(zone, start))
>>>>>>> +		start = pfn;
>>>>>>> +#endif
>>>>>>>  	if (!zone_spans_pfn(zone, end))
>>>>>>>  		return false;
>>>>>>>
>>>>>>> I can still trigger warnings.
>>>>>>
>>>>>> OK. One thing to note is that the page type in the warning changed from
>>>>>> 5 (MIGRATE_ISOLATE) to 0 (MIGRATE_UNMOVABLE) with my suggested change.
>>>>>>
>>>>>
>>>>> Just to be really clear,
>>>>> - the 5 (MIGRATE_ISOLATE) warning was from the __alloc_pages call path.
>>>>> - the 0 (MIGRATE_UNMOVABLE) as above was from the alloc_contig_range call
>>>>>   path WITHOUT your change.
>>>>>
>>>>> I am guessing the difference here has more to do with the allocation path?
>>>>>
>>>>> I went back and reran focusing on the specific migrate type.
>>>>> Without your patch, and coming from the alloc_contig_range call path,
>>>>> I got two warnings of 'page type is 0, passed migratetype is 1' as above.
>>>>> With your patch I got one 'page type is 0, passed migratetype is 1'
>>>>> warning and one 'page type is 1, passed migratetype is 0' warning.
>>>>>
>>>>> I could be wrong, but I do not think your patch changes things.
>>>>
>>>> Got it. Thanks for the clarification.
>>>>
>>>>>>> One idea about recreating the issue is that it may have to do with the size
>>>>>>> of my VM (16G) and the requested allocation size of 4G. However, I tried
>>>>>>> to really stress the allocations by increasing the number of hugetlb
>>>>>>> pages requested and that did not help. I also noticed that I only seem
>>>>>>> to get two warnings and then they stop, even if I continue to run the
>>>>>>> script.
>>>>>>>
>>>>>>> Zi asked about my config, so it is attached.
>>>>>>
>>>>>> With your config, I still have no luck reproducing the issue. I will keep
>>>>>> trying. Thanks.
>>>>>>
>>>>>
>>>>> Perhaps try running both scripts in parallel?
>>>>
>>>> Yes. It seems to do the trick.
>>>>
>>>>> Adjust the number of hugetlb pages allocated to equal 25% of memory?
>>>>
>>>> I am able to reproduce it with the script below:
>>>>
>>>> while true; do
>>>>   echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages&
>>>>   echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages&
>>>>   wait
>>>>   echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>>>>   echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>>>> done
>>>>
>>>> I will look into the issue.
>>
>> Nice!
>>
>> I managed to reproduce it ONCE, triggering it not even a second after
>> starting the script. But I can't seem to do it twice, even after
>> several reboots and letting it run for minutes.
>
> I managed to reproduce it reliably by cutting the nr_hugepages
> parameters respectively in half.
>
> The one that triggers for me is always MIGRATE_ISOLATE.
> With some printk-tracing, the scenario seems to be this:
>
>  #0                                       #1
>  start_isolate_page_range()
>    isolate_single_pageblock()
>      set_migratetype_isolate(tail)
>        lock zone->lock
>        move_freepages_block(tail) // nop
>        set_pageblock_migratetype(tail)
>        unlock zone->lock
>                                           del_page_from_freelist(head)
>                                           expand(head, head_mt)
>                                           WARN(head_mt != tail_mt)
>    start_pfn = ALIGN_DOWN(MAX_ORDER_NR_PAGES)
>    for (pfn = start_pfn, pfn < end_pfn)
>      if (PageBuddy())
>        split_free_page(head)
>
> IOW, we update a pageblock that isn't MAX_ORDER aligned, then drop the
> lock. The move_freepages_block() does nothing because the PageBuddy()
> is set on the pageblock to the left. Once we drop the lock, the buddy
> gets allocated and the expand() puts things on the wrong list. The
> splitting code that handles MAX_ORDER blocks runs *after* the tail
> type is set and the lock has been dropped, so it's too late.

Yes, I can confirm this is the issue as well. But the current behavior is
intentional: it enables allocating a contiguous range at pageblock granularity
instead of MAX_ORDER granularity. With your changes below, that no longer
works, because if there is an unmovable page in
[ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES), pageblock_start_pfn(start_pfn)),
the allocation fails, whereas it would succeed with the current implementation.

I think a proper fix would be to make move_freepages_block() split the
MAX_ORDER page and put the split pages in the right migratetype free lists
(a rough sketch of the idea follows below, after your quoted text). I am
working on that.

> I think this would work fine if we always set MIGRATE_ISOLATE in a
> linear fashion, with start and end aligned to MAX_ORDER. Then we also
> wouldn't have to split things.
>
> There are two reasons this doesn't happen today:
>
> 1. The isolation range is rounded to pageblocks, not MAX_ORDER. In
>    this test case they always seem aligned, but it's not
>    guaranteed. However,
>
> 2. start_isolate_page_range() explicitly breaks ordering by doing the
>    last block in the range before the center. It's that last block
>    that triggers the race with __rmqueue_smallest -> expand() for me.
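To make the splitting idea a bit more concrete, here is a rough, untested
user-space sketch (not kernel code; the free-list model, constants, and helper
names are all invented for illustration): a MAX_ORDER-sized free block is cut
into pageblock-sized pieces and each piece is accounted to the migratetype of
the pageblock it falls in, which is roughly what a split-aware
move_freepages_block() would need to do under zone->lock.

/*
 * Toy model of "split the MAX_ORDER buddy and file each piece under its own
 * pageblock's migratetype". Not kernel code; everything here is illustrative.
 */
#include <stdio.h>

#define PAGEBLOCK_ORDER	9	/* 2MB pageblocks (typical x86_64) */
#define MAX_ORDER	10	/* 4MB maximum buddy block */
#define MT_MOVABLE	0
#define MT_ISOLATE	1

/* toy "free lists": just count free pages per migratetype */
static unsigned long nr_free[2];

/* toy pageblock migratetype map, indexed by pfn >> PAGEBLOCK_ORDER */
static int pageblock_mt[1024];

static int get_pageblock_mt(unsigned long pfn)
{
	return pageblock_mt[pfn >> PAGEBLOCK_ORDER];
}

/*
 * Split the free block [pfn, pfn + 2^order) into pageblock-sized pieces and
 * account each piece to the migratetype of the pageblock containing it.
 */
static void split_to_pageblocks(unsigned long pfn, int order)
{
	unsigned long piece = 1UL << PAGEBLOCK_ORDER;
	unsigned long end = pfn + (1UL << order);

	for (; pfn < end; pfn += piece) {
		int mt = get_pageblock_mt(pfn);

		nr_free[mt] += piece;
		printf("pfn %#lx..%#lx -> migratetype %d\n",
		       pfn, pfn + piece - 1, mt);
	}
}

int main(void)
{
	/*
	 * A MAX_ORDER free block spanning pageblocks 0 and 1; pageblock 1 has
	 * just been marked isolated while pageblock 0 stays movable.
	 */
	pageblock_mt[0] = MT_MOVABLE;
	pageblock_mt[1] = MT_ISOLATE;

	split_to_pageblocks(0, MAX_ORDER);
	printf("free movable: %lu, free isolated: %lu\n",
	       nr_free[MT_MOVABLE], nr_free[MT_ISOLATE]);
	return 0;
}

The real fix of course has to do the split on the buddy free lists themselves
(delete the large buddy, re-add the pieces at pageblock order), but the
accounting per pageblock migratetype is the essential point.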
> With the below patch I can no longer reproduce the issue:
>
> ---
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index b5c7a9d21257..b7c8730bf0e2 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -538,8 +538,8 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	unsigned long pfn;
>  	struct page *page;
>  	/* isolation is done at page block granularity */
> -	unsigned long isolate_start = pageblock_start_pfn(start_pfn);
> -	unsigned long isolate_end = pageblock_align(end_pfn);
> +	unsigned long isolate_start = ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES);
> +	unsigned long isolate_end = ALIGN(end_pfn, MAX_ORDER_NR_PAGES);
>  	int ret;
>  	bool skip_isolation = false;
>
> @@ -549,17 +549,6 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	if (ret)
>  		return ret;
>
> -	if (isolate_start == isolate_end - pageblock_nr_pages)
> -		skip_isolation = true;
> -
> -	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
> -	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true,
> -				       skip_isolation, migratetype);
> -	if (ret) {
> -		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
> -		return ret;
> -	}
> -
>  	/* skip isolated pageblocks at the beginning and end */
>  	for (pfn = isolate_start + pageblock_nr_pages;
>  	     pfn < isolate_end - pageblock_nr_pages;
> @@ -568,12 +557,21 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  		if (page && set_migratetype_isolate(page, migratetype, flags,
>  					start_pfn, end_pfn)) {
>  			undo_isolate_page_range(isolate_start, pfn, migratetype);
> -			unset_migratetype_isolate(
> -				pfn_to_page(isolate_end - pageblock_nr_pages),
> -				migratetype);
>  			return -EBUSY;
>  		}
>  	}
> +
> +	if (isolate_start == isolate_end - pageblock_nr_pages)
> +		skip_isolation = true;
> +
> +	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
> +	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true,
> +				       skip_isolation, migratetype);
> +	if (ret) {
> +		undo_isolate_page_range(isolate_start, pfn, migratetype);
> +		return ret;
> +	}
> +
>  	return 0;
>  }
>
> @@ -591,8 +589,8 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  {
>  	unsigned long pfn;
>  	struct page *page;
> -	unsigned long isolate_start = pageblock_start_pfn(start_pfn);
> -	unsigned long isolate_end = pageblock_align(end_pfn);
> +	unsigned long isolate_start = ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES);
> +	unsigned long isolate_end = ALIGN(end_pfn, MAX_ORDER_NR_PAGES);
>
>  	for (pfn = isolate_start;
>  	     pfn < isolate_end;

--
Best Regards,
Yan, Zi
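P.S. To put concrete numbers on the range concern above, here is a
stand-alone sketch. It assumes 4KB pages, 2MB pageblocks, and 4MB MAX_ORDER
blocks, i.e. a typical x86_64 configuration; the constants and the example
pfn are made up for illustration, not taken from any particular config.

/* Show how the two rounding choices differ for an unaligned start_pfn. */
#include <stdio.h>

#define PAGEBLOCK_NR_PAGES	(1UL << 9)	/* 512 pages  = 2MB */
#define MAX_ORDER_NR_PAGES	(1UL << 10)	/* 1024 pages = 4MB */

int main(void)
{
	/* a start_pfn inside the second pageblock of a MAX_ORDER block */
	unsigned long start_pfn = 0x100234;
	unsigned long pb_start = start_pfn & ~(PAGEBLOCK_NR_PAGES - 1);
	unsigned long mo_start = start_pfn & ~(MAX_ORDER_NR_PAGES - 1);

	printf("pageblock-aligned isolate_start: %#lx\n", pb_start); /* 0x100200 */
	printf("MAX_ORDER-aligned isolate_start: %#lx\n", mo_start); /* 0x100000 */
	/*
	 * [mo_start, pb_start) is the extra pageblock that the MAX_ORDER
	 * rounding drags into the isolation range; a single unmovable page in
	 * there makes the whole alloc_contig_range() attempt fail, which is
	 * exactly the regression described above.
	 */
	return 0;
}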