On 20 Sep 2023, at 13:23, Zi Yan wrote: > On 20 Sep 2023, at 12:04, Johannes Weiner wrote: > >> On Wed, Sep 20, 2023 at 09:48:12AM -0400, Johannes Weiner wrote: >>> On Wed, Sep 20, 2023 at 08:07:53AM +0200, Vlastimil Babka wrote: >>>> On 9/20/23 03:38, Zi Yan wrote: >>>>> On 19 Sep 2023, at 20:32, Mike Kravetz wrote: >>>>> >>>>>> On 09/19/23 16:57, Zi Yan wrote: >>>>>>> On 19 Sep 2023, at 14:47, Mike Kravetz wrote: >>>>>>> >>>>>>>> --- a/mm/page_alloc.c >>>>>>>> +++ b/mm/page_alloc.c >>>>>>>> @@ -1651,8 +1651,13 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page, >>>>>>>> end = pageblock_end_pfn(pfn) - 1; >>>>>>>> >>>>>>>> /* Do not cross zone boundaries */ >>>>>>>> +#if 0 >>>>>>>> if (!zone_spans_pfn(zone, start)) >>>>>>>> start = zone->zone_start_pfn; >>>>>>>> +#else >>>>>>>> + if (!zone_spans_pfn(zone, start)) >>>>>>>> + start = pfn; >>>>>>>> +#endif >>>>>>>> if (!zone_spans_pfn(zone, end)) >>>>>>>> return false; >>>>>>>> I can still trigger warnings. >>>>>>> >>>>>>> OK. One thing to note is that the page type in the warning changed from >>>>>>> 5 (MIGRATE_ISOLATE) to 0 (MIGRATE_UNMOVABLE) with my suggested change. >>>>>>> >>>>>> >>>>>> Just to be really clear, >>>>>> - the 5 (MIGRATE_ISOLATE) warning was from the __alloc_pages call path. >>>>>> - the 0 (MIGRATE_UNMOVABLE) as above was from the alloc_contig_range call >>>>>> path WITHOUT your change. >>>>>> >>>>>> I am guessing the difference here has more to do with the allocation path? >>>>>> >>>>>> I went back and reran focusing on the specific migrate type. >>>>>> Without your patch, and coming from the alloc_contig_range call path, >>>>>> I got two warnings of 'page type is 0, passed migratetype is 1' as above. >>>>>> With your patch I got one 'page type is 0, passed migratetype is 1' >>>>>> warning and one 'page type is 1, passed migratetype is 0' warning. >>>>>> >>>>>> I could be wrong, but I do not think your patch changes things. >>>>> >>>>> Got it. Thanks for the clarification. >>>>>> >>>>>>>> >>>>>>>> One idea about recreating the issue is that it may have to do with size >>>>>>>> of my VM (16G) and the requested allocation sizes 4G. However, I tried >>>>>>>> to really stress the allocations by increasing the number of hugetlb >>>>>>>> pages requested and that did not help. I also noticed that I only seem >>>>>>>> to get two warnings and then they stop, even if I continue to run the >>>>>>>> script. >>>>>>>> >>>>>>>> Zi asked about my config, so it is attached. >>>>>>> >>>>>>> With your config, I still have no luck reproducing the issue. I will keep >>>>>>> trying. Thanks. >>>>>>> >>>>>> >>>>>> Perhaps try running both scripts in parallel? >>>>> >>>>> Yes. It seems to do the trick. >>>>> >>>>>> Adjust the number of hugetlb pages allocated to equal 25% of memory? >>>>> >>>>> I am able to reproduce it with the script below: >>>>> >>>>> while true; do >>>>> echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages& >>>>> echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages& >>>>> wait >>>>> echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages >>>>> echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages >>>>> done >>>>> >>>>> I will look into the issue. >>> >>> Nice! >>> >>> I managed to reproduce it ONCE, triggering it not even a second after >>> starting the script. But I can't seem to do it twice, even after >>> several reboots and letting it run for minutes. >> >> I managed to reproduce it reliably by cutting the nr_hugepages >> parameters respectively in half. >> >> The one that triggers for me is always MIGRATE_ISOLATE. With some >> printk-tracing, the scenario seems to be this: >> >> #0 #1 >> start_isolate_page_range() >> isolate_single_pageblock() >> set_migratetype_isolate(tail) >> lock zone->lock >> move_freepages_block(tail) // nop >> set_pageblock_migratetype(tail) >> unlock zone->lock >> del_page_from_freelist(head) >> expand(head, head_mt) >> WARN(head_mt != tail_mt) >> start_pfn = ALIGN_DOWN(MAX_ORDER_NR_PAGES) >> for (pfn = start_pfn, pfn < end_pfn) >> if (PageBuddy()) >> split_free_page(head) >> >> IOW, we update a pageblock that isn't MAX_ORDER aligned, then drop the >> lock. The move_freepages_block() does nothing because the PageBuddy() >> is set on the pageblock to the left. Once we drop the lock, the buddy >> gets allocated and the expand() puts things on the wrong list. The >> splitting code that handles MAX_ORDER blocks runs *after* the tail >> type is set and the lock has been dropped, so it's too late. > > Yes, this is the issue I can confirm as well. But it is intentional to enable > allocating a contiguous range at pageblock granularity instead of MAX_ORDER > granularity. With your changes below, it no longer works, because if there > is an unmovable page in > [ALIGN_DOWN(start_pfn, MAX_ORDER_NR_PAGES), pageblock_start_pfn(start_pfn)), > the allocation fails but it would succeed in current implementation. > > I think a proper fix would be to make move_freepages_block() split the > MAX_ORDER page and put the split pages in the right migratetype free lists. > > I am working on that. After spending half a day on this, I think it is much harder than I thought to get alloc_contig_range() working with the freelist migratetype hygiene patchset. Because alloc_contig_range() relies on racy migratetype changes: 1. pageblocks in the range are first marked as MIGRATE_ISOLATE to prevent another parallel isolation, but they are not moved to the MIGRATE_ISOLATE free list yet. 2. later in the process, isolate_freepages_range() is used to actually grab the free pages. 3. there was no problem when alloc_contig_range() works on MAX_ORDER aligned ranges, since MIGRATE_ISOLATE cannot be set in the middle of free pages or in-use pages. But it is not the case when alloc_contig_range() work on pageblock aligned ranges. Now during isolation phase, free or in-use pages will need to be split to get their subpages into the right free lists. 4. the hardest case is when a in-use page sits across two pageblocks, currently, the code just isolate one pageblock, migrate the page, and let split_free_page() to correct the free list later. But to strictly enforce freelist migratetype hygiene, extra work is needed at free page path to split the free page into the right freelists. I need more time to think about how to get alloc_contig_range() properly. Help is needed for the bullet point 4. Thanks. PS: One observation is that after move_to_free_list(), a page's migratetype does not match the migratetype of its free list. I might need to make changes on top of your patchset to get alloc_contig_range() working. -- Best Regards, Yan, Zi
Attachment:
signature.asc
Description: OpenPGP digital signature