On 9/20/23 03:38, Zi Yan wrote:
> On 19 Sep 2023, at 20:32, Mike Kravetz wrote:
>
>> On 09/19/23 16:57, Zi Yan wrote:
>>> On 19 Sep 2023, at 14:47, Mike Kravetz wrote:
>>>
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -1651,8 +1651,13 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page,
>>>>  	end = pageblock_end_pfn(pfn) - 1;
>>>>
>>>>  	/* Do not cross zone boundaries */
>>>> +#if 0
>>>>  	if (!zone_spans_pfn(zone, start))
>>>>  		start = zone->zone_start_pfn;
>>>> +#else
>>>> +	if (!zone_spans_pfn(zone, start))
>>>> +		start = pfn;
>>>> +#endif
>>>>  	if (!zone_spans_pfn(zone, end))
>>>>  		return false;
>>>>
>>>> I can still trigger warnings.
>>>
>>> OK. One thing to note is that the page type in the warning changed from
>>> 5 (MIGRATE_ISOLATE) to 0 (MIGRATE_UNMOVABLE) with my suggested change.
>>>
>>
>> Just to be really clear,
>> - the 5 (MIGRATE_ISOLATE) warning was from the __alloc_pages call path.
>> - the 0 (MIGRATE_UNMOVABLE) as above was from the alloc_contig_range call
>>   path WITHOUT your change.
>>
>> I am guessing the difference here has more to do with the allocation path?
>>
>> I went back and reran focusing on the specific migrate type.
>> Without your patch, and coming from the alloc_contig_range call path,
>> I got two warnings of 'page type is 0, passed migratetype is 1' as above.
>> With your patch I got one 'page type is 0, passed migratetype is 1'
>> warning and one 'page type is 1, passed migratetype is 0' warning.
>>
>> I could be wrong, but I do not think your patch changes things.
>
> Got it. Thanks for the clarification.
>
>>
>>>>
>>>> One idea about recreating the issue is that it may have to do with size
>>>> of my VM (16G) and the requested allocation sizes 4G. However, I tried
>>>> to really stress the allocations by increasing the number of hugetlb
>>>> pages requested and that did not help. I also noticed that I only seem
>>>> to get two warnings and then they stop, even if I continue to run the
>>>> script.
>>>>
>>>> Zi asked about my config, so it is attached.
>>>
>>> With your config, I still have no luck reproducing the issue. I will keep
>>> trying. Thanks.
>>>
>>
>> Perhaps try running both scripts in parallel?
>
> Yes. It seems to do the trick.
>
>> Adjust the number of hugetlb pages allocated to equal 25% of memory?
>
> I am able to reproduce it with the script below:
>
> while true; do
>   echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages&
>   echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages&
>   wait
>   echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>   echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> done
>
> I will look into the issue.

With migratetypes 0 and 1, and a somewhat harder-to-reproduce scenario
(= less deterministic, more racy), it's possible we are now seeing what
I suspected could happen here:

https://lore.kernel.org/all/37dbd4d0-c125-6694-dec4-6322ae5b6dee@xxxxxxx/

namely, that there are places reading the migratetype outside of the
zone lock.
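
To illustrate the suspected pattern (a simplified pseudocode sketch, not
an excerpt of any actual call site; see the lore link above for the real
ones):

	/*
	 * CPU A: an allocation/accounting path samples the pageblock
	 * type without holding zone->lock.
	 */
	mt = get_pfnblock_migratetype(page, pfn);	/* lockless read */

	/*
	 * CPU B: meanwhile, isolation changes the pageblock type under
	 * zone->lock.
	 */
	spin_lock_irqsave(&zone->lock, flags);
	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
	spin_unlock_irqrestore(&zone->lock, flags);

	/*
	 * CPU A then acts on the stale mt, so the freelist it touches
	 * no longer matches the pageblock's current type, which could
	 * later trip a "page type is X, passed migratetype is Y"
	 * style warning.
	 */
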