On 30 Mar 2022, at 18:15, Vlastimil Babka wrote: > On 3/31/22 00:03, Zi Yan wrote: >> On 30 Mar 2022, at 17:43, Vlastimil Babka wrote: >> >>> On 3/30/22 22:05, Linus Torvalds wrote: >>>> On Wed, Mar 30, 2022 at 12:42 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: >>>>> >>>>> I started testing new patches and it crashed when doing the x86-32 test on >>>>> boot up. >>>>> >>>>> Initializing HighMem for node 0 (000375fe:0021ee00) >>>>> BUG: kernel NULL pointer dereference, address: 00000878 >>>>> #PF: supervisor read access in kernel mode >>>>> #PF: error_code(0x0000) - not-present page >>>>> *pdpt = 0000000000000000 *pde = f0000000f000eef3 >>>>> Oops: 0000 [#1] PREEMPT SMP PTI >>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-test+ #469 >>>>> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 >>>>> EIP: get_pfnblock_flags_mask+0x2c/0x36 >>>>> Code: 6d ea ff 55 89 e5 56 89 ce 53 8b 18 89 d8 c1 eb 1e e8 f7 fb ff ff 69 db c0 02 00 00 89 c1 89 c2 c1 ea 05 8b 83 7c d7 79 c1 5b <8b> 04 90 d3 e8 21 f0 5e 5d c3 55 89 e5 57 56 89 d6 53 89 c3 64 a1 >>>> >>>> The whole function is in that Code: thing, and it decodes to: >>>> >>>> 0: 55 push %ebp >>>> 1: 89 e5 mov %esp,%ebp >>>> 3: 56 push %esi >>>> 4: 89 ce mov %ecx,%esi >>>> 6: 53 push %ebx >>>> 7: 8b 18 mov (%eax),%ebx >>>> 9: 89 d8 mov %ebx,%eax >>>> b: c1 eb 1e shr $0x1e,%ebx >>>> e: e8 f7 fb ff ff call 0xfffffc0a >>>> 13: 69 db c0 02 00 00 imul $0x2c0,%ebx,%ebx >>>> 19: 89 c1 mov %eax,%ecx >>>> 1b: 89 c2 mov %eax,%edx >>>> 1d: c1 ea 05 shr $0x5,%edx >>>> 20: 8b 83 7c d7 79 c1 mov -0x3e862884(%ebx),%eax >>>> 26: 5b pop %ebx >>>> 27:* 8b 04 90 mov (%eax,%edx,4),%eax <-- trapping instruction >>>> 2a: d3 e8 shr %cl,%eax >>>> 2c: 21 f0 and %esi,%eax >>>> 2e: 5e pop %esi >>>> 2f: 5d pop %ebp >>>> 30: c3 ret >>>> >>>> with '%eax' being NULL, and %edx being 0x21e. >>>> >>>> (The call seems to be to 'pfn_to_bitidx().isra.0' if my compiler does >>>> similar code generation, so it's out-of-lined part of pfn_to_bitidx() >>>> despite being marked inline) >>>> >>>> So that oops is that >>>> >>>> word = bitmap[word_bitidx]; >>>> >>>> line, with 'bitmap' being NULL (and %edx contains 'word_bitidx'). >>>> >>>> Looking around, your 'config-bad' doesn't even have >>>> CONFIG_MEMORY_ISOLATION enabled, and so I suspect the culprit is this >>>> part of the change: >>>> >>>> - if (unlikely(has_isolate_pageblock(zone))) { >>>> >>>> which used to always be false for that config, and now the code is >>>> suddenly enabled. >>> >>> If CONFIG_MEMORY_ISOLATION was enabled then the zone layout would be the >>> same, so I think it's not simply that. I think it's the timing - >>> has_isolate_pageblock(zone) would only be possible to become true later >>> in runtime when some isolation is ongoing, but here it seems we are >>> still in the early boot. Probably at a boundary of highmem with another >>> zone that doesn't have the pageblock bitmap yet initialized? While later >>> it would have, and all would be fine. >> >> It seems so based on the boot log from Steven. >> >>> >>> As Zi Yan said, the usual merging code will, through page_is_buddy() >>> find safely enough the buddy is not applicable, so I agree with his >>> patch direction. Seems this also shows the code tried to become too >>> smart and for the next merge window we should try just move the >>> migratetype checks into the main while loop (under something like "if >>> (order >= max_order)") and simplify the function a lot, hopefully with >>> negligible perf impact. >> >> Something like this? > > Exactly (with some nits). Thanks for the comment. Should I send the patch below with nits fix to supersede the patch I just sent? > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 6c6af8658775..568ecaf5700d 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1053,7 +1053,6 @@ static inline void __free_one_page(struct page *page, >> int migratetype, fpi_t fpi_flags) >> { >> struct capture_control *capc = task_capc(zone); >> - unsigned int max_order = pageblock_order; >> unsigned long buddy_pfn; >> unsigned long combined_pfn; >> struct page *buddy; >> @@ -1069,8 +1068,7 @@ static inline void __free_one_page(struct page *page, >> VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page); >> VM_BUG_ON_PAGE(bad_range(zone, page), page); >> >> -continue_merging: >> - while (order < max_order) { >> + while (order < MAX_ORDER - 1) { >> if (compaction_capture(capc, page, order, migratetype)) { >> __mod_zone_freepage_state(zone, -(1 << order), >> migratetype); >> @@ -1081,6 +1079,27 @@ static inline void __free_one_page(struct page *page, >> >> if (!page_is_buddy(page, buddy, order)) >> goto done_merging; >> + >> + if (order >= pageblock_order) { > > Would use unlikely() here. > >> + /* If we are here, it means order is >= pageblock_order. > > Can drop this like as it became quite obvious now :) > >> + * We want to prevent merge between freepages on pageblock >> + * without fallbacks and normal pageblock. Without this, >> + * pageblock isolation could cause incorrect freepage or CMA >> + * accounting or HIGHATOMIC accounting. >> + * >> + * We don't want to hit this code for the more frequent >> + * low-order merging. > > The last two lines should be also dropped now. > >> + */ >> + int buddy_mt; >> + >> + buddy_mt = get_pageblock_migratetype(buddy); > > Can be now single line. > >> + >> + if (migratetype != buddy_mt >> + && (!migratetype_is_mergeable(migratetype) || >> + !migratetype_is_mergeable(buddy_mt))) >> + goto done_merging; >> + } >> + >> /* >> * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page, >> * merge with it and move up one order. >> @@ -1094,32 +1113,6 @@ static inline void __free_one_page(struct page *page, >> pfn = combined_pfn; >> order++; >> } >> - if (order < MAX_ORDER - 1) { >> - /* If we are here, it means order is >= pageblock_order. >> - * We want to prevent merge between freepages on pageblock >> - * without fallbacks and normal pageblock. Without this, >> - * pageblock isolation could cause incorrect freepage or CMA >> - * accounting or HIGHATOMIC accounting. >> - * >> - * We don't want to hit this code for the more frequent >> - * low-order merging. >> - */ >> - int buddy_mt; >> - >> - buddy_pfn = __find_buddy_pfn(pfn, order); >> - buddy = page + (buddy_pfn - pfn); >> - >> - if (!page_is_buddy(page, buddy, order)) >> - goto done_merging; >> - buddy_mt = get_pageblock_migratetype(buddy); >> - >> - if (migratetype != buddy_mt >> - && (!migratetype_is_mergeable(migratetype) || >> - !migratetype_is_mergeable(buddy_mt))) >> - goto done_merging; >> - max_order = order + 1; >> - goto continue_merging; >> - } >> >> done_merging: >> set_buddy_order(page, order); >> >> -- >> Best Regards, >> Yan, Zi -- Best Regards, Yan, Zi
Attachment:
signature.asc
Description: OpenPGP digital signature