On 8/28/2023 11:21 PM, Naoya Horiguchi wrote:
> On Sat, Aug 26, 2023 at 11:47:43PM +0800, Kemeng Shi wrote:
>> When guard page debug is enabled and set_page_guard returns success, we
>> fail to advance page to the start of the next split range, so the
>> subsequent split is done on a page range that does not contain the
>> target page. Move the start page update before set_page_guard to fix
>> this.
>>
>> Because we split the wrong range, the split pages cannot be merged back
>> to the original order when the target page is put back, and all split
>> pages except the target page are unusable. To be specific:
>>
>> Consider the target page is the third page in a buddy page of order 2.
>> | buddy-2 | Page | Target | Page |
>>
>> After breaking down to the target page, we only set the first page to
>> Guard because of the bug.
>> | Guard | Page | Target | Page |
>>
>> When we try put_page_back_buddy with the target page, the buddy of the
>> target is neither guard nor buddy, so the original order-2 page cannot
>> be reconstructed.
>> | Guard | Page | buddy-0 | Page |
>>
>> All pages except the target page are not in the free list and are not
>> usable.
>>
>> Fixes: 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages")
>> Signed-off-by: Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx>
>
> Thank you for finding the problem and writing patches. I think the patch
> fixes the reported problem, but I wonder whether we really need the guard
> page mechanism in break_down_buddy_pages(), which is only called from
> memory_failure. As stated in Documentation/admin-guide/kernel-parameters.txt,
> this is a debugging feature to detect memory corruption due to buggy kernel
> or driver code. So HW memory failure seems to be out of its scope, and I
> feel that we could simply remove it from break_down_buddy_pages().
>
>	debug_guardpage_minorder=
>			[KNL] When CONFIG_DEBUG_PAGEALLOC is set, this
>			parameter allows control of the order of pages that will
>			be intentionally kept free (and hence protected) by the
>			buddy allocator. Bigger value increase the probability
>			of catching random memory corruption, but reduce the
>			amount of memory for normal system use. The maximum
>			possible value is MAX_ORDER/2. Setting this parameter
>			to 1 or 2 should be enough to identify most random
>			memory corruption problems caused by bugs in kernel or
>			driver code when a CPU writes to (or reads from) a
>			random memory location. Note that there exists a class
>			of memory corruptions problems caused by buggy H/W or
>			F/W or by drivers badly programming DMA (basically when
>			memory is written at bus level and the CPU MMU is
>			bypassed) which are not detectable by
>			CONFIG_DEBUG_PAGEALLOC, hence this option will not help
>			tracking down these problems.
>
> If you have any idea about how the guard page mechanism helps
> memory_failure, could you share it?
>
Hi Naoya, thanks for the feedback. Commit c0a32fc5a2e47 ("mm: more intensive
memory corruption debugging") mentioned that with CONFIG_DEBUG_PAGEALLOC
configured, the CPU will generate an exception on access (read, write) to an
unallocated page, which permits us to catch code which corrupts memory.
Guard pages aim to keep more free/protected pages and to interlace
free/protected and allocated pages to increase the probability of catching
corruption. Keeping guard pages around the failure looks helpful for
catching random accesses. Wish this can help.
> Thanks,
> Naoya Horiguchi
>
>> ---
>>  mm/page_alloc.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index fefc4074d9d0..88c5f5aea9b0 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6505,6 +6505,7 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page,
>>  			next_page = page;
>>  			current_buddy = page + size;
>>  		}
>> +		page = next_page;
>>
>>  		if (set_page_guard(zone, current_buddy, high, migratetype))
>>  			continue;
>>
>> @@ -6512,7 +6513,6 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page,
>>  		if (current_buddy != target) {
>>  			add_to_free_list(current_buddy, zone, high, migratetype);
>>  			set_buddy_order(current_buddy, high);
>> -			page = next_page;
>>  		}
>>  	}
>>  }
>> --
>> 2.30.0
>>
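
To make the effect of the misplaced update concrete, below is a minimal
userspace C sketch of the split loop. It is my own simplification for
illustration only, not kernel code: page indices stand in for struct page
pointers, low is assumed to be 0, and fake_set_page_guard() models
set_page_guard() always succeeding (i.e. guard page debug enabled). Running
it for the order-2 example above shows that the buggy loop keeps splitting
the already-guarded range [0..2) and never isolates target page 2, while the
fixed loop isolates the target on the second step.

/*
 * Userspace model of the break_down_buddy_pages() split loop,
 * comparing the original and patched placement of "page = next_page".
 */
#include <stdio.h>
#include <stdbool.h>

/* pretend set_page_guard() always succeeds, as with guard debug enabled */
static bool fake_set_page_guard(unsigned long buddy)
{
	(void)buddy;
	return true;
}

static void split(unsigned long page, unsigned long target,
		  unsigned int high, bool fixed)
{
	while (high > 0) {
		unsigned long next_page, current_buddy;
		unsigned long size = 1UL << --high;

		if (target >= page + size) {
			/* target is in the upper half, split off the lower half */
			next_page = page + size;
			current_buddy = page;
		} else {
			/* target is in the lower half, split off the upper half */
			next_page = page;
			current_buddy = page + size;
		}

		printf("order %u: keep [%lu..%lu), guard [%lu..%lu)\n",
		       high, next_page, next_page + size,
		       current_buddy, current_buddy + size);

		if (fixed)
			page = next_page;	/* patched placement: always advance */

		if (fake_set_page_guard(current_buddy))
			continue;

		page = next_page;		/* original placement: skipped by continue */
	}
}

int main(void)
{
	/* order-2 block of pages 0..3, target page is index 2 */
	printf("buggy loop:\n");
	split(0, 2, 2, false);
	printf("\nfixed loop:\n");
	split(0, 2, 2, true);
	return 0;
}

With the buggy loop the second iteration operates on [0..2) again and the
target is never isolated at order 0; with the fix, the second iteration
keeps [2..3) (the target) and guards [3..4), matching the intended layout.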