on 9/26/2023 7:33 PM, Naoya Horiguchi wrote: > On Wed, Aug 30, 2023 at 02:27:33PM +0800, Kemeng Shi wrote: >> >> >> on 8/28/2023 11:21 PM, Naoya Horiguchi wrote: >>> On Sat, Aug 26, 2023 at 11:47:43PM +0800, Kemeng Shi wrote: >>>> When guard page debug is enabled and set_page_guard returns success, we >>>> miss to forward page to point to start of next split range and we will do >>>> split unexpectedly in page range without target page. Move start page >>>> update before set_page_guard to fix this. >>>> >>>> As we split to wrong target page, then splited pages are not able to merge >>>> back to original order when target page is put back and splited pages >>>> except target page is not usable. To be specific: >>>> >>>> Consider target page is the third page in buddy page with order 2. >>>> | buddy-2 | Page | Target | Page | >>>> >>>> After break down to target page, we will only set first page to Guard >>>> because of bug. >>>> | Guard | Page | Target | Page | >>>> >>>> When we try put_page_back_buddy with target page, the buddy page of target >>>> if neither guard nor buddy, Then it's not able to construct original page >>>> with order 2 >>>> | Guard | Page | buddy-0 | Page | >>>> >>>> All pages except target page is not in free list and is not usable. >>>> >>>> Fixes: 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages") >>>> Signed-off-by: Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx> >>> >>> Thank you for finding the problem and writing patches. I think the patch >>> fixes the reported problem, But I wonder that we really need guard page >>> mechanism in break_down_buddy_pages() which is only called from memory_failure. >>> As stated in Documentation/admin-guide/kernel-parameters.txt, this is a >>> debugging feature to detect memory corruption due to buggy kernel or drivers >>> code. So if HW memory failrue seems to be out of the scope, and I feel that >>> we could simply remove it from break_down_buddy_pages(). >>> >>> debug_guardpage_minorder= >>> [KNL] When CONFIG_DEBUG_PAGEALLOC is set, this >>> parameter allows control of the order of pages that will >>> be intentionally kept free (and hence protected) by the >>> buddy allocator. Bigger value increase the probability >>> of catching random memory corruption, but reduce the >>> amount of memory for normal system use. The maximum >>> possible value is MAX_ORDER/2. Setting this parameter >>> to 1 or 2 should be enough to identify most random >>> memory corruption problems caused by bugs in kernel or >>> driver code when a CPU writes to (or reads from) a >>> random memory location. Note that there exists a class >>> of memory corruptions problems caused by buggy H/W or >>> F/W or by drivers badly programming DMA (basically when >>> memory is written at bus level and the CPU MMU is >>> bypassed) which are not detectable by >>> CONFIG_DEBUG_PAGEALLOC, hence this option will not help >>> tracking down these problems. >>> >>> If you have any idea about how guard page mechanism helps memory_failrue, >>> could you share it? >>> >> Hi Naoya, thanks for feedback. Commit c0a32fc5a2e47 ("mm: more intensive >> memory corruption debugging") menthioned we konw that with >> CONFIG_DEBUG_PAGEALLOC configured, the CPU will generate an exception on >> access (read,write) to an unallocated page, which permits us to catch code >> which corrupts memory; Guard page aims to keep more free/protected pages >> and to interlace free/protected and allocated pages to increase the >> probability of catching corruption. Keep guard page around failrue looks >> helpful to catch random access. Wish this can help. > > Sorry for my late response. > I'm OK with keeping guardpage stuff in this code path as long as it properly works. > And the patch looks good to me. > > Acked-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> > > Do you think of sending this patch (only patch 1/3) to -stable? > If so, please add "Cc: stable@xxxxxxxxxxxxxxx" tag. > Thanks for reply. Will cc stable in next version. Thanks! > Thanks, > Naoya Horiguchi >