在 2022/6/16 上午12:15, Xianting Tian 写道:
The contents of the invalid page as below, all 0 in contents, but it may diff next time.在 2022/6/15 下午9:55, Zi Yan 写道:On 15 Jun 2022, at 2:47, Xianting Tian wrote:Can you elaborate more on this? What kind of page can lead to page_is_buddy() returning true? You said it is buddy_pfn is 0, but if the page is reserved,在 2022/6/14 上午8:14, Zi Yan 写道:On 13 Jun 2022, at 19:47, Guo Ren wrote:On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@xxxxxxxxxx> wrote:On 13 Jun 2022, at 12:32, Guo Ren wrote:This is quite misleading. Commit 787af64d05cd applies does not mean it isOn Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@xxxxxxxxxx> wrote:Hi Xianting, Thanks for your patch. On 13 Jun 2022, at 9:10, Xianting Tian wrote:Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.") added buddy check code. But unfortunately, this fix isn't backported to linux-5.17.y and the former stable branches. The reason is it added wrongNo, the Fixes tag is right. The commit above does need to validate buddy.fixes message:Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackablepageblocks with others")I think Xianting is right. The “Fixes:" tag is not accurate and the page_is_buddy() is necessary here. This patch could be applied to the early version of the stable tree (eg: Linux-5.10.y, not the master tree)intended to fix the preexisting bug. Also it does not apply cleanlyto commit d9dddbf55667, there is a clear indentation mismatch. At best, you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667. There is no way you can apply 787af64d05cd to earlier trees and call it a day.You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying the fixes message is wrong just misleads people, making them think there isno bug in 1dd214b8f21c. We need to be clear about this.First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The origin fixes could cover the Linux-5.0.y tree if they give the accurate commit number and that is the cause we want to point out.Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixesthe issue introduced by d9dddbf55667. But my point is that 787af64d05cdis not intended to fix d9dddbf55667 and saying it has a wrong fixes message is misleading. This is the point I want to make.Second, if the patch is for d9dddbf55667 then it could cover any tree in the stable repo. Actually, we only know Linux-5.10.y has the problem.But it is not and does not apply to d9dddbf55667 cleanly.Maybe, Gregkh could help to direct us on how to deal with the issue: (Fixup a bug which only belongs to the former stable branch.)I think you just need to send this patch without saying “commit 787af64d05cd fixes message is wrong” would be a good start. You alsoneed extra fix to mm/page_isolation.c for kernels between 5.15 and 5.17(inclusive). So there will need to be two patches: 1) your patch to stable tree prior to 5.15 and2) your patch with an additional mm/page_isolation.c fix to stable treebetween 5.15 and 5.17.Also, you will need to fix the mm/page_isolation.c code too to make this patch complete, unless you can show that PFN=0x1000 is never going to be encounteredNo, we needn't fix mm/page_isolation.c in linux-5.10.y, because it hadin the mm/page_isolation.c code I mentioned below.pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent buddy_pfn=0. The root cause comes from __find_buddy_pfn(): return page_pfn ^ (1 << order);Right. But pfn_valid_within() was removed since 5.15. So your fix is required for kernels between 5.15 and 5.17 (inclusive).When page_pfn is the same as the order size, it will return the previous buddy not the next. That is the only exception for this algorithm, right? In fact, the bug is a very long time to reproduce and is not easy to debug, so we want to contribute it to the community to prevent other guys from wasting time. Although there is no new patch at all.Thanks for your reporting and sending out the patch. I really appreciate it. We definitely need your inputs. Throughout the email thread, I am trying to help you clarify the bug and how to fix it properly: 1. The commit 787af64d05cd does not apply cleanly to commits d9dddbf55667, meaning you cannot just cherry-pick that commit to fix the issue. That is why we need your patch to fix the issue. And saying it has a wrong fixes message in this patch’s git log is misleading. 2. For kernels between 5.15 and 5.17 (inclusive), an additional fix to mm/page_isolation.c is also needed, since pfn_valid_within() was removed since 5.15 and the issue can appear during page isolation. 3. For kernels before 5.15, this patch will apply.Zi Yan, Guo Ren,I think we still need some imporvemnt for MASTER branch, as we discussed above, we will get an illegal buddy page if buddy_pfn is 0,within page_is_buddy(), it still use the illegal buddy page to do the check. I think in most of cases, page_is_buddy() can return false, but it still may return true with very low probablity.if (!page_is_guard(buddy) && !PageBuddy(buddy)) should return false. Maybe show us the dump_page() that offending page. Thanks.Let‘s take the issue we met on RISC-V arch for example,pfn_base is 512 as we reserved 2M RAM for opensbi, mem_map's value is 0xffffffe07e205000, which is the page address of PFN 512.__find_buddy_pfn() returned 0 for PFN 0x2000 with order 0xd.We know PFN 0 is not a valid pfn for buddy system, because 512 is the first PFN for buddy system.Then it use below code to get buddy page with buddy_pfn 0: buddy = page + (buddy_pfn - pfn); So buddy page address is: 0xffffffe07e1fe000 = (struct page*)0xffffffe07e26e000 + (0 - 0x2000)we can know this buddy page's address is less than mem_map(0xffffffe07e1fe000 < 0xffffffe07e205000), actually 0xffffffe07e1fe000 is not a valid page's address. If we use 0xffffffe07e1fe000 as the page's address to extract the value of a member in 'struct page', we may get an uncertain value. That's why I say page_is_buddy() may return true with very low probablity.So I think we need to add the code the verify buddy_pfn in the first place:pfn_valid(buddy_pfn)
(gdb) p /x *(struct page *)0xffffffe07e1fe000$15 = {flags = 0x0, {{lru = {next = 0x0, prev = 0x0}, mapping = 0x0, index = 0x0, private = 0x0}, {dma_addr = {0x0, 0x0}}, {{slab_list = {next = 0x0, prev = 0 x0}, {next = 0x0, pages = 0x0, pobjects = 0x0}}, slab_cache = 0x0, freelist = 0x0, {s_mem = 0x0, counters = 0x0, {inuse = 0x0, objects = 0x0 , frozen = 0x0}}}, {compound_head = 0x0, compound_dtor = 0x0, compound_order = 0x0, compound_mapcount = {counter = 0x0}, compound_nr = 0x0}, {_compound_pad_1 = 0x0, hpage_pinned_refcount = {counter = 0x0}, deferred_list = {next = 0x0, prev = 0x0}}, {_pt_pad_1 = 0x0, pmd_hu ge_pte = 0x0, _pt_pad_2 = 0x0, {pt_mm = 0x0, pt_frag_refcount = {counter = 0x0}}, ptl = {{rlock = {raw_lock = {lock = 0x0}}}}}, {pgmap = 0x0, zone _device_data = 0x0}, callback_head = {next = 0x0, func = 0x0}}, {_mapcount = {counter = 0x0}, page_type = 0x0, active = 0x0, units = 0x0}, _refcount
= {counter = 0x0}} This is the first normal page in mem_map, (gdb) p /x *(struct page *)0xffffffe07e26e000$16 = {flags = 0x0, {{lru = {next = 0x100, prev = 0x122}, mapping = 0x0, index = 0x0, private = 0x0}, {dma_addr = {0x100, 0x122}}, {{slab_list = {next = 0x100 , prev = 0x122}, {next = 0x100, pages = 0x122, pobjects = 0x0}}, slab_cache = 0x0, freelist = 0x0, {s_mem = 0x0, counters = 0x0, {inuse = 0x0, objects = 0x0, frozen = 0x0}}}, {compound_head = 0x100, compound_dtor = 0x22, compound_order = 0x1, compound_mapcount = {counter = 0x0} , compound_nr = 0x0}, {_compound_pad_1 = 0x100, hpage_pinned_refcount = {counter = 0x122}, deferred_list = {next = 0x0, prev = 0x0}}, {_ pt_pad_1 = 0x100, pmd_huge_pte = 0x122, _pt_pad_2 = 0x0, {pt_mm = 0x0, pt_frag_refcount = {counter = 0x0}}, ptl = {{rlock = {raw_lock = {lock = 0x0}}}} }, {pgmap = 0x100, zone_device_data = 0x122}, callback_head = {next = 0x100, func = 0x122}}, {_mapcount = {counter = 0xffffffff}, page_type = 0xffffffff, active = 0xffffffff, units = 0xffffffff}, _refcount = {counter = 0x0}}
I think we need to add some code to verify buddy_pfn in the first place.Could you give some suggestions for this idea?It seems that the RISC-V arch reveals a similar bug from d9dddbf55667. Basically, this bug will only happen when PFN=0x2000 is merging up andActually, this issue is involved by commit:commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,but it got buddy PFN 0 for PFN 0x2000: 0 = 0x2000 ^ (1 << 12)With the illegal buddy PFN 0, it got an illegal buddy page, which causedcrash in __get_pfnblock_flags_mask().there are some isolated pageblocks.Not PFN=0x2000, it's PFN=0x1000, I guess. RISC-V's first 2MB RAM could reserve for opensbi, so it would have riscv_pfn_base=512 and mem_map began with 512th PFN when CONFIG_FLATMEM=y.(Also, csky has the same issue: a non-zero pfn_base in some scenarios.)But __find_buddy_pfn algorithm thinks the start address is 0, it couldget 0 pfn or less than the pfn_base value. We need another check to prevent that.BTW, what does first reserved 2MB imply? All 4KB pages from first 2MB areset to PageReserved?With the patch, it can avoid the calling of get_pageblock_migratetype() ifYou might miss the __find_buddy_pfn() caller in unset_migratetype_isolate() from mm/page_isolation.c, if you are talking about linux-5.17.y and former version. There, page_is_buddy() is also not called and is_migrate_isolate_page()it isn't buddy page.is called, which calls get_pageblock_migratetype() too.Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")Cc: stable@xxxxxxxxxxxxxxx Reported-by: zjb194813@xxxxxxxxxxxxxxx Reported-by: tianhu.hh@xxxxxxxxxxxxxxx Signed-off-by: Xianting Tian <xianting.tian@xxxxxxxxxxxxxxxxx> --- mm/page_alloc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b1caa1c6c887..5b423caa68fd 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c@@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,buddy_pfn = __find_buddy_pfn(pfn, order); buddy = page + (buddy_pfn - pfn); + + if (!page_is_buddy(page, buddy, order)) + goto done_merging;buddy_mt = get_pageblock_migratetype(buddy);if (migratetype != buddy_mt -- 2.17.1-- Best Regards, Yan, Zi-- Best Regards Guo Ren ML: https://lore.kernel.org/linux-csky/-- Best Regards, Yan, Zi-- Best Regards Guo Ren ML: https://lore.kernel.org/linux-csky/-- Best Regards, Yan, Zi-- Best Regards, Yan, Zi