Re: [PATCH] codetag: debug: mark codetags for pages which transitioned from being poison to unpoison as empty

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Miaohe


On 8/23/24 15:40, Miaohe Lin wrote:
On 2024/8/23 11:37, Hao Ge wrote:
Hi Suren and Miaohe


On 8/23/24 09:47, Hao Ge wrote:
Hi Suren and Miaohe


Thank you all for taking the time to discuss this issue.


On 8/23/24 06:50, Suren Baghdasaryan wrote:
On Thu, Aug 22, 2024 at 2:46 AM Hao Ge <hao.ge@xxxxxxxxx> wrote:
Hi Miaohe


Thank you for taking the time to review this patch.


On 8/22/24 16:04, Miaohe Lin wrote:
On 2024/8/22 10:58, Hao Ge wrote:
From: Hao Ge <gehao@xxxxxxxxxx>

Thanks for your patch.

The PG_hwpoison page will be caught and isolated on the entrance to
the free buddy page pool. so,when we clear this flag and return it
to the buddy system,mark codetags for pages as empty.

Is below scene cause the problem?

1. Pages are allocated. pgalloc_tag_add() will be called when prep_new_page().

2. Pages are hwpoisoned. memory_failure() will set PG_hwpoison flag and pgalloc_tag_sub()
will be called when pages are caught and isolated on the entrance to buddy.
Hi Folks,
Thanks for reporting this! Could you please describe in more details
how memory_failure() ends up calling pgalloc_tag_sub()? It's not
obvious to me which path leads to pgalloc_tag_sub(), so I must be
missing something.

OK,Let me describe the scenario I encountered.

In the Link [1] I mentioned,here is the logic behind it:

It performed the following operations:

madvise(ptrs[num_alloc], pagesize, MADV_SOFT_OFFLINE)

and then the kernel's call stack looks like this:

do_madvise

soft_offline_page

page_handle_poison

__folio_put

free_unref_page

I just reviewed it and I think I missed a stack.

Actually, it's like this

do_madvise

soft_offline_page

soft_offline_in_use_page

page_handle_poison

__folio_put

free_unref_page


And I've come up with a minimal solution. If everyone agrees, I'll send the patch.look this

https://elixir.bootlin.com/linux/v6.11-rc4/source/mm/page_alloc.c#L1056

Let's directly call clear_page_tag_ref after pgalloc_tag_sub.
I tend to agree with you. It should be a good practice to call clear_page_tag_ref()
whenever page_tag finished its work. Do you think below code is also needed?

Actually, this is not necessary,It follows the normal logic of allocation and release.

The actual intention of the clear_page_tag_reffunction is to indicate to thealloc_tag  that the page is not being returned to the

buddy system through normal allocation.

Just like when the page first enters the buddy system,

https://elixir.bootlin.com/linux/v6.11-rc4/source/mm/mm_init.c#L2464

So, can you help review this patch?

https://lore.kernel.org/all/20240823062002.21165-1-hao.ge@xxxxxxxxx/


diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index de54c3567539..707710f03cf5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1104,6 +1104,7 @@ __always_inline bool free_pages_prepare(struct page *page,
         reset_page_owner(page, order);
         page_table_check_free(page, order);
         pgalloc_tag_sub(page, 1 << order);
+       clear_page_tag_ref(page);

         if (!PageHighMem(page)) {
                 debug_check_no_locks_freed(page_address(page),

Thanks.
.




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux