Re: [PATCH v3 0/5] HWpoison: further fixes and cleanups

On 2020-09-16 18:30, osalvador@xxxxxxx wrote:
On 2020-09-16 16:46, Aristeu Rozanski wrote:
Hi Oscar,

On Wed, Sep 16, 2020 at 04:09:30PM +0200, Oscar Salvador wrote:
On Wed, Sep 16, 2020 at 09:53:58AM -0400, Aristeu Rozanski wrote:
Can you try the other patch I posted in response to Naoya?

Same thing:

[  369.195056] Soft offlining pfn 0x3fb5bf at process virtual address
0x7ffc84350000
[  369.195073] page:000000002bb131e4 refcount:1 mapcount:0
mapping:0000000000000000 index:0x7ffc8435 pfn:0x3fb5bf
[  369.195080] anon flags:
0x3ffff80008000e(referenced|uptodate|dirty|swapbacked)
[  369.202131] raw: 003ffff80008000e 5deadbeef0000100 5deadbeef0000122
c000003fda1c7431
[  369.202137] raw: 000000007ffc8435 0000000000000000 00000001ffffffff
c000003fd63af000
[  369.202141] page dumped because: page_handle_poison
[  369.202145] page->mem_cgroup:c000003fd63af000
[  369.215055] page_handle_poison: hugepage_or_freepage failed
[  369.215057] __soft_offline_page: page_handle_poison -EBUSY
[  369.215068] page:000000002bb131e4 refcount:3 mapcount:0
mapping:00000000f6ca3f32 index:0x5c pfn:0x3fb5bf
[  369.215110] aops:xfs_address_space_operations [xfs] ino:49f9c5f
dentry name:"messages"
[  369.215117] flags: 0x3ffff800002008(dirty|private)
[  369.215121] raw: 003ffff800002008 5deadbeef0000100 5deadbeef0000122
c000003fadd3daa8
[  369.215127] raw: 000000000000005c c000003fd9497c20 00000003ffffffff
c000003fd1143000
[  369.215132] page dumped because: __soft_offline_page after migrate
[  369.215136] page->mem_cgroup:c000003fd1143000


Fat fingers, sorry:

Ok, this is something different.
The race you saw previously is kinda normal as there is a race window between spotting a freepage and taking it off the buddy freelists.
The retry patch should help there.

The issue you are seeing here is caused by the call to page_handle_poison in __soft_offline_page, where we pass hugepage_or_freepage = true unconditionally. That is wrong: the flag should follow whether the page is a hugepage.
I think it was introduced during rebasing.

Should be:

@@ -1858,8 +1903,11 @@ static int __soft_offline_page(struct page *page)
                if (!ret) {
                        bool release = !huge;

-                       if (!page_handle_poison(page, true, release))
+                       if (!page_handle_poison(page, huge, release)) {
+                               pr_info("%s: page_handle_poison -EBUSY\n", __func__);
+                               dump_page(page, "__soft_offline_page after migrate");
                                ret = -EBUSY;
+                       }

Could you try that on top please?

I am away from my laptop now but I will be taking a look later today.

thanks




