On Mon, Aug 03, 2020 at 11:19:09AM -0400, Qian Cai wrote: > On Mon, Aug 03, 2020 at 01:36:58PM +0000, HORIGUCHI NAOYA(堀口 直也) wrote: > > Hello, > > > > On Mon, Aug 03, 2020 at 08:39:55AM -0400, Qian Cai wrote: > > > On Fri, Jul 31, 2020 at 12:20:56PM +0000, nao.horiguchi@xxxxxxxxx wrote: > > > > This patchset is the latest version of soft offline rework patchset > > > > targetted for v5.9. > > > > > > > > Main focus of this series is to stabilize soft offline. Historically soft > > > > offlined pages have suffered from racy conditions because PageHWPoison is > > > > used to a little too aggressively, which (directly or indirectly) invades > > > > other mm code which cares little about hwpoison. This results in unexpected > > > > behavior or kernel panic, which is very far from soft offline's "do not > > > > disturb userspace or other kernel component" policy. > > > > > > > > Main point of this change set is to contain target page "via buddy allocator", > > > > where we first free the target page as we do for normal pages, and remove > > > > from buddy only when we confirm that it reaches free list. There is surely > > > > race window of page allocation, but that's fine because someone really want > > > > that page and the page is still working, so soft offline can happily give up. > > > > > > > > v4 from Oscar tries to handle the race around reallocation, but that part > > > > seems still work in progress, so I decide to separate it for changes into > > > > v5.9. Thank you for your contribution, Oscar. > > > > > > > > The issue reported by Qian Cai is fixed by patch 16/16. > > > > > > I am still getting EIO everywhere on next-20200803 (which includes this v5). > > > > > > # ./random 1 > > > - start: migrate_huge_offline > > > - use NUMA nodes 0,8. > > > - mmap and free 8388608 bytes hugepages on node 0 > > > - mmap and free 8388608 bytes hugepages on node 8 > > > madvise: Input/output error > > > > > > From the serial console, > > > > > > [ 637.164222][ T8357] soft offline: 0x118ee0: hugepage isolation failed: 0, page count 2, type 7fff800001000e (referenced|uptodate|dirty|head) > > > [ 637.164890][ T8357] Soft offlining pfn 0x20001380 at process virtual address 0x7fff9f000000 > > > [ 637.165422][ T8357] Soft offlining pfn 0x3ba00 at process virtual address 0x7fff9f200000 > > > [ 637.166409][ T8357] Soft offlining pfn 0x201914a0 at process virtual address 0x7fff9f000000 > > > [ 637.166833][ T8357] Soft offlining pfn 0x12b9a0 at process virtual address 0x7fff9f200000 > > > [ 637.168044][ T8357] Soft offlining pfn 0x1abb60 at process virtual address 0x7fff9f000000 > > > [ 637.168557][ T8357] Soft offlining pfn 0x20014820 at process virtual address 0x7fff9f200000 > > > [ 637.169493][ T8357] Soft offlining pfn 0x119720 at process virtual address 0x7fff9f000000 > > > [ 637.169603][ T8357] soft offline: 0x119720: hugepage isolation failed: 0, page count 2, type 7fff800001000e (referenced|uptodate|dirty|head) > > > [ 637.169756][ T8357] Soft offlining pfn 0x118ee0 at process virtual address 0x7fff9f200000 > > > [ 637.170653][ T8357] Soft offlining pfn 0x200e81e0 at process virtual address 0x7fff9f000000 > > > [ 637.171067][ T8357] Soft offlining pfn 0x201c5f60 at process virtual address 0x7fff9f200000 > > > [ 637.172101][ T8357] Soft offlining pfn 0x201c8f00 at process virtual address 0x7fff9f000000 > > > [ 637.172241][ T8357] __get_any_page: 0x201c8f00: unknown zero refcount page type 87fff8000000000 > > > > I might misjudge to skip the following patch, sorry about that. > > Could you try with it? > > Still getting EIO after applied this patch. > > [ 1215.499030][T88982] soft offline: 0x201bdc20: hugepage isolation failed: 0, page count 2, type 87fff800001000e (referenced|uptodate|dirty|head) > [ 1215.499775][T88982] Soft offlining pfn 0x201bdc20 at process virtual address 0x7fff91a00000 > [ 1215.500189][T88982] Soft offlining pfn 0x201c19c0 at process virtual address 0x7fff91c00000 > [ 1215.500297][T88982] soft offline: 0x201c19c0: hugepage isolation failed: 0, page count 2, type 87fff800001000e (referenced|uptodate|dirty|head) > [ 1215.500982][T88982] Soft offlining pfn 0x1f1fa0 at process virtual address 0x7fff91a00000 > [ 1215.501086][T88982] soft offline: 0x1f1fa0: hugepage isolation failed: 0, page count 2, type 7fff800001000e (referenced|uptodate|dirty|head) > [ 1215.501237][T88982] Soft offlining pfn 0x1f4520 at process virtual address 0x7fff91c00000 > [ 1215.501355][T88982] soft offline: 0x1f4520: hugepage isolation failed: 0, page count 2, type 7fff800001000e (referenced|uptodate|dirty|head) > [ 1215.502196][T88982] Soft offlining pfn 0x1f4520 at process virtual address 0x7fff91a00000 > [ 1215.502573][T88982] Soft offlining pfn 0x1f1fa0 at process virtual address 0x7fff91c00000 > [ 1215.502687][T88982] soft offline: 0x1f1fa0: hugepage isolation failed: 0, page count 2, type 7fff800001000e (referenced|uptodate|dirty|head) > [ 1215.503245][T88982] Soft offlining pfn 0x201c3cc0 at process virtual address 0x7fff91a00000 > [ 1215.503594][T88982] Soft offlining pfn 0x201c3ce0 at process virtual address 0x7fff91c00000 > [ 1215.503755][T88982] __get_any_page: 0x201c3ce0: unknown zero refcount page type 87fff8000000000 So I lean to cancel patch 3/16 where madvise_inject_error() releases page refcount before calling soft_offline_page(). That should solve the issue because refcount of the target page never becomes zero. Patch 4/16, 8/16 and 9/16 depend on it, so they should be removed too. I'll resubmit the series later, but I prepare the delta below (based on next-20200805), so it would be appreciated if you can run random.c with it to confrim the fix. Thanks, Naoya Horiguchi --- >From 533090c0869aeca88d8ff14d27c3cef8fc060ccd Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> Date: Thu, 6 Aug 2020 01:29:08 +0900 Subject: [PATCH] (for testing) revert change around removing MF_COUNT_INCREASED revert the following patches - mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED - mm,madvise: Refactor madvise_inject_error - mm,hwpoison: remove MF_COUNT_INCREASED - mm,hwpoison: remove flag argument from soft offline functions Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> --- drivers/base/memory.c | 2 +- include/linux/mm.h | 9 +++++---- mm/madvise.c | 36 +++++++++++++++++++----------------- mm/memory-failure.c | 31 ++++++++++++++++++++----------- 4 files changed, 45 insertions(+), 33 deletions(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 3e6d27c9dff6..4db3c660de83 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -463,7 +463,7 @@ static ssize_t soft_offline_page_store(struct device *dev, if (kstrtoull(buf, 0, &pfn) < 0) return -EINVAL; pfn >>= PAGE_SHIFT; - ret = soft_offline_page(pfn); + ret = soft_offline_page(pfn, 0); return ret == 0 ? count : ret; } diff --git a/include/linux/mm.h b/include/linux/mm.h index 4f12b2465e80..442921a004a2 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2976,9 +2976,10 @@ void register_page_bootmem_memmap(unsigned long section_nr, struct page *map, unsigned long nr_pages); enum mf_flags { - MF_ACTION_REQUIRED = 1 << 0, - MF_MUST_KILL = 1 << 1, - MF_SOFT_OFFLINE = 1 << 2, + MF_COUNT_INCREASED = 1 << 0, + MF_ACTION_REQUIRED = 1 << 1, + MF_MUST_KILL = 1 << 2, + MF_SOFT_OFFLINE = 1 << 3, }; extern int memory_failure(unsigned long pfn, int flags); extern void memory_failure_queue(unsigned long pfn, int flags); @@ -2988,7 +2989,7 @@ extern int sysctl_memory_failure_early_kill; extern int sysctl_memory_failure_recovery; extern void shake_page(struct page *p, int access); extern atomic_long_t num_poisoned_pages __read_mostly; -extern int soft_offline_page(unsigned long pfn); +extern int soft_offline_page(unsigned long pfn, int flags); /* diff --git a/mm/madvise.c b/mm/madvise.c index 843f6fad3b89..5fa5f66468b3 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -886,15 +886,16 @@ static long madvise_remove(struct vm_area_struct *vma, static int madvise_inject_error(int behavior, unsigned long start, unsigned long end) { + struct page *page; struct zone *zone; unsigned long size; if (!capable(CAP_SYS_ADMIN)) return -EPERM; + for (; start < end; start += size) { unsigned long pfn; - struct page *page; int ret; ret = get_user_pages_fast(start, 1, 0, &page); @@ -909,26 +910,27 @@ static int madvise_inject_error(int behavior, */ size = page_size(compound_head(page)); - /* - * The get_user_pages_fast() is just to get the pfn of the - * given address, and the refcount has nothing to do with - * what we try to test, so it should be released immediately. - * This is racy but it's intended because the real hardware - * errors could happen at any moment and memory error handlers - * must properly handle the race. - */ - put_page(page); - if (behavior == MADV_SOFT_OFFLINE) { pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n", - pfn, start); - ret = soft_offline_page(pfn); - } else { - pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n", - pfn, start); - ret = memory_failure(pfn, 0); + pfn, start); + + ret = soft_offline_page(pfn, MF_COUNT_INCREASED); + if (ret) + return ret; + continue; } + pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n", + pfn, start); + + /* + * Drop the page reference taken by get_user_pages_fast(). In + * the absence of MF_COUNT_INCREASED the memory_failure() + * routine is responsible for pinning the page to prevent it + * from being released back to the page allocator. + */ + put_page(page); + ret = memory_failure(pfn, 0); if (ret) return ret; } diff --git a/mm/memory-failure.c b/mm/memory-failure.c index a229d4694954..fd50a6f9a60d 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1167,7 +1167,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) num_poisoned_pages_inc(); - if (!get_hwpoison_page(p)) { + if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) { /* * Check "filter hit" and "race with other subpage." */ @@ -1363,7 +1363,7 @@ int memory_failure(unsigned long pfn, int flags) * In fact it's dangerous to directly bump up page count from 0, * that may make page_ref_freeze()/page_ref_unfreeze() mismatch. */ - if (!get_hwpoison_page(p)) { + if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) { if (is_free_buddy_page(p)) { action_result(pfn, MF_MSG_BUDDY, MF_DELAYED); return 0; @@ -1392,7 +1392,10 @@ int memory_failure(unsigned long pfn, int flags) shake_page(p, 0); /* shake_page could have turned it free. */ if (!PageLRU(p) && is_free_buddy_page(p)) { - action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED); + if (flags & MF_COUNT_INCREASED) + action_result(pfn, MF_MSG_BUDDY, MF_DELAYED); + else + action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED); return 0; } @@ -1540,7 +1543,7 @@ static void memory_failure_work_func(struct work_struct *work) if (!gotten) break; if (entry.flags & MF_SOFT_OFFLINE) - soft_offline_page(entry.pfn); + soft_offline_page(entry.pfn, entry.flags); else memory_failure(entry.pfn, entry.flags); } @@ -1679,10 +1682,13 @@ EXPORT_SYMBOL(unpoison_memory); * that is not free, and 1 for any other page type. * For 1 the page is returned with increased page count, otherwise not. */ -static int __get_any_page(struct page *p, unsigned long pfn) +static int __get_any_page(struct page *p, unsigned long pfn, int flags) { int ret; + if (flags & MF_COUNT_INCREASED) + return 1; + /* * When the target page is a free hugepage, just remove it * from free hugepage list. @@ -1709,12 +1715,12 @@ static int __get_any_page(struct page *p, unsigned long pfn) return ret; } -static int get_any_page(struct page *page, unsigned long pfn) +static int get_any_page(struct page *page, unsigned long pfn, int flags) { - int ret = __get_any_page(page, pfn); + int ret = __get_any_page(page, pfn, flags); if (ret == -EBUSY) - ret = __get_any_page(page, pfn); + ret = __get_any_page(page, pfn, flags); if (ret == 1 && !PageHuge(page) && !PageLRU(page) && !__PageMovable(page)) { @@ -1727,7 +1733,7 @@ static int get_any_page(struct page *page, unsigned long pfn) /* * Did it turn free? */ - ret = __get_any_page(page, pfn); + ret = __get_any_page(page, pfn, 0); if (ret == 1 && !PageLRU(page)) { /* Drop page reference which is from __get_any_page() */ put_page(page); @@ -1869,6 +1875,7 @@ static int soft_offline_free_page(struct page *page) /** * soft_offline_page - Soft offline a page. * @pfn: pfn to soft-offline + * @flags: flags. Same as memory_failure(). * * Returns 0 on success, otherwise negated errno. * @@ -1887,7 +1894,7 @@ static int soft_offline_free_page(struct page *page) * This is not a 100% solution for all memory, but tries to be * ``good enough'' for the majority of memory. */ -int soft_offline_page(unsigned long pfn) +int soft_offline_page(unsigned long pfn, int flags) { int ret; struct page *page; @@ -1901,11 +1908,13 @@ int soft_offline_page(unsigned long pfn) if (PageHWPoison(page)) { pr_info("soft offline: %#lx page already poisoned\n", pfn); + if (flags & MF_COUNT_INCREASED) + put_page(page); return 0; } get_online_mems(); - ret = get_any_page(page, pfn); + ret = get_any_page(page, pfn, flags); put_online_mems(); if (ret > 0) -- 2.25.1