On Sun, Jun 05, 2022 at 12:24:24PM +0800, zhenwei pi wrote:
>
>
> On 6/5/22 02:56, Andrew Morton wrote:
> > On Sat, 4 Jun 2022 18:32:29 +0800 zhenwei pi <pizhenwei@xxxxxxxxxxxxx> wrote:
> >
> > > Currently unpoison_memory(unsigned long pfn) is designed for soft
> > > poison(hwpoison-inject) only. Unpoisoning a hardware corrupted page
> > > puts page back buddy only, this leads BUG during accessing on the
> > > corrupted KPTE.

Thank you for the patch. I think this will be helpful for integration
testing.

You mention "hardware corrupted page" as the condition of this bug, and I
think that it means a real hardware error, but this BUG seems to be
triggered when we use mce-inject or APEI (these are also software
injection without corrupting the memory physically). So the actual
condition is "when memory_failure() is called by MCE handler"?

> > >
> > > Do not allow to unpoison hardware corrupted page in unpoison_memory()
> > > to avoid BUG like this:
> > >
> > > Unpoison: Software-unpoisoned page 0x61234
> > > BUG: unable to handle page fault for address: ffff888061234000
> >
> > Thanks.
> >
> > > --- a/mm/memory-failure.c
> > > +++ b/mm/memory-failure.c
> > > @@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
> > >  {
> > >  	struct page *page;
> > >  	struct page *p;
> > > +	pte_t *kpte;
> > >  	int ret = -EBUSY;
> > >  	int freeit = 0;
> > >  	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
> > > @@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
> > >  	p = pfn_to_page(pfn);
> > >  	page = compound_head(p);
> > > +	kpte = virt_to_kpte((unsigned long)page_to_virt(p));
> > > +	if (kpte && !pte_present(*kpte)) {
> > > +		unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
> > > +				 pfn, &unpoison_rs);

This can prevent unpoison for hwpoison on 4kB pages, but not for hugetlb
pages, where I see a similar BUG as follows (even with your patch
applied):

  [  917.806712] BUG: unable to handle page fault for address: ffff9f7bb3201000
  [  917.810144] #PF: supervisor write access in kernel mode
  [  917.812588] #PF: error_code(0x0002) - not-present page
  [  917.815007] PGD 104801067 P4D 104801067 PUD 10006b063 PMD 1052d0063 PTE 800ffffeccdfe062
  [  917.818768] Oops: 0002 [#1] PREEMPT SMP PTI
  [  917.820759] CPU: 0 PID: 7774 Comm: test_alloc_gene Tainted: G M OE 5.18.0-v5.18-220606-0942-029-ge4dcc+ #47
  [  917.825720] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
  [  917.829762] RIP: 0010:clear_page_erms+0x7/0x10
  [  917.831867] Code: 48 89 47 18 48 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d9 90 c3 0f 1f 80 00 00 00 00 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc cc cc 48 85 ff 0f 84 d3 00 00 00 0f b6 0f 4c
  [  917.840540] RSP: 0000:ffffab49c25ebdf0 EFLAGS: 00010246
  [  917.842839] RAX: 0000000000000000 RBX: ffffd538c4cc8000 RCX: 0000000000001000
  [  917.845835] RDX: 0000000080000000 RSI: 00007f2aeb600000 RDI: ffff9f7bb3201000
  [  917.848687] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  [  917.851377] R10: 0000000000000002 R11: ffff9f7b87e3a2a0 R12: 0000000000000000
  [  917.854035] R13: 0000000000000001 R14: ffffd538c4cc8000 R15: ffff9f7bc002a5d8
  [  917.856539] FS: 00007f2aebad3740(0000) GS:ffff9f7bbbc00000(0000) knlGS:0000000000000000
  [  917.859229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  917.861149] CR2: ffff9f7bb3201000 CR3: 0000000107726003 CR4: 0000000000170ef0
  [  917.863433] Call Trace:
  [  917.864266]  <TASK>
  [  917.864961]  clear_huge_page+0x147/0x270
  [  917.866236]  hugetlb_fault+0x440/0xad0
  [  917.867366]  handle_mm_fault+0x270/0x290
  [  917.868532]  do_user_addr_fault+0x1c3/0x680
  [  917.869768]  exc_page_fault+0x6c/0x160
  [  917.870912]  ? asm_exc_page_fault+0x8/0x30
  [  917.872082]  asm_exc_page_fault+0x1e/0x30
  [  917.873220] RIP: 0033:0x7f2aeb8ba367

I can't think of a workaround for this right now ...

> > > +		return -EPERM;

Is -EOPNOTSUPP a better error code?

> > > +	}
> > > +
> > >  	mutex_lock(&mf_mutex);
> > >  	if (!PageHWPoison(p)) {
> >
> > I guess we don't want to let fault injection crash the kernel, so a
> > cc:stable seems appropriate here.
> >
> > Can we think up a suitable Fixes: commit? I'm suspecting this bug has
> > been there for a long time?
> >
>
> Sure!
>
> 2009-Dec-16, hwpoison_unpoison() was introduced into linux in commit:
> 847ce401df392("HWPOISON: Add unpoisoning support")
> ...
> There is no hardware level unpoisioning, so this cannot be used for real
> memory errors, only for software injected errors.
> ...
>
> We can find that this function should be used for software level unpoisoning
> only in both commit log and comment in source code. unfortunately there is
> no check in function hwpoison_unpoison().
>
>
> 2020-May-20, 17fae1294ad9d("x86/{mce,mm}: Unmap the entire page if the whole
> page is affected and poisoned")
>
> This clears KPTE, and leads BUG(described in this patch) during unpoisoning
> the hardware corrupted page.
>
>
> Fixes: 847ce401df392("HWPOISON: Add unpoisoning support")
> Fixes: 17fae1294ad9d("x86/{mce,mm}: Unmap the entire page if the whole page
> is affected and poisoned")
>
> Cc: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> Cc: Tony Luck <tony.luck@xxxxxxxxx>.

Thanks for checking the history, I agree with sending to stable.

Thanks,
Naoya Horiguchi
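
For illustration only, the control flow of the proposed check can be
modeled in plain userspace C. This is a rough sketch, not kernel code:
struct model_pte, struct model_page and model_unpoison() are hypothetical
names invented here, and the -EBUSY return for an already-unpoisoned page
is an assumption based on the "int ret = -EBUSY" initializer visible in
the quoted diff. It only shows the ordering: refuse when the kernel
mapping of the page has been cleared, before the HWPoison flag is
consulted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel direct-map PTE (not a kernel type). */
struct model_pte {
	bool present;		/* models pte_present(*kpte) */
};

/* Hypothetical stand-in for a 4kB page tracked by the model. */
struct model_page {
	bool hwpoison;		/* models PageHWPoison(p) */
	struct model_pte *kpte;	/* NULL models "no kernel PTE was found" */
};

/*
 * Model of the patched flow: returns 0 when the flag is cleared, -EPERM
 * when the kernel mapping is gone, and -EBUSY (an assumption, see above)
 * when the page was not poisoned in the first place.
 */
static int model_unpoison(struct model_page *p)
{
	/* Same shape as the check added by the patch. */
	if (p->kpte && !p->kpte->present)
		return -EPERM;

	if (!p->hwpoison)
		return -EBUSY;

	p->hwpoison = false;
	return 0;
}
```

Swapping -EPERM for -EOPNOTSUPP, as asked above, would change only the
first return statement. The model covers a single 4kB page, so it says
nothing about the hugetlb case reported above.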