On Wed, 24 Feb 2021 11:31:55 +0100
Oscar Salvador <osalvador@xxxxxxx> wrote:

> I have some questions:
>
> > 1.When LMCE is enabled, and there are two processes A && B running on
> > different core X && Y separately, which will access one same page, then
> > the page corrupted when process A access it, a MCE will be raised to
> > core X and the error process is just underway.
>
> When !LMCE, that is not a problem because new MCE needs to wait for the ongoing MCE?

I am not sure whether this case can happen when !LMCE. When I realized this
might be an issue I tried to reproduce it, and my configuration has LMCE enabled.

> > 2.Then B access the page and trigger another MCE to core Y, it will also
> > do error process, it will see TestSetPageHWPoison be true, and 0 is
> > returned.
>
> For non-nested calls, that is no problem because the page will be taken out
> of business(unmapped from the processes), right? So no more MCE are possible.

Yes, I think that after the recovery job is finished, other processes that still
access the page will hit a page fault and an error will be returned.

> >
> > 3.The kill_me_maybe will check the return:
> >
> > static void kill_me_maybe(struct callback_head *cb)
> > {
> >
> >         if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
> >             !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
> >                 set_mce_nospec(p->mce_addr >> PAGE_SHIFT,
>
> So, IIUC, in case of a LMCE nested call, the second MCE will reach here.
> set_mce_nospec() will either mark the underlying page as not mapped/cached.
>

Calling set_mce_nospec() here is not appropriate while the recovery job is still
in flight. In my test this function failed.

> Should not have memory_failure()->hwpoison_user_mappings() unmapped the page
> from both process A and B? Or this is in case the ongoing MCE(process A) has
> not still unmapped anything, so process B can still access this page.
>

What I care about is the case where process B triggers the error again after
process A; I don't know how it would return and proceed.

> So with your change, process B will be sent a SIGBUS, while process A is still
> handling the MCE, right?

Right!

> >                                 p->mce_whole_page);
> >                 sync_core();
> >                 return;
> >         }
> >
> > }
> >
> > 4. The error process for B will end, and may nothing happened if
> > kill-early is not set, We may let the wrong data go into effect.
> >
> > For other cases which care the return value of memory_failure() should
> > check why they want to process a memory error which have already been
> > processed. This behavior seems reasonable.
> >
> > In kill_me_maybe, log the fact about the memory may not recovered, and
> > we will kill the related process.
> >
> > Signed-off-by: Aili Yao <yaoaili@xxxxxxxxxxxx>
> > ---
> >  arch/x86/kernel/cpu/mce/core.c | 2 ++
> >  mm/memory-failure.c            | 4 ++--
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > index e133ce1e562b..db4afc5bf15a 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -1259,6 +1259,8 @@ static void kill_me_maybe(struct callback_head *cb)
> >          }
> >
> >          if (p->mce_vaddr != (void __user *)-1l) {
> > +                pr_err("Memory error may not recovered: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
> > +                        p->mce_addr >> PAGE_SHIFT, p->comm, p->pid);
> >                  force_sig_mceerr(BUS_MCEERR_AR, p->mce_vaddr, PAGE_SHIFT);
> >          } else {
> >                  pr_err("Memory error not recovered");
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index e9481632fcd1..06f006174b8c 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1224,7 +1224,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >                  if (TestSetPageHWPoison(head)) {
> >                          pr_err("Memory failure: %#lx: already hardware poisoned\n",
> >                                 pfn);
> > -                        return 0;
> > +                        return -EBUSY;
>
> As David said, madvise_inject_error() will start returning -EBUSY now in case
> we madvise(MADV_HWPOISON) on an already hwpoisoned page.
>
> AFAICS, memory_failure() can return 0, -Eerrors, and MF_XXX.
> Would it make sense to unify that? That way we could declare error codes that
> make some sense (like MF_ALREADY_HWPOISONED).
>

@David:

I checked the code again and found a few places that care about the exact
return value, for example:

1: drivers/base/memory.c:483:      ret = memory_failure(pfn, 0);

   This is for hard page offline. I looked at the code in mcelog:

   static void offline_action(struct mempage *mp, u64 addr)
   {
           if (offline <= OFFLINE_ACCOUNT)
                   return;
           Lprintf("Offlining page %llx\n", addr);
           if (memory_offline(addr) < 0) {
                   Lprintf("Offlining page %llx failed: %s\n", addr, strerror(errno));
                   mp->offlined = PAGE_OFFLINE_FAILED;
           } else
                   mp->offlined = PAGE_OFFLINE;
   }

   I think returning a negative value would be more appropriate here? The related
   killing function may not have been performed, so we can't say the operation
   succeeded.

2: mm/hwpoison-inject.c:51:        return memory_failure(pfn, 0);
   mm/madvise.c:910:               ret = memory_failure(pfn, MF_COUNT_INCREASED);

   These two cases are mainly for error injection. I checked the test code, and
   mostly it only cares whether the value is 0 or < 0. I ran the related tests;
   normally they work well, but under stress testing I do sometimes hit failures
   along with the -EBUSY return. I will dig more.

Other places only care whether the return value is 0, or just ignore it.

Hi Naoya, what's your opinion on this possible issue? I need your input!

Thanks
Aili Yao
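
To make the return-value question concrete, here is a minimal user-space sketch,
not kernel code. The names page_model, model_memory_failure() and
model_kill_me_maybe() are hypothetical and exist only for illustration. The
MCE_IN_KERNEL_COPYIN check is left out, and the first caller's recovery is
assumed to succeed. It models only the "already poisoned" early exit and how
kill_me_maybe() reacts to the value it gets back.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the HWPoison bit is modelled. */
struct page_model {
	atomic_bool hwpoison;
};

/*
 * Models the "already hardware poisoned" early exit of memory_failure().
 * already_rc is what a caller gets when another caller set the bit first:
 * 0 before the patch, -EBUSY with the patch.
 */
int model_memory_failure(struct page_model *page, int already_rc)
{
	assert(page != NULL);

	/* atomic_exchange() returns the old value, like TestSetPageHWPoison(). */
	if (atomic_exchange(&page->hwpoison, true))
		return already_rc;

	/* First caller: recovery would run here; assumed to succeed. */
	return 0;
}

/*
 * Models the decision in kill_me_maybe(): returns true when the task
 * would be sent SIGBUS, false when the handler returns as "recovered".
 */
bool model_kill_me_maybe(struct page_model *page, int already_rc)
{
	if (!model_memory_failure(page, already_rc))
		return false;	/* treated as recovered, no signal */

	return true;		/* force_sig_mceerr() path */
}
```

Calling model_kill_me_maybe() twice on the same page_model stands for process A
followed by process B. With already_rc = 0 the second call returns early without
a signal; with already_rc = -EBUSY the second call takes the SIGBUS path, which
is the behavior the patch is after.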