Re: consult a question about action_result() in memory_failure()

Hi Naoya,

> Hi gengdongjiu,
> 
> On Tue, Oct 24, 2017 at 08:47:41PM +0800, gengdongjiu wrote:
> > Hi Naoya,
> >    Sorry to disturb you. I would like to ask how memory_failure() handles the different error page types.
> > If the error page is one of the current task's page table pages, does memory_failure() not handle it?
> > In my test, memory_failure() treats the physical address of a faulty page table page as an unknown page.
> > Why does it not handle errors on page table pages? Thanks a lot.
> 
> I think that that's because it's handled not in the context of memory error handling, but in MCE's context.
> 
> When your hardware detects a memory error on a page table page (e.g. by memory scrubbing running in the background), an MCE SRAO is sent to the
> kernel, and the kernel kicks the memory error handler.
> But the memory error handler does nothing because there is currently no way to isolate the page table page. I think the main problem is that
> no one can easily tell "which processes owned the page table page."
> So the error page remains open for access, and later some CPU tries to access the page table page, which triggers a more severe MCE SRAR.
> This time, the MCE handler tries to kill the process of the current context (hoping that it's the right process to be killed).
> # For errors on "kernel" page table pages, there's no choice other
> # than panic...
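
The flow described above can be summarized in a small user-space model. This is only a sketch of the behaviour as explained in this thread, not kernel code: every identifier below (mce_kind, pt_owner, outcome, model_page_table_error, outcome_name) is hypothetical and does not exist in the kernel.

```c
/*
 * Toy model of how an error on a page table page plays out, following
 * the explanation in this thread. All names are hypothetical.
 */
enum mce_kind { MCE_SRAO, MCE_SRAR };   /* action optional vs. action required */
enum pt_owner { PT_USER, PT_KERNEL };   /* whose page table page is broken */
enum outcome  { OUTCOME_IGNORED, OUTCOME_KILL_CURRENT, OUTCOME_PANIC };

enum outcome model_page_table_error(enum mce_kind kind, enum pt_owner owner)
{
	/*
	 * SRAO: the memory error handler runs, but it cannot isolate a
	 * page table page, so the page stays accessible.
	 */
	if (kind == MCE_SRAO)
		return OUTCOME_IGNORED;

	/* SRAR on a kernel page table page: no choice other than panic. */
	if (owner == PT_KERNEL)
		return OUTCOME_PANIC;

	/* SRAR on a user page table page: kill the current context. */
	return OUTCOME_KILL_CURRENT;
}

/* Human-readable label for an outcome, for logging in a test harness. */
const char *outcome_name(enum outcome o)
{
	switch (o) {
	case OUTCOME_IGNORED:
		return "ignored";
	case OUTCOME_KILL_CURRENT:
		return "kill current process";
	case OUTCOME_PANIC:
		return "panic";
	}
	return "invalid";
}
```

The model makes the gap visible: the SRAO case returns OUTCOME_IGNORED regardless of the owner, which is the part that is open for improvement.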

Thank you very much for your reply, and sorry for my late response.
I basically understand your idea.
On the x86 platform, suppose a stage 2 page table error happens in a user space process of the guest OS.
How is it handled?
In this case, the error may be trapped to the host. As you described, the host checks the current process and
finds that it is QEMU. So will it kill QEMU? In fact, killing only the guest user space process would be enough.

> 
> So the current situation is not the worst, but is still open for improvement.
> Any suggestion to handle it in memory error handling would be wonderful.


> 
> Thanks,
> Naoya Horiguchi
> 
> 
> >
> > commit 64d37a2baf5e5c0f1009c0ef290a9027de721d66
> > Author: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> > Date:   Wed Apr 15 16:13:05 2015 -0700
> >
> >     mm/memory-failure.c: define page types for action_result() in one place
> >
> >     This cleanup patch moves all strings passed to action_result() into a
> >     single array action_page_type so that a reader can easily find which
> >     kind of action results are possible.  And this patch also fixes the
> >     odd lines to be printed out, like "unknown page state page" or "free
> >     buddy, 2nd try page".
> >
> >     [akpm@xxxxxxxxxxxxxxxxxxxx: rename messages, per David]
> >     [akpm@xxxxxxxxxxxxxxxxxxxx: s/DIRTY_UNEVICTABLE_LRU/CLEAN_UNEVICTABLE_LRU', per Andi]
> >     Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> >     Reviewed-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> >     Cc: Tony Luck <tony.luck@xxxxxxxxx>
> >     Cc: "Xie XiuQi" <xiexiuqi@xxxxxxxxxx>
> >     Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> >     Cc: Chen Gong <gong.chen@xxxxxxxxxxxxxxx>
> >     Cc: David Rientjes <rientjes@xxxxxxxxxx>
> >     Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> >     Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index d487f8d..5fd8931 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -521,6 +521,52 @@ static const char *action_name[] = {
> >         [RECOVERED] = "Recovered",
> >  };
> >
> > +enum action_page_type {
> > +       MSG_KERNEL,
> > +       MSG_KERNEL_HIGH_ORDER,
> > +       MSG_SLAB,
> > +       MSG_DIFFERENT_COMPOUND,
> > +       MSG_POISONED_HUGE,
> > +       MSG_HUGE,
> > +       MSG_FREE_HUGE,
> > +       MSG_UNMAP_FAILED,
> > +       MSG_DIRTY_SWAPCACHE,
> > +       MSG_CLEAN_SWAPCACHE,
> > +       MSG_DIRTY_MLOCKED_LRU,
> > +       MSG_CLEAN_MLOCKED_LRU,
> > +       MSG_DIRTY_UNEVICTABLE_LRU,
> > +       MSG_CLEAN_UNEVICTABLE_LRU,
> > +       MSG_DIRTY_LRU,
> > +       MSG_CLEAN_LRU,
> > +       MSG_TRUNCATED_LRU,
> > +       MSG_BUDDY,
> > +       MSG_BUDDY_2ND,
> > +       MSG_UNKNOWN,
> > +};
> > +
> > +static const char * const action_page_types[] = {
> > +       [MSG_KERNEL]                    = "reserved kernel page",
> > +       [MSG_KERNEL_HIGH_ORDER]         = "high-order kernel page",
> > +       [MSG_SLAB]                      = "kernel slab page",
> > +       [MSG_DIFFERENT_COMPOUND]        = "different compound page after locking",
> > +       [MSG_POISONED_HUGE]             = "huge page already hardware poisoned",
> > +       [MSG_HUGE]                      = "huge page",
> > +       [MSG_FREE_HUGE]                 = "free huge page",
> > +       [MSG_UNMAP_FAILED]              = "unmapping failed page",
> > +       [MSG_DIRTY_SWAPCACHE]           = "dirty swapcache page",
> > +       [MSG_CLEAN_SWAPCACHE]           = "clean swapcache page",
> > +       [MSG_DIRTY_MLOCKED_LRU]         = "dirty mlocked LRU page",
> > +       [MSG_CLEAN_MLOCKED_LRU]         = "clean mlocked LRU page",
> > +       [MSG_DIRTY_UNEVICTABLE_LRU]     = "dirty unevictable LRU page",
> > +       [MSG_CLEAN_UNEVICTABLE_LRU]     = "clean unevictable LRU page",
> > +       [MSG_DIRTY_LRU]                 = "dirty LRU page",
> > +       [MSG_CLEAN_LRU]                 = "clean LRU page",
> > +       [MSG_TRUNCATED_LRU]             = "already truncated LRU page",
> > +       [MSG_BUDDY]                     = "free buddy page",
> > +       [MSG_BUDDY_2ND]                 = "free buddy page (2nd try)",
> > +       [MSG_UNKNOWN]                   = "unknown page",
> > +};
> >
> >
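
For readers unfamiliar with the pattern in the quoted patch, here is a minimal stand-alone sketch of an enum paired with a designated-initializer string table. The enum values and strings are a subset copied from the patch; MSG_TYPE_COUNT, page_type_name() and format_action_result() are hypothetical helpers added for illustration, and the message format is only an assumption, not necessarily what action_result() prints.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>
#include <stdio.h>

/* Subset of the enum from the quoted patch, shortened for illustration. */
enum action_page_type {
	MSG_KERNEL,
	MSG_SLAB,
	MSG_HUGE,
	MSG_BUDDY,
	MSG_UNKNOWN,
	MSG_TYPE_COUNT	/* hypothetical sentinel, not part of the patch */
};

/* Designated initializers keep each string next to its enum value. */
static const char * const action_page_types[] = {
	[MSG_KERNEL]	= "reserved kernel page",
	[MSG_SLAB]	= "kernel slab page",
	[MSG_HUGE]	= "huge page",
	[MSG_BUDDY]	= "free buddy page",
	[MSG_UNKNOWN]	= "unknown page",
};

/* Catch a string table that falls out of sync with the enum. */
static_assert(sizeof(action_page_types) / sizeof(action_page_types[0])
	      == MSG_TYPE_COUNT, "string table out of sync with enum");

/* Hypothetical helper: bounds-checked lookup, falling back to "unknown page". */
static const char *page_type_name(int type)
{
	if (type < 0 || type >= MSG_TYPE_COUNT)
		return action_page_types[MSG_UNKNOWN];
	return action_page_types[type];
}

/*
 * Hypothetical stand-in for the reporting step: formats one line into buf.
 * The format string is an assumption made for this sketch.
 */
static int format_action_result(char *buf, size_t len, unsigned long pfn,
				int type, const char *result)
{
	return snprintf(buf, len, "MCE %#lx: recovery action for %s: %s",
			pfn, page_type_name(type), result);
}
```

With this layout, a page table page that matches no known state ends up as MSG_UNKNOWN, which corresponds to the "unknown page" report seen in the test mentioned at the start of the thread.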


