(Add Cc: Andi and Fengguang)

On Thu, May 13, 2010 at 03:27:50PM +0100, Mel Gorman wrote:
> On Thu, May 13, 2010 at 04:55:19PM +0900, Naoya Horiguchi wrote:
> > This patchset enables error handling for hugepage by containing error
> > in the affected hugepage.
> >
> > Until now, memory error (classified as SRAO in MCA language) on hugepage
>
> What does SRAO stand for? It doesn't matter, I'm just curious.

SRAO stands for "Software Recoverable Action Optional."
SRAO errors can be contained by software, after which they are harmless.

> > was simply ignored, which means if someone accesses the error page later,
> > the second MCE (severer than the first one) occurs and the system panics.
> >
> > It's useful for some aggressive hugepage users if only affected processes
> > are killed. Then other unrelated processes aren't disturbed by the error
> > and can continue operation.
> >
>
> Surely, it's useful for any user of huge pages?

Yes.

> > Moreover, for other extensive hugetlb users which have own "pagecache"
> > on hugepage, the most valued feature would be being able to receive
> > the early kill signal BUS_MCEERR_AO, because the cache pages have
> > good opportunity to be dropped without side effects on BUS_MCEERR_AO.
> >
>
> Be careful here. The page cache that hugetlb uses is for MAP_SHARED
> mappings. If the pages are discarded, they are gone and the result is data
> loss. I think what you are suggesting is that a kill signal can instead be
> translated into a harmless loss of page cache. That works for normal files
> but not hugetlb.

By "pagecache" I did not mean the page cache in the Linux kernel, but a
cache managed by an application. For example, the application reads and
writes the cache contents with direct I/O and tracks clean/dirty status
itself. If a HWPOISON-aware application catches the BUS_MCEERR_AO signal,
it can discard the hugepage used as a cache and reread the data from the
file.
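
To make the application-side handling above concrete, here is a minimal
userspace sketch in C. It is an illustration under assumptions, not code
from this patchset: the cache layout (NSLOTS slots of SLOT_SIZE bytes) and
the helpers slot_for_addr(), is_action_optional(), install_mce_handler()
and drop_poisoned_slot() are hypothetical names. It relies only on
sigaction() with SA_SIGINFO, the BUS_MCEERR_AO si_code, and
prctl(PR_MCE_KILL, ...) to request early kill.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical application-managed cache: NSLOTS slots of SLOT_SIZE bytes
 * inside one hugetlb-backed mapping that starts at cache_base. */
#define NSLOTS    4
#define SLOT_SIZE (2UL * 1024 * 1024)   /* assumed 2 MiB hugepage */

static char *cache_base;                /* set by the application after mmap() */
static unsigned char slot_valid[NSLOTS];
static volatile sig_atomic_t poisoned_slot = -1;

/* Map a reported address to a cache slot, or -1 if it is outside the cache. */
static int slot_for_addr(const char *base, const void *addr)
{
    uintptr_t b = (uintptr_t)base, a = (uintptr_t)addr;

    if (base == NULL || a < b || a >= b + NSLOTS * SLOT_SIZE)
        return -1;
    return (int)((a - b) / SLOT_SIZE);
}

/* Return 1 if this SIGBUS is an "action optional" early-kill notification. */
static int is_action_optional(int signo, int si_code)
{
    return signo == SIGBUS && si_code == BUS_MCEERR_AO;
}

static void sigbus_handler(int signo, siginfo_t *si, void *ctx)
{
    (void)ctx;
    if (is_action_optional(signo, si->si_code)) {
        /* Only record the slot here; the main loop does the real work. */
        poisoned_slot = slot_for_addr(cache_base, si->si_addr);
    } else {
        /* Any other SIGBUS is not recoverable in this sketch. */
        signal(signo, SIG_DFL);
        raise(signo);
    }
}

/* Install the handler and opt in to early kill. Returns 0 on success. */
static int install_mce_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    /* Best effort: the sketch carries on if the request is refused. */
    (void)prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
    return 0;
}

/* Called from the main loop: invalidate the poisoned slot so that it is
 * reread from the file later. Returns the dropped slot, or -1 if none. */
static int drop_poisoned_slot(void)
{
    int slot = poisoned_slot;

    if (slot < 0)
        return -1;
    assert(slot < NSLOTS);
    slot_valid[slot] = 0;
    poisoned_slot = -1;
    return slot;
}
```

The handler only records which slot was hit, because very little is safe
to do in signal context. The main loop would call drop_poisoned_slot()
and then reread that slot from the file with direct I/O. Dropping is only
safe for slots the application knows to be clean.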

Thanks,
Naoya Horiguchi

> > The design of hugepage error handling is based on that of non-hugepage
> > error handling, where we:
> >  1. mark the error page as hwpoison,
> >  2. unmap the hwpoisoned page from processes using it,
> >  3. invalidate error page, and
> >  4. block later accesses to the hwpoisoned pages.
> >
> > Similarities and differences between huge and non-huge case are
> > summarized below:
> >
> >  1. (Difference) when error occurs on a hugepage, PG_hwpoison bits on all pages
> >     in the hugepage are set, because we have no simple way to break up
> >     hugepage into individual pages for now. This means there is a some
> >     risk to be killed by touching non-guilty pages within the error hugepage.
> >
>
> You're right in that you cannot easily demote a hugepage. It is possible but
> I cannot see the value of the required effort. If there is an error within
> the hugepage and touching another part of it results in the process being
> unnecessarily killed, then so be it.
>
> >  2. (Similarity) hugetlb entry for the error hugepage is replaced by hwpoison
> >     swap entry, with which we can detect hwpoisoned memory in VM code.
> >     This is accomplished by adding rmapping code for hugepage, which enables
> >     to use try_to_unmap() for hugepage.
> >
>
> This will be interesting. hugetlbfs pages could look like a file or anon
> depending on whether it has been mapped shared or private. It's odd.
>
> >  3. (Difference) since hugepage is not linked to LRU list and is unswappable,
> >     there are not many things to do for page invalidation (only dequeuing
> >     free/reserved hugepage from freelist. See patch 5/7.)
> >     If we want to contain the error into one page, there may be more to do.
> >
> >  4. (Similarity) we block later accesses by forcing page requests for
> >     hwpoisoned hugepage to fail as done in non-hugepage case in do_wp_page().
> >
> > ToDo:
> > - Narrow down the containment region into one raw page.
> > - Soft-offlining for hugepage is not supported due to the lack of migration
> >   for hugepage.
> > - Counting file-mapped/anonymous hugepage in NR_FILE_MAPPED/NR_ANON_PAGES.
> >
> >  [PATCH 1/7] hugetlb, rmap: add reverse mapping for hugepage
> >  [PATCH 2/7] HWPOISON, hugetlb: enable error handling path for hugepage
> >  [PATCH 3/7] HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage
> >  [PATCH 4/7] HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error
> >  [PATCH 5/7] HWPOISON, hugetlb: isolate corrupted hugepage
> >  [PATCH 6/7] HWPOISON, hugetlb: detect hwpoison in hugetlb code
> >  [PATCH 7/7] HWPOISON, hugetlb: support hwpoison injection for hugepage
> >
> > Dependency:
> > - patch 2 depends on patch 1.
> > - patch 3 to patch 6 depend on patch 2.
> >
> >  include/linux/hugetlb.h |    3 +
> >  mm/hugetlb.c            |   98 ++++++++++++++++++++++++++++++++++++++-
> >  mm/hwpoison-inject.c    |   15 ++++--
> >  mm/memory-failure.c     |  120 +++++++++++++++++++++++++++++++++++------------
> >  mm/rmap.c               |   16 ++++++
> >  5 files changed, 215 insertions(+), 37 deletions(-)
> >
> > ChangeLog from v4:
> > - rebased to 2.6.34-rc7
> > - add isolation code for free/reserved hugepage in me_huge_page()
> > - set/clear PG_hwpoison bits of all pages in hugepage.
> > - mce_bad_pages counts all pages in hugepage.
> > - rename __hugepage_set_anon_rmap() to hugepage_add_anon_rmap()
> > - add huge_pte_offset() dummy function in header file on !CONFIG_HUGETLBFS
> >
> > ChangeLog from v3:
> > - rebased to 2.6.34-rc5
> > - support for privately mapped hugepage
> >
> > ChangeLog from v2:
> > - rebase to 2.6.34-rc3
> > - consider mapcount of hugepage
> > - rename pointer "head" into "hpage"
> >
> > ChangeLog from v1:
> > - rebase to 2.6.34-rc1
> > - add comment from Wu Fengguang
> >
> > Thanks,
> > Naoya Horiguchi
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>
> >
>
> --
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab