(2012/04/04 10:56), David Rientjes wrote:
> On COW, a new hugepage is allocated and charged to the memcg.  If the
> memcg is oom, however, this charge will fail and will return VM_FAULT_OOM
> to the page fault handler which results in an oom kill.
>
> Instead, it's possible to fall back to splitting the hugepage so that the
> COW results only in an order-0 page being charged to the memcg which has
> a higher likelihood to succeed.  This is expensive because the hugepage
> must be split in the page fault handler, but it is much better than
> unnecessarily oom killing a process.
>
> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> ---
>  mm/huge_memory.c |    1 +
>  mm/memory.c      |   18 +++++++++++++++---
>  2 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -959,6 +959,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>
>  	if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))) {
>  		put_page(new_page);
> +		split_huge_page(page);
>  		put_page(page);
>  		ret |= VM_FAULT_OOM;
>  		goto out;
> diff --git a/mm/memory.c b/mm/memory.c
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3489,6 +3489,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	if (unlikely(is_vm_hugetlb_page(vma)))
>  		return hugetlb_fault(mm, vma, address, flags);
>
> +retry:
>  	pgd = pgd_offset(mm, address);
>  	pud = pud_alloc(mm, pgd, address);
>  	if (!pud)
> @@ -3502,13 +3503,24 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  							  pmd, flags);
>  	} else {
>  		pmd_t orig_pmd = *pmd;
> +		int ret;
> +
>  		barrier();
>  		if (pmd_trans_huge(orig_pmd)) {
>  			if (flags & FAULT_FLAG_WRITE &&
>  			    !pmd_write(orig_pmd) &&
> -			    !pmd_trans_splitting(orig_pmd))
> -				return do_huge_pmd_wp_page(mm, vma, address,
> -							   pmd, orig_pmd);
> +			    !pmd_trans_splitting(orig_pmd)) {
> +				ret = do_huge_pmd_wp_page(mm, vma, address, pmd,
> +							  orig_pmd);
> +				/*
> +				 * If COW results in an oom memcg, the huge pmd
> +				 * will already have been split, so retry the
> +				 * fault on the pte for a smaller charge.
> +				 */

IIUC, do_huge_pmd_wp_page_fallback() can also return VM_FAULT_OOM, so this
check is not specific to the memcg case.

> +				if (unlikely(ret & VM_FAULT_OOM))
> +					goto retry;
> +				return ret;
> +			}
>  			return 0;

Anyway, seems reasonable to me.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
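
For readers following the thread, the retry flow discussed here can be modeled
in user space roughly as below. This is only a sketch under stated
assumptions, not kernel code: every model_* name and MODEL_* constant is
hypothetical, the memcg is reduced to a page counter, and 512 base pages per
hugepage assumes x86-64 with 4K pages. It covers only the memcg charge
failure; other VM_FAULT_OOM sources such as do_huge_pmd_wp_page_fallback()
are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
#define MODEL_FAULT_OOM 0x1
#define MODEL_HPAGE_NR  512	/* assumed: 2M hugepage / 4K base page */

struct model_memcg {
	long limit;	/* charge limit, in base pages */
	long usage;	/* pages currently charged */
};

struct model_mapping {
	bool huge;	/* true while the range is mapped by a huge pmd */
};

/* Charge nr_pages to the group; returns 0 on success, -1 if over limit. */
static int model_charge(struct model_memcg *cg, long nr_pages)
{
	if (cg->usage + nr_pages > cg->limit)
		return -1;
	cg->usage += nr_pages;
	return 0;
}

/* Stands in for the patched huge COW path: split when the charge fails. */
static int model_huge_wp(struct model_memcg *cg, struct model_mapping *map)
{
	assert(map->huge);
	if (model_charge(cg, MODEL_HPAGE_NR)) {
		map->huge = false;	/* stands in for split_huge_page() */
		return MODEL_FAULT_OOM;
	}
	return 0;
}

/* Stands in for the pte-level COW path: charges a single base page. */
static int model_pte_wp(struct model_memcg *cg)
{
	return model_charge(cg, 1) ? MODEL_FAULT_OOM : 0;
}

/* Stands in for the patched fault handler: retry after a failed huge COW. */
static int model_handle_fault(struct model_memcg *cg, struct model_mapping *map)
{
	int ret;

retry:
	if (map->huge) {
		ret = model_huge_wp(cg, map);
		if (ret & MODEL_FAULT_OOM)
			goto retry;	/* mapping is split now, take the pte path */
		return ret;
	}
	return model_pte_wp(cg);
}
```

With a group limited to 10 pages and a huge mapping, model_handle_fault()
fails the 512-page charge, splits, retries, and succeeds with a single page
charged. The retry terminates in this model only because a failed huge charge
always clears map->huge.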