Re: [patch v2] thp, memcg: split hugepage for memcg oom on cow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 26, 2012 at 02:05:11PM -0700, David Rientjes wrote:
> On Thu, 26 Apr 2012, Johannes Weiner wrote:
> 
> > > I agree it's more robust if do_huge_pmd_wp_page() were modified later and 
> > > mistakenly returned VM_FAULT_OOM without the page being split, but 
> > > __split_huge_page_pmd() has the drawback of also requiring to retake 
> > > mm->page_table_lock to test whether orig_pmd is still legitimate so it 
> > > will be slower.  Do you feel strongly about the way it's currently written 
> > > which will be faster at runtime?
> > 
> > If you can't accomodate for a hugepage, this code runs 511 times in
> > the worst case before you also can't fit a regular page anymore.  And
> > compare it to the cost of the splitting itself and the subsequent 4k
> > COW break faults...
> > 
> > I don't think it's a path worth optimizing for at all, especially if
> > it includes sprinkling undocumented split_huge_pages around, and the
> > fix could be as self-contained as something like this...
> > 
> 
> I disagree that we should be unnecessarily taking mm->page_table_lock 
> which is already strongly contended if all cpus are pagefaulting on the 
> same process (and I'll be posting a patch to address specifically those 
> slowdowns since thp is _much_ slower on page fault tests) when we can 
> already do it in do_huge_pmd_wp_page().  If you'd like to add a comment 
> for the split_huge_page() in that function if it's not clear enough from 
> my VM_FAULT_OOM comment in handle_mm_fault(), then feel free to add it but 
> I thought it was rather trivial to understand.

Come on, it's not "trivial to understand" why the page in the parent
is split because the child failed to allocate a replacement, shortly
before returning "out of memory".  You have to look at a different
file to make sense of it.  Such cross-dependencies between functions
simply suck and made problems in the past.  The least you could do is
properly document them in _both_ places if you insist on adding them
in the first place.

Btw, is restarting the full page table walk even necessary?  You
already have the pmd, know/hope it's been split, and hold the mmap_sem
for reading, so it can't go back to being huge or none.  You should be
able to fall through to the pte lookup and handle_pte_fault(), no?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]