On Thu, Oct 04, 2012 at 01:51:04AM +0200, Andrea Arcangeli wrote:
> If any of the ptes that khugepaged is collapsing was a pte_numa, the
> resulting trans huge pmd will be a pmd_numa too.
>
> See the comment inline for why we require just one pte_numa pte to
> make a pmd_numa pmd. If needed later we could change the number of
> pte_numa ptes required to create a pmd_numa and make it tunable with
> sysfs too.
>

It does increase the number of NUMA hinting faults that are incurred
though, potentially offsetting the gains from using THP. Is this
something that would just go away when THP pages are natively migrated
by autonuma? Does it make a measurable improvement now?

> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> ---
>  mm/huge_memory.c |   33 +++++++++++++++++++++++++++++++--
>  1 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 152d4dd..1023e67 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1833,12 +1833,19 @@ out:
>  	return isolated;
>  }
>
> -static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> +/*
> + * Do the actual data copy for mapped ptes and release the mapped
> + * pages, or alternatively zero out the transparent hugepage in the
> + * mapping holes. Transfer the page_autonuma information in the
> + * process. Return true if any of the mapped ptes was of numa type.
> + */
> +static bool __collapse_huge_page_copy(pte_t *pte, struct page *page,
>  				      struct vm_area_struct *vma,
>  				      unsigned long address,
>  				      spinlock_t *ptl)
>  {
>  	pte_t *_pte;
> +	bool mknuma = false;
>  	for (_pte = pte; _pte < pte+HPAGE_PMD_NR; _pte++) {
>  		pte_t pteval = *_pte;
>  		struct page *src_page;
>
> @@ -1865,11 +1872,29 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>  			page_remove_rmap(src_page);
>  			spin_unlock(ptl);
>  			free_page_and_swap_cache(src_page);
> +
> +			/*
> +			 * Only require one pte_numa mapped by a pmd
> +			 * to make it a pmd_numa, too. To avoid the
> +			 * risk of losing NUMA hinting page faults, it
> +			 * is better to overestimate the NUMA node
> +			 * affinity with a node where we just
> +			 * collapsed a hugepage, rather than
> +			 * underestimate it.
> +			 *
> +			 * Note: if AUTONUMA_SCAN_PMD_FLAG is set, we
> +			 * won't find any pte_numa ptes since we're
> +			 * only setting NUMA hinting at the pmd
> +			 * level.
> +			 */
> +			mknuma |= pte_numa(pteval);
>  		}
>
>  		address += PAGE_SIZE;
>  		page++;
>  	}
> +
> +	return mknuma;
>  }
>
>  static void collapse_huge_page(struct mm_struct *mm,
> @@ -1887,6 +1912,7 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	spinlock_t *ptl;
>  	int isolated;
>  	unsigned long hstart, hend;
> +	bool mknuma = false;
>
>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>  #ifndef CONFIG_NUMA
> @@ -2005,7 +2031,8 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	 */
>  	anon_vma_unlock(vma->anon_vma);
>
> -	__collapse_huge_page_copy(pte, new_page, vma, address, ptl);
> +	mknuma = pmd_numa(_pmd);
> +	mknuma |= __collapse_huge_page_copy(pte, new_page, vma, address, ptl);
>  	pte_unmap(pte);
>  	__SetPageUptodate(new_page);
>  	pgtable = pmd_pgtable(_pmd);
> @@ -2015,6 +2042,8 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	_pmd = mk_pmd(new_page, vma->vm_page_prot);
>  	_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
>  	_pmd = pmd_mkhuge(_pmd);
> +	if (mknuma)
> +		_pmd = pmd_mknuma(_pmd);
>
>  	/*
>  	 * spin_lock() below is not the equivalent of smp_wmb(), so

--
Mel Gorman
SUSE Labs
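As an aside, here is a minimal user-space sketch of the tunable-threshold
idea floated in the changelog (require N pte_numa ptes before the collapsed
pmd is made pmd_numa, rather than just one). It is purely illustrative and
not kernel code: numa_pte_threshold and collapse_makes_pmd_numa() are
hypothetical names, and the patch above effectively hard-codes a threshold
of one.

/*
 * Standalone sketch (not kernel code): models how many pte_numa ptes
 * should make the collapsed huge pmd a pmd_numa.  The patch above uses
 * a threshold of one; the changelog mentions a possible sysfs tunable,
 * modelled here by the hypothetical numa_pte_threshold.
 */
#include <stdbool.h>
#include <stdio.h>

#define HPAGE_PMD_NR 512                /* ptes covered by one trans huge pmd */

static int numa_pte_threshold = 1;      /* hypothetical tunable */

/* pte_numa[i] says whether the i-th pte carried NUMA hinting. */
static bool collapse_makes_pmd_numa(const bool pte_numa[HPAGE_PMD_NR])
{
        int nr_numa = 0, i;

        for (i = 0; i < HPAGE_PMD_NR; i++)
                nr_numa += pte_numa[i];

        return nr_numa >= numa_pte_threshold;
}

int main(void)
{
        bool ptes[HPAGE_PMD_NR] = { false };

        ptes[42] = true;        /* a single NUMA-hinted pte is enough */
        printf("pmd_numa: %d\n", collapse_makes_pmd_numa(ptes));
        return 0;
}

Requiring only one pte_numa errs on the side of keeping NUMA hinting faults
rather than losing them, as the quoted inline comment argues; a higher
threshold would trade hinting accuracy for fewer faults on the collapsed
range.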