On Tue, Sep 10, 2013 at 03:41:09PM +0100, Mel Gorman wrote: > On Fri, Aug 09, 2013 at 01:21:38AM -0400, Naoya Horiguchi wrote: > > This patch extends do_mbind() to handle vma with VM_HUGETLB set. > > We will be able to migrate hugepage with mbind(2) after > > applying the enablement patch which comes later in this series. > > > > ChangeLog v3: > > - revert introducing migrate_movable_pages > > - added alloc_huge_page_noerr free from ERR_VALUE > > > > ChangeLog v2: > > - updated description and renamed patch title > > > > Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> > > Acked-by: Andi Kleen <ak@xxxxxxxxxxxxxxx> > > Reviewed-by: Wanpeng Li <liwanp@xxxxxxxxxxxxxxxxxx> > > Acked-by: Hillf Danton <dhillf@xxxxxxxxx> > > --- > > include/linux/hugetlb.h | 3 +++ > > mm/hugetlb.c | 14 ++++++++++++++ > > mm/mempolicy.c | 4 +++- > > 3 files changed, 20 insertions(+), 1 deletion(-) > > > > diff --git v3.11-rc3.orig/include/linux/hugetlb.h v3.11-rc3/include/linux/hugetlb.h > > index bc8d837..d1db007 100644 > > --- v3.11-rc3.orig/include/linux/hugetlb.h > > +++ v3.11-rc3/include/linux/hugetlb.h > > @@ -265,6 +265,8 @@ struct huge_bootmem_page { > > }; > > > > struct page *alloc_huge_page_node(struct hstate *h, int nid); > > +struct page *alloc_huge_page_noerr(struct vm_area_struct *vma, > > + unsigned long addr, int avoid_reserve); > > > > /* arch callback */ > > int __init alloc_bootmem_huge_page(struct hstate *h); > > @@ -378,6 +380,7 @@ static inline pgoff_t basepage_index(struct page *page) > > #else /* CONFIG_HUGETLB_PAGE */ > > struct hstate {}; > > #define alloc_huge_page_node(h, nid) NULL > > +#define alloc_huge_page_noerr(v, a, r) NULL > > #define alloc_bootmem_huge_page(h) NULL > > #define hstate_file(f) NULL > > #define hstate_sizelog(s) NULL > > diff --git v3.11-rc3.orig/mm/hugetlb.c v3.11-rc3/mm/hugetlb.c > > index 649771c..ee764b0 100644 > > --- v3.11-rc3.orig/mm/hugetlb.c > > +++ v3.11-rc3/mm/hugetlb.c > > @@ -1195,6 +1195,20 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma, > > return page; > > } > > > > +/* > > + * alloc_huge_page()'s wrapper which simply returns the page if allocation > > + * succeeds, otherwise NULL. This function is called from new_vma_page(), > > + * where no ERR_VALUE is expected to be returned. > > + */ > > +struct page *alloc_huge_page_noerr(struct vm_area_struct *vma, > > + unsigned long addr, int avoid_reserve) > > +{ > > + struct page *page = alloc_huge_page(vma, addr, avoid_reserve); > > + if (IS_ERR(page)) > > + page = NULL; > > + return page; > > +} > > + > > int __weak alloc_bootmem_huge_page(struct hstate *h) > > { > > struct huge_bootmem_page *m; > > diff --git v3.11-rc3.orig/mm/mempolicy.c v3.11-rc3/mm/mempolicy.c > > index d96afc1..4a03c14 100644 > > --- v3.11-rc3.orig/mm/mempolicy.c > > +++ v3.11-rc3/mm/mempolicy.c > > @@ -1183,6 +1183,8 @@ static struct page *new_vma_page(struct page *page, unsigned long private, int * > > vma = vma->vm_next; > > } > > > > + if (PageHuge(page)) > > + return alloc_huge_page_noerr(vma, address, 1); > > /* > > * if !vma, alloc_page_vma() will use task or system default policy > > */ > > It's interesting to note that it will be tricky to configure a system to > allow this sort of migration to succeed. > > This call correctly uses avoid_reserve but that does mean that for it > to work that there there must be free pages statically allocated in the > hugepage pool of the destination node or hugepage dynamic pool resizing > must be enabled. The former option is going to waste memory because pages > allocated to the static pool cannot be used for any other purpose and > using dynamic hugepage pool resizing may fail. Yes, that's interesting because it's important to make page migration more likely to succeed. I guess that dynamic pool resizing can affect the pool configuration without administrators' knowing, so allocating surplus hugepages directly from buddy seems more preferable. > It makes me wonder how actually useful generic hugetlbfs page migration > will be in practice. Are there really usecases where the system > administrator is willing to create unused hugepage pools on each node > just to enable migration? Maybe most users don't want it. Thanks, Naoya Horiguchi > > @@ -1293,7 +1295,7 @@ static long do_mbind(unsigned long start, unsigned long len, > > (unsigned long)vma, > > MIGRATE_SYNC, MR_MEMPOLICY_MBIND); > > if (nr_failed) > > - putback_lru_pages(&pagelist); > > + putback_movable_pages(&pagelist); > > } > > > > if (nr_failed && (flags & MPOL_MF_STRICT)) > > -- > > 1.8.3.1 > > > > -- > Mel Gorman > SUSE Labs > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>