On Wed 20-12-17 05:33:36, Naoya Horiguchi wrote: > > On 12/15/2017 06:33 PM, Michal Hocko wrote: > > Naoya, > > this has passed Mike's review (thanks for that!), you have mentioned > > that you can pass this through your testing machinery earlier. While > > I've done some testing already I would really appreciate if you could > > do that as well. Review would be highly appreciated as well. > > Sorry for my slow response. I reviewed/tested this patchset and looks > good to me overall. No need to feel sorry. This doesn't have an urgent priority. Thanks for the review and testing. Can I assume your {Reviewed,Acked}-by or Tested-by? > I have one comment on the code path from mbind(2). > The callback passed to migrate_pages() in do_mbind() (i.e. new_page()) > calls alloc_huge_page_noerr() which currently doesn't call SetPageHugeTemporary(), > so hugetlb migration fails when h->surplus_huge_page >= h->nr_overcommit_huge_pages. Yes, I am aware of that. I should have been more explicit in the changelog. Sorry about that and thanks for pointing it out explicitly. To be honest I wasn't really sure what to do about this. The code path is really complex and it made my head spin. I fail to see why we have to call alloc_huge_page and mess with reservations at all. > I don't think this is a bug, but it would be better if mbind(2) works > more similarly with other migration callers like move_pages(2)/migrate_pages(2). If the fix is as easy as the following I will add it to the pile. Otherwise I would prefer to do this separately after I find some more time to understand the callpath. --- diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index e035002d3fb6..08a4af411e25 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -345,10 +345,9 @@ struct huge_bootmem_page { struct page *alloc_huge_page(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve); struct page *alloc_huge_page_node(struct hstate *h, int nid); -struct page *alloc_huge_page_noerr(struct vm_area_struct *vma, - unsigned long addr, int avoid_reserve); struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t *nmask); +struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address); int huge_add_to_page_cache(struct page *page, struct address_space *mapping, pgoff_t idx); @@ -526,7 +525,7 @@ struct hstate {}; #define alloc_huge_page(v, a, r) NULL #define alloc_huge_page_node(h, nid) NULL #define alloc_huge_page_nodemask(h, preferred_nid, nmask) NULL -#define alloc_huge_page_noerr(v, a, r) NULL +#define alloc_huge_page_vma(vma, address) NULL #define alloc_bootmem_huge_page(h) NULL #define hstate_file(f) NULL #define hstate_sizelog(s) NULL diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4426c5b23a20..e00deabe6d17 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1672,6 +1672,25 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask); } +/* mempolicy aware migration callback */ +struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address) +{ + struct mempolicy *mpol; + nodemask_t *nodemask; + struct page *page; + struct hstate *h; + gfp_t gfp_mask; + int node; + + h = hstate_vma(vma); + gfp_mask = htlb_alloc_mask(h); + node = huge_node(vma, address, gfp_mask, &mpol, &nodemask); + page = alloc_huge_page_nodemask(h, node, nodemask); + mpol_cond_put(mpol); + + return page; +} + /* * Increase the hugetlb pool such that it can accommodate a reservation * of size 'delta'. @@ -2077,20 +2096,6 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, return ERR_PTR(-ENOSPC); } -/* - * alloc_huge_page()'s wrapper which simply returns the page if allocation - * succeeds, otherwise NULL. This function is called from new_vma_page(), - * where no ERR_VALUE is expected to be returned. - */ -struct page *alloc_huge_page_noerr(struct vm_area_struct *vma, - unsigned long addr, int avoid_reserve) -{ - struct page *page = alloc_huge_page(vma, addr, avoid_reserve); - if (IS_ERR(page)) - page = NULL; - return page; -} - int alloc_bootmem_huge_page(struct hstate *h) __attribute__ ((weak, alias("__alloc_bootmem_huge_page"))); int __alloc_bootmem_huge_page(struct hstate *h) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index f604b22ebb65..96823fa07f38 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1121,8 +1121,7 @@ static struct page *new_page(struct page *page, unsigned long start, int **x) } if (PageHuge(page)) { - BUG_ON(!vma); - return alloc_huge_page_noerr(vma, address, 1); + return alloc_huge_page_vma(vma, address); } else if (thp_migration_supported() && PageTransHuge(page)) { struct page *thp; -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>