Cc Mike, David, who is an expert of hugetlb and thp On Fri, Jun 14, 2019 at 5:37 AM Ira Weiny <ira.weiny@xxxxxxxxx> wrote: > > On Thu, Jun 13, 2019 at 06:45:01PM +0800, Pingfan Liu wrote: > > FOLL_LONGTERM suggests a pin which is going to be given to hardware and > > can't move. It would truncate CMA permanently and should be excluded. > > > > FOLL_LONGTERM has already been checked in the slow path, but not checked in > > the fast path, which means a possible leak of CMA page to longterm pinned > > requirement through this crack. > > > > Place a check in gup_pte_range() in the fast path. > > > > Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx> > > Cc: Ira Weiny <ira.weiny@xxxxxxxxx> > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > Cc: Mike Rapoport <rppt@xxxxxxxxxxxxx> > > Cc: Dan Williams <dan.j.williams@xxxxxxxxx> > > Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> > > Cc: John Hubbard <jhubbard@xxxxxxxxxx> > > Cc: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> > > Cc: Keith Busch <keith.busch@xxxxxxxxx> > > Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> > > Cc: Shuah Khan <shuah@xxxxxxxxxx> > > Cc: linux-kernel@xxxxxxxxxxxxxxx > > --- > > mm/gup.c | 26 ++++++++++++++++++++++++++ > > 1 file changed, 26 insertions(+) > > > > diff --git a/mm/gup.c b/mm/gup.c > > index 766ae54..de1b03f 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -1757,6 +1757,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, > > VM_BUG_ON(!pfn_valid(pte_pfn(pte))); > > page = pte_page(pte); > > > > + /* > > + * FOLL_LONGTERM suggests a pin given to hardware. Prevent it > > + * from truncating CMA area > > + */ > > + if (unlikely(flags & FOLL_LONGTERM) && > > + is_migrate_cma_page(page)) > > + goto pte_unmap; > > + > > head = try_get_compound_head(page, 1); > > if (!head) > > goto pte_unmap; > > @@ -1900,6 +1908,12 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, > > refs++; > > } while (addr += PAGE_SIZE, addr != end); > > > > + if (unlikely(flags & FOLL_LONGTERM) && > > + is_migrate_cma_page(page)) { > > + *nr -= refs; > > + return 0; > > + } > > + > > Why can't we place this check before the while loop and skip subtracting the > page count? Yes, that will be better. > > Can is_migrate_cma_page() operate on any "subpage" of a compound page? For gigantic page, __alloc_gigantic_page() allocate from MIGRATE_MOVABLE pageblock. For page order < MAX_ORDER, pages are allocated from either free_list[MIGRATE_MOVABLE] or free_list[MIGRATE_CMA]. So all subpage have the same migrate type. Thanks, Pingfan > > Here this calls is_magrate_cma_page() on the tail page of the compound page. > > I'm not an expert on compound pages nor cma handling so is this ok? > > It seems like you need to call is_migrate_cma_page() on each page within the > while loop? > > > head = try_get_compound_head(pmd_page(orig), refs); > > if (!head) { > > *nr -= refs; > > @@ -1941,6 +1955,12 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, > > refs++; > > } while (addr += PAGE_SIZE, addr != end); > > > > + if (unlikely(flags & FOLL_LONGTERM) && > > + is_migrate_cma_page(page)) { > > + *nr -= refs; > > + return 0; > > + } > > + > > Same comment here. > > > head = try_get_compound_head(pud_page(orig), refs); > > if (!head) { > > *nr -= refs; > > @@ -1978,6 +1998,12 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, > > refs++; > > } while (addr += PAGE_SIZE, addr != end); > > > > + if (unlikely(flags & FOLL_LONGTERM) && > > + is_migrate_cma_page(page)) { > > + *nr -= refs; > > + return 0; > > + } > > + > > And here. > > Ira > > > head = try_get_compound_head(pgd_page(orig), refs); > > if (!head) { > > *nr -= refs; > > -- > > 2.7.5 > >