Hi Kirill, > On Jun 13, 2019, at 8:24 AM, Song Liu <songliubraving@xxxxxx> wrote: > > > >> On Jun 13, 2019, at 8:14 AM, Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> wrote: >> >> On Thu, Jun 13, 2019 at 03:03:01PM +0000, Song Liu wrote: >>> >>> >>>> On Jun 13, 2019, at 7:16 AM, Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote: >>>> >>>> On Thu, Jun 13, 2019 at 01:57:30PM +0000, Song Liu wrote: >>>>>> And I'm not convinced that it belongs here at all. User requested PMD >>>>>> split and it is done after split_huge_pmd(). The rest can be handled by >>>>>> the caller as needed. >>>>> >>>>> I put this part here because split_huge_pmd() for file-backed THP is >>>>> not really done after split_huge_pmd(). And I would like it done before >>>>> calling follow_page_pte() below. Maybe we can still do them here, just >>>>> for file-backed THPs? >>>>> >>>>> If we would move it, shall we move to callers of follow_page_mask()? >>>>> In that case, we will probably end up with similar code in two places: >>>>> __get_user_pages() and follow_page(). >>>>> >>>>> Did I get this right? >>>> >>>> Would it be enough to replace pte_offset_map_lock() in follow_page_pte() >>>> with pte_alloc_map_lock()? >>> >>> This is similar to my previous version: >>> >>> + } else { /* flags & FOLL_SPLIT_PMD */ >>> + pte_t *pte; >>> + spin_unlock(ptl); >>> + split_huge_pmd(vma, pmd, address); >>> + pte = get_locked_pte(mm, address, &ptl); >>> + if (!pte) >>> + return no_page_table(vma, flags); >>> + spin_unlock(ptl); >>> + ret = 0; >>> + } >>> >>> I think this is cleaner than use pte_alloc_map_lock() in follow_page_pte(). >>> What's your thought on these two versions (^^^ vs. pte_alloc_map_lock)? >> >> It's additional lock-unlock cycle and few more lines of code... >> >>>> This will leave bunch not populated PTE entries, but it is fine: they will >>>> be populated on the next access to them. >>> >>> We need to handle page fault during next access, right? Since we already >>> allocated everything, we can just populate the PTE entries and saves a >>> lot of page faults (assuming we will access them later). >> >> Not a lot due to faultaround and they may never happen, but you need to >> tear down the mapping any way. > > I see. Let me try this way. > > Thanks, > Song To make sure I understand your suggestions. Here is what I got: diff --git c/mm/gup.c w/mm/gup.c index ddde097cf9e4..85e6f46fd925 100644 --- c/mm/gup.c +++ w/mm/gup.c @@ -197,7 +197,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, if (unlikely(pmd_bad(*pmd))) return no_page_table(vma, flags); - ptep = pte_offset_map_lock(mm, pmd, address, &ptl); + ptep = pte_alloc_map_lock(mm, pmd, address, &ptl); + if (!ptep) + return ERR_PTR(-ENOMEM); + pte = *ptep; if (!pte_present(pte)) { swp_entry_t entry; @@ -398,7 +401,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, spin_unlock(ptl); return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); } - if (flags & FOLL_SPLIT) { + if (flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)) { int ret; page = pmd_page(*pmd); if (is_huge_zero_page(page)) { @@ -407,7 +410,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, split_huge_pmd(vma, pmd, address); if (pmd_trans_unstable(pmd)) ret = -EBUSY; - } else { + } else if (flags & FOLL_SPLIT) { if (unlikely(!try_get_page(page))) { spin_unlock(ptl); return ERR_PTR(-ENOMEM); @@ -419,6 +422,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, put_page(page); if (pmd_none(*pmd)) return no_page_table(vma, flags); + } else { /* flags & FOLL_SPLIT_PMD */ + spin_unlock(ptl); + split_huge_pmd(vma, pmd, address); + ret = 0; } return ret ? ERR_PTR(ret) : follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); This version doesn't work as-is, because it returns at the first check: if (unlikely(pmd_bad(*pmd))) return no_page_table(vma, flags); Did I misunderstand anything here? Thanks, Song > >> >> -- >> Kirill A. Shutemov