On Thu, Jun 13, 2019 at 03:03:01PM +0000, Song Liu wrote: > > > > On Jun 13, 2019, at 7:16 AM, Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote: > > > > On Thu, Jun 13, 2019 at 01:57:30PM +0000, Song Liu wrote: > >>> And I'm not convinced that it belongs here at all. User requested PMD > >>> split and it is done after split_huge_pmd(). The rest can be handled by > >>> the caller as needed. > >> > >> I put this part here because split_huge_pmd() for file-backed THP is > >> not really done after split_huge_pmd(). And I would like it done before > >> calling follow_page_pte() below. Maybe we can still do them here, just > >> for file-backed THPs? > >> > >> If we would move it, shall we move to callers of follow_page_mask()? > >> In that case, we will probably end up with similar code in two places: > >> __get_user_pages() and follow_page(). > >> > >> Did I get this right? > > > > Would it be enough to replace pte_offset_map_lock() in follow_page_pte() > > with pte_alloc_map_lock()? > > This is similar to my previous version: > > + } else { /* flags & FOLL_SPLIT_PMD */ > + pte_t *pte; > + spin_unlock(ptl); > + split_huge_pmd(vma, pmd, address); > + pte = get_locked_pte(mm, address, &ptl); > + if (!pte) > + return no_page_table(vma, flags); > + spin_unlock(ptl); > + ret = 0; > + } > > I think this is cleaner than use pte_alloc_map_lock() in follow_page_pte(). > What's your thought on these two versions (^^^ vs. pte_alloc_map_lock)? It's additional lock-unlock cycle and few more lines of code... > > This will leave bunch not populated PTE entries, but it is fine: they will > > be populated on the next access to them. > > We need to handle page fault during next access, right? Since we already > allocated everything, we can just populate the PTE entries and saves a > lot of page faults (assuming we will access them later). Not a lot due to faultaround and they may never happen, but you need to tear down the mapping any way. -- Kirill A. Shutemov