> On Aug 12, 2019, at 6:06 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > On 08/09, Song Liu wrote: >> >> +void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) >> +{ >> + unsigned long haddr = addr & HPAGE_PMD_MASK; >> + struct vm_area_struct *vma = find_vma(mm, haddr); >> + struct page *hpage = NULL; >> + pmd_t *pmd, _pmd; >> + spinlock_t *ptl; >> + int count = 0; >> + int i; >> + >> + if (!vma || !vma->vm_file || >> + vma->vm_start > haddr || vma->vm_end < haddr + HPAGE_PMD_SIZE) >> + return; >> + >> + /* >> + * This vm_flags may not have VM_HUGEPAGE if the page was not >> + * collapsed by this mm. But we can still collapse if the page is >> + * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check() >> + * will not fail the vma for missing VM_HUGEPAGE >> + */ >> + if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE)) >> + return; >> + >> + pmd = mm_find_pmd(mm, haddr); >> + if (!pmd) >> + return; >> + >> + /* step 1: check all mapped PTEs are to the right huge page */ >> + for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) { >> + pte_t *pte = pte_offset_map(pmd, addr); >> + struct page *page; >> + >> + if (pte_none(*pte) || !pte_present(*pte)) >> + continue; > > if (!pte_present(*pte)) > return; > > you can't simply flush pmd if this page is swapped out. hmm... how about if (pte_none(*pte)) continue; if (!pte_present(*pte)) return; If the page hasn't faulted in for this mm, i.e. pte_none(), we can flush the pmd. > >> + >> + page = vm_normal_page(vma, addr, *pte); >> + >> + if (!page || !PageCompound(page)) >> + return; >> + >> + if (!hpage) { >> + hpage = compound_head(page); >> + /* >> + * The mapping of the THP should not change. >> + * >> + * Note that uprobe may change the page table, > > Not only uprobe can cow the page. Debugger can do. Or mmap(PROT_WRITE, MAP_PRIVATE). > > uprobe() is "special" because it a) it works with a foreign mm and b) > it can't stop the process which uses this mm. Otherwise it could simply > update the page returned by get_user_pages_remote(FOLL_FORCE), just we > would need to add FOLL_WRITE and if we do this we do not even need SPLIT, > that is why, say, __access_remote_vm() works without SPLIT. Will update the comment in next version. Thanks! Song