On Tue, Jun 21, 2016 at 06:04:33PM +0300, Kirill A. Shutemov wrote: > On Mon, Jun 20, 2016 at 12:32:01PM +0300, Kirill A. Shutemov wrote: > > > +void split_huge_pmd_address_freeze(struct vm_area_struct *vma, > > > + unsigned long address, struct page *page) > > > +{ > > > + pgd_t *pgd; > > > + pud_t *pud; > > > + pmd_t *pmd; > > > + > > > + pgd = pgd_offset(vma->vm_mm, address); > > > + if (!pgd_present(*pgd)) > > > + return; > > > + > > > + pud = pud_offset(pgd, address); > > > + if (!pud_present(*pud)) > > > + return; > > > + > > > + pmd = pmd_offset(pud, address); > > > + __split_huge_pmd(vma, pmd, address, page, true); > > > } > > > > I don't see a reason to introduce new function. Just move the page > > check under ptl from split_huge_pmd_address() and that should be enough. Sorry for my slow response (I was offline yesterday.) My point of separating function is to avoid checking pmd_present outside ptl just for freeze=true case (I didn't want affect other path, i.e. from vma_adjust_trans_huge().) But I think that the new function is unnecessary if we move the following part of split_huge_pmd_address() into ptl, if (!pmd_present(*pmd) || (!pmd_trans_huge(*pmd) && !pmd_devmap(*pmd))) return; Does it make sense? > > Or am I missing something? > > I'm talking about something like patch below. Could you test it? Thanks, with this patch my 3-hour testing doesn't trigger the problem, so it works. But I feel it's weird because I think that the source of the race is "if (!pmd_present)" check in split_huge_pmd_address() called outside ptl. Your patch doesn't change that part, so I'm not sure why this fix works. > If it works fine to you, feel free to submit with my Signed-off-by. > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > index eb810816bbc6..92ce91c03cd0 100644 > --- a/include/linux/huge_mm.h > +++ b/include/linux/huge_mm.h > @@ -98,7 +98,7 @@ static inline int split_huge_page(struct page *page) > void deferred_split_huge_page(struct page *page); > > void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, > - unsigned long address, bool freeze); > + unsigned long address, bool freeze, struct page *page); > > #define split_huge_pmd(__vma, __pmd, __address) \ > do { \ > @@ -106,7 +106,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, > if (pmd_trans_huge(*____pmd) \ > || pmd_devmap(*____pmd)) \ > __split_huge_pmd(__vma, __pmd, __address, \ > - false); \ > + false, NULL); \ > } while (0) > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index eaf3a4a655a6..2297aa41581e 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -1638,7 +1638,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, > } > > void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, > - unsigned long address, bool freeze) > + unsigned long address, bool freeze, struct page *page) > { > spinlock_t *ptl; > struct mm_struct *mm = vma->vm_mm; > @@ -1646,8 +1646,17 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, > > mmu_notifier_invalidate_range_start(mm, haddr, haddr + HPAGE_PMD_SIZE); > ptl = pmd_lock(mm, pmd); > + > + /* > + * If caller asks to setup a migration entries, we need a page to check > + * pmd against. Otherwise we can end up replacing wrong page. > + */ > + VM_BUG_ON(freeze && !page); > + if (page && page != pmd_page(*pmd)) > + goto out; > + > if (pmd_trans_huge(*pmd)) { > - struct page *page = pmd_page(*pmd); > + page = pmd_page(*pmd); > if (PageMlocked(page)) > clear_page_mlock(page); > } else if (!pmd_devmap(*pmd)) > @@ -1678,20 +1687,12 @@ void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, > return; > > /* > - * If caller asks to setup a migration entries, we need a page to check > - * pmd against. Otherwise we can end up replacing wrong page. > - */ > - VM_BUG_ON(freeze && !page); > - if (page && page != pmd_page(*pmd)) > - return; > - > - /* > * Caller holds the mmap_sem write mode or the anon_vma lock, > * so a huge pmd cannot materialize from under us (khugepaged > * holds both the mmap_sem write mode and the anon_vma lock > * write mode). Not a big issue, but this comment about mmap_sem seems relevant only when called from vma_adjust_trans_huge(), so might need some update? > */ > - __split_huge_pmd(vma, pmd, address, freeze); > + __split_huge_pmd(vma, pmd, address, freeze, page); > } > > void vma_adjust_trans_huge(struct vm_area_struct *vma, > -- > Kirill A. Shutemov > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href