On Thu, Jul 19, 2012 at 04:42:13PM +0200, Michal Hocko wrote: > [/me puts the patch destroyer glasses on] > It's a super power now. > On Wed 18-07-12 11:43:09, Mel Gorman wrote: > [...] > > diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c > > index f6679a7..0524556 100644 > > --- a/arch/x86/mm/hugetlbpage.c > > +++ b/arch/x86/mm/hugetlbpage.c > > @@ -68,14 +68,37 @@ static void huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud) > > struct vm_area_struct *svma; > > unsigned long saddr; > > pte_t *spte = NULL; > > + spinlock_t *spage_table_lock = NULL; > > + struct rw_semaphore *smmap_sem = NULL; > > > > if (!vma_shareable(vma, addr)) > > return; > > > > +retry: > > mutex_lock(&mapping->i_mmap_mutex); > > vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) { > > if (svma == vma) > > continue; > > + if (svma->vm_mm == vma->vm_mm) > > + continue; > > + > > + /* > > + * The target mm could be in the process of tearing down > > + * its page tables and the i_mmap_mutex on its own is > > + * not sufficient. To prevent races against teardown and > > + * pagetable updates, we acquire the mmap_sem and pagetable > > + * lock of the remote address space. down_read_trylock() > > + * is necessary as the other process could also be trying > > + * to share pagetables with the current mm. > > + */ > > + if (!down_read_trylock(&svma->vm_mm->mmap_sem)) { > > + mutex_unlock(&mapping->i_mmap_mutex); > > + goto retry; > > + } > > + > > I am afraid this can easily cause a dead lock. Consider > fork > dup_mmap > down_write(&oldmm->mmap_sem) > copy_page_range > copy_hugetlb_page_range > huge_pte_alloc > > svma could belong to oldmm and then we would loop for ever. > svma->vm_mm == vma->vm_mm doesn't help because vma is child's one and mm > differ in that case. I am wondering you didn't hit this while testing. > It would suggest that the ptes are not populated yet because we didn't > let parent play and then other children could place its vma in the list > before parent? > Yes, I think you're right - both about the race and why I didn't hit it. The libhugetlbfs tests probably avoided the bug for the same reason. Thanks for pointing this out. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>