On Tue, Jul 23, 2013 at 04:01:20PM +0200, Michal Hocko wrote: > On Fri 19-07-13 09:13:03, Minchan Kim wrote: > > On Thu, Jul 18, 2013 at 11:12:24PM +0530, Aneesh Kumar K.V wrote: > [...] > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > > index 83aff0a..2cb1be3 100644 > > > --- a/mm/hugetlb.c > > > +++ b/mm/hugetlb.c > > > @@ -3266,8 +3266,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud) > > > put_page(virt_to_page(spte)); > > > spin_unlock(&mm->page_table_lock); > > > out: > > > - pte = (pte_t *)pmd_alloc(mm, pud, addr); > > > mutex_unlock(&mapping->i_mmap_mutex); > > > + pte = (pte_t *)pmd_alloc(mm, pud, addr); > > > return pte; > > > > I am blind on hugetlb but not sure it doesn't break eb48c071. > > Michal? > > Well, it is some time since I debugged the race and all the details > vanished in the meantime. But this part of the changelog suggests that > this indeed breaks the fix: > " > This patch addresses the issue by moving pmd_alloc into huge_pmd_share > which guarantees that the shared pud is populated in the same critical > section as pmd. This also means that huge_pte_offset test in > huge_pmd_share is serialized correctly now which in turn means that the > success of the sharing will be higher as the racing tasks see the pud > and pmd populated together. > " > > Besides that I fail to see how moving pmd_alloc down changes anything. > Even if pmd_alloc triggered reclaim then we cannot trip over the same > i_mmap_mutex as hugetlb pages are not reclaimable because they are not > on the LRU. I thought we could map some part of binary with normal page and other part of the one with MAP_HUGETLB so that a address space could have both normal page and HugeTLB page. Okay, it's impossible so HugeTLB pages are not on LRU. Then, above lockdep warning is totally false positive. Best solution is avoiding pmd_alloc with holding i_mmap_mutex but we need it to fix eb48c071 so how about this if we couldn't have a better idea? diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 83aff0a..e7c3a15 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3240,7 +3240,15 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud) if (!vma_shareable(vma, addr)) return (pte_t *)pmd_alloc(mm, pud, addr); - mutex_lock(&mapping->i_mmap_mutex); + /* + * It annotates to shut lockdep's warning up casued by i_mmap_mutex + * Below pmd_alloc try to allocate memory with GFP_KERNEL while + * holding i_mmap_mutex so that it could enter direct reclaim path + * that rmap try to hold i_mmap_mutex again. But it's no problem + * for hugetlb because pages on hugetlb never could live in LRU so + * it's false positive. I hope someone fixes it with avoiding pmd_alloc + * with holding i_mmap_mutex rather than nesting annotation. + */ + mutex_lock_nested(&mapping->i_mmap_mutex, SINGLE_DEPTH_NESTING); vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { if (svma == vma) continue; > -- > Michal Hocko > SUSE Labs > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@xxxxxxxxx. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>