On Mon 23-07-12 18:08:05, Hugh Dickins wrote: > On Mon, 23 Jul 2012, Mel Gorman wrote: > > On Sun, Jul 22, 2012 at 09:04:33PM -0700, Hugh Dickins wrote: > > > On Fri, 20 Jul 2012, Mel Gorman wrote: > > > > On Fri, Jul 20, 2012 at 04:36:35PM +0200, Michal Hocko wrote: > > > > I like it in that it's simple and I can confirm it works for the test case > > of interest. > > Phew, I'm glad to hear that, thanks. > > > > > However, is your patch not vunerable to truncate issues? > > madvise()/truncate() issues was the main reason why I was wary of VMA tricks > > as a solution. As it turns out, madvise(DONTNEED) is not a problem as it is > > ignored for hugetlbfs but I think truncate is still problematic. Lets say > > we mmap(MAP_SHARED) a hugetlbfs file and then truncate for whatever reason. > > > > invalidate_inode_pages2 > > invalidate_inode_pages2_range > > unmap_mapping_range_vma > > zap_page_range_single > > unmap_single_vma > > __unmap_hugepage_range (removes VM_MAYSHARE) > > > > The VMA still exists so the consequences for this would be varied but > > minimally fault is going to be "interesting". > > You had me worried there, I hadn't considered truncation or invalidation2 > at all. > > But actually, I don't think they do pose any problem for my patch. They > would indeed if I were removing VM_MAYSHARE in __unmap_hugepage_range() > as you show above; but no, I'm removing it in unmap_hugepage_range(). > > That's only called by unmap_single_vma(): which is called via > unmap_vmas() by unmap_region() or exit_mmap() just before free_pgtables() > (the problem cases); or by madvise_dontneed() via zap_page_range(), which > as you note is disallowed on VM_HUGETLB; or by zap_page_range_single(). > > zap_page_range_single() is called by zap_vma_ptes(), which is only > allowed on VM_PFNMAP; or by unmap_mapping_range_vma(), which looked > like it was going to deadlock on i_mmap_mutex (with or without my > patch) until I realized that hugetlbfs has its own hugetlbfs_setattr() > and hugetlb_vmtruncate() which don't use unmap_mapping_range() at all. > > invalidate_inode_pages2() (and _range()) do use unmap_mapping_range(), > but hugetlbfs doesn't support direct_IO, and otherwise I think they're > called by a filesystem directly on its own inodes, which hugetlbfs > does not. Good point, I didn't get this while looking into the code so I introduce the `last' parameter which told me that I am in the correct path. Thanks for clarification. > Anyway, if there's a deadlock on i_mmap_mutex somewhere in there, it's > not introduced by the proposed patch. > So, unmap_hugepage_range() is only being called in the problem cases, > just before free_pgtables(), when unmapping a vma (with mmap_sem held), > or when exiting (when we have the last reference to mm): in each case, > the vma is on its way out, and VM_MAYSHARE no longer of interest to others. > > I spent a while concerned that I'd overlooked the truncation case, before > realizing that it's not a problem: the issue comes when we free_pgtables(), > which truncation makes no attempt to do. > > So, after a bout of anxiety, I think my &= ~VM_MAYSHARE remains good. Yes, this is convincing (and subtle ;)) and much less polluting. You can add my Reviewed-by (with the above reasoning in the patch description) Anyway, the patch for mmotm needs an update because there was a reorganization in the area. First, we need to revert "hugetlb: avoid taking i_mmap_mutex in unmap_single_vma() for hugetlb)" (80f408f5 in memcg-devel) and then push your code into unmap_single_vma. All the above is still valid AFAICS. > > Hugh Thanks a lot Hugh! -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>