On Nov 7, 2022, at 5:19 PM, Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:

> madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear page
> tables associated with the address range.  For hugetlb vmas,
> zap_page_range will call __unmap_hugepage_range_final.  However,
> __unmap_hugepage_range_final assumes the passed vma is about to be removed
> and deletes the vma_lock to prevent pmd sharing as the vma is on the way
> out.  In the case of madvise(MADV_DONTNEED) the vma remains, but the
> missing vma_lock prevents pmd sharing and could potentially lead to issues
> with truncation/fault races.

I understand the problem in general. Please consider my feedback as
partial though.

> @@ -5203,32 +5194,50 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
>  			  unsigned long end, struct page *ref_page,
>  			  zap_flags_t zap_flags)
>  {
> +	bool final = zap_flags & ZAP_FLAG_UNMAP;
> +

Not sure why caching "final" in a local variable helps.

>  void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
>  			  unsigned long end, struct page *ref_page,
>  			  zap_flags_t zap_flags)
>  {
> +	struct mmu_notifier_range range;
>  	struct mmu_gather tlb;
>
> +	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
> +				start, end);
> +	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
>  	tlb_gather_mmu(&tlb, vma->vm_mm);
> +
>  	__unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags);

Is there a reason for not using range.start and range.end?
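I.e., after adjust_range_if_pmd_sharing_possible() has potentially widened
the range, I would have expected the widened bounds to be passed down.
Untested sketch of what I mean (whether unmapping the wider range is
actually correct here is for you to confirm):

```c
	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
				start, end);
	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
	tlb_gather_mmu(&tlb, vma->vm_mm);

	/* unmap the (possibly widened) range, not the caller's start/end */
	__unmap_hugepage_range(&tlb, vma, range.start, range.end, ref_page,
			       zap_flags);
```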
It is just that every inconsistency is worrying...

> @@ -1734,6 +1734,9 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
>  	lru_add_drain();
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
>  				address, address + size);
> +	if (is_vm_hugetlb_page(vma))
> +		adjust_range_if_pmd_sharing_possible(vma, &range.start,
> +						     &range.end);
>  	tlb_gather_mmu(&tlb, vma->vm_mm);
>  	update_hiwater_rss(vma->vm_mm);
>  	mmu_notifier_invalidate_range_start(&range);
> @@ -1742,6 +1745,12 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
>  	tlb_finish_mmu(&tlb);
>  }
>
> +void zap_vma_range(struct vm_area_struct *vma, unsigned long address,
> +		   unsigned long size)
> +{
> +	__zap_page_range_single(vma, address, size, NULL);

Ugh. So zap_vma_range() would actually be emitted as a wrapper function
that only calls __zap_page_range_single() (or, worse,
__zap_page_range_single(), which is large, would get inlined into it),
unless you use LTO. Another option is to declare __zap_page_range_single()
in the header and move this one to the header as an inline wrapper.
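Something like the following untested sketch (exact header placement up to
you):

```c
/* in the header (sketch): expose the worker, make the wrapper inline */
void __zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
			     unsigned long size, struct zap_details *details);

static inline void zap_vma_range(struct vm_area_struct *vma,
				 unsigned long address, unsigned long size)
{
	/* no out-of-line wrapper is emitted; the call compiles down to
	 * __zap_page_range_single(..., NULL) directly at each call site */
	__zap_page_range_single(vma, address, size, NULL);
}
```

That way you do not pay for an extra call (or risk the large worker being
inlined into a trivial wrapper) on non-LTO builds.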