On Thu, Nov 07, 2024 at 03:18:51PM -0500, Peter Xu wrote: > +Ackerley +Oscar > > I'm reading the resv code recently and just stumbled upon this. So want to > raise this question. > > IIUC __vma_private_lock() will return false for MAP_PRIVATE hugetlb vma if > the vma is dup()ed from a fork(), with/without commit 187da0f8250a > ("hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write") which fixed a > slightly different issue. > > The problem is the current vma lock for private mmap() is based on the resv > map, and the resv map only belongs to the process that mmap()ed this > private vma. E.g. dup_mmap() has: > > if (is_vm_hugetlb_page(tmp)) > hugetlb_dup_vma_private(tmp); > > Which does: > > if (vma->vm_flags & VM_MAYSHARE) { > ... > } else > vma->vm_private_data = NULL; <--------------------- > > So even if I don't know how many of us are even using hugetlb PRIVATE + > fork(), assuming that's the most controversial use case that I'm aware of > on hugetlb that people complains about.. with some tricky changes like > 04f2cbe35699.. Just still want to raise this pure question, that after a > fork() on private vma, and if I read it alright, lock/unlock operations may > become noop.. I have been taking a look at this, and yes, __vma_private_lock will return false for private hugetlb mappings that were forked . I quickly checked what protects what and we currently have: hugetlb_vma_lock_read - copy_hugetlb_page_range (only sharing) hugetlb_vma_lock_read - hugetlb_wp (only for HPAGE_RESV_OWNER) hugetlb_vma_lock_read - hugetlb_fault , protects huge_pmd_unshare? hugetlb_vma_lock_read - pagewalks hugetlb_vma_lock_write - hugetlb_change_protection hugetlb_vma_lock_write - hugetlb_unshare_pmds hugetlb_vma_lock_wirte - move_hugetlb_page_tables hugetlb_vma_lock_wirte - _hugetlb_zap_begin (unmap_vmas) the ones taking the hugetlb_vma_lock in write (so, the last four) also take the i_mmap_lock_write (vma->vm_file->f_mapping), and AFAIK, hugetlb mappings, private or not, should have vma->vm_file->f_mapping set. Which means that technically we cannot race between hugetlb_change_protection and move_hugetlb_page_tables etc. But, checking commit bf4916922c60f43efaa329744b3eef539aa6a2b2 Author: Rik van Riel <riel@xxxxxxxxxxx> Date: Thu Oct 5 23:59:07 2023 -0400 hugetlbfs: extend hugetlb_vma_lock to private VMAs which its motivation was to protect MADV_DONTNEED vs page_faults, I do not see how it gets protected with private hugetlb mappings that were dupped (forked). madvise_dontneed_single_vma zap_page_range_single _hugetlb_zap_begin hugetlb_vma_lock_write - noop for mappings that do not own the reservation i_mmap_lock_write But the hugetlb_fault path only takes hugetlb_vma_lock_*, so theorically we still could race between page_fault vs madvise_dontneed_single_vma? A quick way to prove would be map a hugetlb private mapping, fork it and have two threads tryong to madvise(MADV_DONTNEED) and the other trying to write to it? I do not know, maybe we are protected in some other way I cannot see right now. I will have a look. -- Oscar Salvador SUSE Labs