On 2024/1/26 15:50, Muchun Song wrote:
>
>
>> On Jan 26, 2024, at 04:28, Thorvald Natvig <thorvald@xxxxxxxxxx> wrote:
>>
>> We've found what appears to be a lock issue that results in a blocked
>> process somewhere in hugetlbfs for shared maps; seemingly from an
>> interaction between hugetlb_vm_op_open and hugetlb_vmdelete_list.
>>
>> Based on some added pr_warn, we believe the following is happening:
>> When hugetlb_vmdelete_list is entered from the child process,
>> vma->vm_private_data is NULL, and hence hugetlb_vma_trylock_write does
>> not lock, since neither __vma_shareable_lock nor __vma_private_lock
>> are true.
>>
>> While hugetlb_vmdelete_list is executing, the parent process does
>> fork(), which ends up in hugetlb_vm_op_open, which in turn allocates a
>> lock for the same vma.
>>
>> Thus, when the hugetlb_vmdelete_list in the child reaches the end of
>> the function, vma->vm_private_data is now populated, and hence
>> hugetlb_vma_unlock_write tries to unlock the vma_lock, which it does
>> not hold.
>
> Thanks for your report. ->vm_private_data was introduced since the
> series [1]. So I suspect it was caused by this. But I haven't reviewed
> that at that time (actually, it is a little complex in pmd sharing
> case). I saw Miaohe had reviewed many of those.
>
> CC Miaohe, maybe he has some ideas on this.
>
> [1] https://lore.kernel.org/all/20220914221810.95771-7-mike.kravetz@xxxxxxxxxx/T/#m2141e4bc30401a8ce490b1965b9bad74e7f791ff
>
> Thanks.
>
>>
>> dmesg:
>> WARNING: bad unlock balance detected!
>> 6.8.0-rc1+ #24 Not tainted
>> -------------------------------------
>> lock/2613 is trying to release lock (&vma_lock->rw_sema) at:
>> [<ffffffffa94c6128>] hugetlb_vma_unlock_write+0x48/0x60
>> but there are no more locks to release!

Thanks for your report. It seems there's a race:

CPU 1                                    CPU 2
fork                                     hugetlbfs_fallocate
 dup_mmap                                 hugetlbfs_punch_hole
  i_mmap_lock_write(mapping);
  vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
  i_mmap_unlock_write(mapping);
  hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
                                           i_mmap_lock_write(mapping);
                                           hugetlb_vmdelete_list
                                            vma_interval_tree_foreach
                                             hugetlb_vma_trylock_write -- Vma_lock is cleared.
  tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
                                             hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
                                           i_mmap_unlock_write(mapping);

hugetlb_dup_vma_private and hugetlb_vm_op_open are called outside the
i_mmap_rwsem lock, so there may be other bugs behind this. But I'm not
really sure; I will take a closer look next week.

Thanks.
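The interleaving in the race diagram can be replayed in a fixed order with a small userspace model. This is a hypothetical sketch, not kernel code. `model_vma`, `model_vma_lock`, `replay()` and the other `model_*` helpers are made-up stand-ins. They only mimic two behaviors from the report:

- `hugetlb_vma_trylock_write()` reports success without taking anything when the vma has no vma_lock.
- `hugetlb_vma_unlock_write()` acts on whatever `vm_private_data` holds at unlock time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model types: NOT the kernel's struct vm_area_struct or
 * struct hugetlb_vma_lock, just enough state to replay the race. */
struct model_vma_lock {
    bool write_held;
};

struct model_vma {
    /* Plays the role of vma->vm_private_data: NULL means "no vma_lock". */
    struct model_vma_lock *lock;
};

/* Shaped like hugetlb_vma_trylock_write(): with no lock object attached
 * it reports success without taking anything. */
static bool model_trylock_write(struct model_vma *vma)
{
    if (vma->lock == NULL)
        return true;
    if (vma->lock->write_held)
        return false;
    vma->lock->write_held = true;
    return true;
}

/* Shaped like hugetlb_vma_unlock_write(): it re-reads the lock pointer,
 * so it releases whatever lock is attached now. Returns -1 in the case
 * lockdep reports as "bad unlock balance". */
static int model_unlock_write(struct model_vma *vma)
{
    if (vma->lock == NULL)
        return 0;
    if (!vma->lock->write_held)
        return -1;
    vma->lock->write_held = false;
    return 0;
}

/* Stand-in for tmp->vm_ops->open (hugetlb_vm_op_open) attaching a
 * freshly allocated, unlocked vma_lock. */
static void model_vm_op_open(struct model_vma *vma)
{
    vma->lock = calloc(1, sizeof(*vma->lock));
    assert(vma->lock != NULL);
}

/* Replays the CPU 1 / CPU 2 steps sequentially. The child vma starts
 * with no lock, as after hugetlb_dup_vma_private. If open_before_trylock
 * is true, CPU 1's open runs before CPU 2's trylock (no race). */
static int replay(bool open_before_trylock)
{
    struct model_vma child = { .lock = NULL };
    bool locked;
    int ret;

    if (open_before_trylock)
        model_vm_op_open(&child);            /* CPU 1 */
    locked = model_trylock_write(&child);    /* CPU 2 */
    assert(locked);
    (void)locked;
    if (!open_before_trylock)
        model_vm_op_open(&child);            /* CPU 1: lock appears mid-section */
    ret = model_unlock_write(&child);        /* CPU 2 */

    free(child.lock);
    return ret;
}
```

`replay(false)` follows the diagram's order and returns -1, which is the unbalanced unlock. `replay(true)` attaches the lock before the trylock and returns 0. The model only shows the check-then-unlock asymmetry. It does not model i_mmap_rwsem or suggest a particular fix.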