Re: hugetlbfs: WARNING: bad unlock balance detected during MADV_REMOVE

Muchun Song <muchun.song@xxxxxxxxx> · Fri, 26 Jan 2024 15:50:23 +0800

> On Jan 26, 2024, at 04:28, Thorvald Natvig <thorvald@xxxxxxxxxx> wrote:
> 
> We've found what appears to be a locking issue in hugetlbfs that
> results in a blocked process for shared mappings, seemingly caused by an
> interaction between hugetlb_vm_op_open and hugetlb_vmdelete_list.
> 
> Based on some added pr_warn() calls, we believe the following is happening:
> When hugetlb_vmdelete_list is entered from the child process,
> vma->vm_private_data is NULL, and hence hugetlb_vma_trylock_write does
> not lock, since neither __vma_shareable_lock nor __vma_private_lock
> are true.
> 
> While hugetlb_vmdelete_list is executing, the parent process does
> fork(), which ends up in hugetlb_vm_op_open, which in turn allocates a
> lock for the same vma.
> 
> Thus, when the hugetlb_vmdelete_list in the child reaches the end of
> the function, vma->vm_private_data is now populated, and hence
> hugetlb_vma_unlock_write tries to unlock the vma_lock, which it does
> not hold.

Thanks for your report. The vma lock stored in ->vm_private_data was
introduced by the series [1], so I suspect the problem was caused by it.
I did not review that series at the time (it is somewhat complex in the
PMD-sharing case), but I saw that Miaohe reviewed many of those patches.

CC Miaohe; maybe he has some ideas on this.

[1] https://lore.kernel.org/all/20220914221810.95771-7-mike.kravetz@xxxxxxxxxx/T/#m2141e4bc30401a8ce490b1965b9bad74e7f791ff

Thanks.

> 
> dmesg:
> WARNING: bad unlock balance detected!
> 6.8.0-rc1+ #24 Not tainted
> -------------------------------------
> lock/2613 is trying to release lock (&vma_lock->rw_sema) at:
> [<ffffffffa94c6128>] hugetlb_vma_unlock_write+0x48/0x60
> but there are no more locks to release!
> 
> 
> 3 locks held by lock/2613:
> #0: ffff9b4bc6225450 (sb_writers#16){.+.+}-{0:0}, at: madvise_vma_behavior+0x4cc/0xcf0
> #1: ffff9ba4dc34eca0 (&sb->s_type->i_mutex_key#23){+.+.}-{3:3}, at: hugetlbfs_fallocate+0x3fe/0x620
> #2: ffff9ba4dc34ef38 (&hugetlbfs_i_mmap_rwsem_key){+.+.}-{3:3}, at: hugetlbfs_fallocate+0x438/0x620
> 
> 
> CPU: 17 PID: 2613 Comm: lock Not tainted 6.8.0-rc1+ #24
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/02/2023
> Call Trace:
> <TASK>
> dump_stack_lvl+0x77/0xe0
> ? hugetlb_vma_unlock_write+0x48/0x60
> dump_stack+0x10/0x20
> print_unlock_imbalance_bug+0x127/0x150
> lock_release+0x21a/0x3f0
> ? hugetlb_vma_unlock_write+0x48/0x60
> up_write+0x1c/0x1d0
> hugetlb_vma_unlock_write+0x48/0x60
> hugetlb_vmdelete_list+0x93/0xd0
> hugetlbfs_fallocate+0x4e1/0x620
> vfs_fallocate+0x153/0x4b0
> madvise_vma_behavior+0x4cc/0xcf0
> ? mas_prev+0x68/0x70
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? find_vma_prev+0x78/0xc0
> ? __pfx_madvise_vma_behavior+0x10/0x10
> madvise_walk_vmas+0xc4/0x140
> do_madvise+0x3df/0x450
> __x64_sys_madvise+0x2c/0x40
> do_syscall_64+0x8e/0x160
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? do_syscall_64+0x9b/0x160
> ? do_syscall_64+0x9b/0x160
> ? do_syscall_64+0x9b/0x160
> entry_SYSCALL_64_after_hwframe+0x6e/0x76
> RIP: 0033:0x7f55e0b23bbb
> 
> Repro:
> 
> #include <signal.h>
> #include <stddef.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <sys/wait.h>
> #include <unistd.h>
> 
> #define PSIZE (2048UL * 1024UL)
> 
> int main(int argc, char **argv) {
>  char *buffer = mmap(NULL, PSIZE, PROT_READ | PROT_WRITE,
>                      MAP_ANONYMOUS | MAP_SHARED | MAP_HUGETLB, -1, 0);
>  if (buffer == MAP_FAILED) {
>    perror("mmap");
>    exit(1);
>  }
> 
>  pid_t remover = fork();
> 
>  if (remover == 0) {
>    while(1) {
>      if (madvise(buffer, PSIZE, MADV_REMOVE) == -1) {
>        perror("madvise");
>        exit(1);
>      }
>    }
>  }
> 
>  int wstatus;
> 
>  for(int l = 0; l < 10000; ++l) {
>    pid_t childpid = fork();
>    if (childpid == 0) {
>      exit(0);
>    } else {
>      waitpid(childpid, &wstatus, 0);
>    }
>  }
> 
>  kill(remover, SIGKILL);
>  waitpid(remover, &wstatus, 0);
>  printf("Clean exit\n");
> }
> 
> - Thorvald