Peter Xu <peterx@xxxxxxxxxx> writes:

> Since commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a
> process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed"),
> avoid_reserve was introduced for a special case of CoW on hugetlb private
> mappings, and only if the owner VMA is trying to allocate yet another
> hugetlb folio that is not reserved within the private vma reserved map.
>
> Later on, in commit d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas
> hole punched by fallocate"), alloc_huge_page() enforced to not consume any
> global reservation as long as avoid_reserve=true. This operation doesn't
> look correct, because even if it will enforce the allocation to not use
> global reservation at all, it will still try to take one reservation from
> the spool (if the subpool existed). Then since the spool reserved pages
> take from global reservation, it'll also take one reservation globally.
>
> Logically it can cause global reservation to go wrong.
>
> I wrote a reproducer below

Thank you so much for looking into this!

> <snip>

I was able to reproduce this using your reproducer.
/sys/kernel/mm/hugepages/hugepages-2048kB/resv_hugepages is not
decremented even after the reproducer exits.

# sysctl vm.nr_hugepages=16
vm.nr_hugepages = 16
# mkdir ./hugetlb-pool
# mount -t hugetlbfs -o min_size=8M,pagesize=2M none ./hugetlb-pool
# for i in $(seq 16); do ./a.out hugetlb-pool/test; cat /sys/kernel/mm/hugepages/hugepages-2048kB/resv_hugepages; done
5
6
7
8
9
10
11
12
13
14
15
16
16
16
16
16
#

I'll go over the rest of your patches and dig into the meaning of
`avoid_reserve`.
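
For anyone re-running this, a small wrapper can make the leak easier to spot than reading the raw counter after every run. This is only a sketch: the function name `resv_delta` and the `RESV_FILE` variable are made up here for illustration and are not part of the reproducer or the patch series. It prints how much resv_hugepages changed across one command; the command's own stdout is sent to stderr so that only the delta appears on stdout.

```shell
#!/bin/sh
# Counter for reserved 2 MiB hugepages (the same sysfs file as in the
# transcript). RESV_FILE is a hypothetical override, e.g. for another
# page size.
RESV_FILE=${RESV_FILE:-/sys/kernel/mm/hugepages/hugepages-2048kB/resv_hugepages}

# resv_delta CMD [ARGS...]
# Run CMD, then print the change in the counter across the run.
# A non-zero value after CMD has exited suggests a leaked reservation.
resv_delta() {
    before=$(cat "$RESV_FILE") || return 1
    "$@" >&2                      # keep CMD's output off stdout
    after=$(cat "$RESV_FILE") || return 1
    echo $((after - before))
}
```

Usage, assuming the same mount as in the transcript: `resv_delta ./a.out hugetlb-pool/test`. Going by the numbers in the transcript, this would be expected to print 1 while the counter is still climbing and 0 once it has reached 16.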