On 5/24/21 5:11 PM, Mina Almasry wrote: >>> + if (!HPageRestoreReserve(page)) { >>> + if (unlikely(hugetlb_unreserve_pages( >>> + mapping->host, idx, idx + 1, 1))) >>> + hugetlb_fix_reserve_counts( >>> + mapping->host); >>> + } >> >> I do not understand the need to call hugetlb_unreserve_pages(). The >> call to restore_reserve_on_error 'should' fix up the reserve map to >> align with restoring the reserve count in put_page/free_huge_page. >> Can you explain why that is there? >> > > AFAICT here is what happens for a given index *without* the call to > hugetlb_unreserve_pages(): > > 1. hugetlb_no_page() allocates a page consuming the reservation, > resv_huge_pages decrements. Ok > 2. remove_inode_hugepages() does remove_huge_page() and > hugetlb_unreserve_pages(). This removes the entry from the resv_map, > but does NOT increment back the resv_huge_pages. Because we removed > the entry, it looks like we have no reservation for this index. > free_huge_page() gets called on this page, and resv_huge_pages is not > incremented, I'm not sure why. This page should have come from the > reserves. This corresponds to a 'hole punch' operation. When hugetlbfs hole punch was added, a design decision was made to not try to restore reservations. IIRC, this is because the hole punch is associated with the file and reservations are associated with mappings. Trying to 'go back' to associated mappings and determine if a reservation should be restored would be a difficult exercise. > 3. hugetlb_mcopy_pte_atomic() gets called for this index. Because of > the prior call to hugetlb_unreserve_page(), there is no entry in the > resv_map for this index, which means it looks like we don't have a > reservation for this index. We allocate a page outside the reserves > (deferred_reservation=1, HPageRestoreReserve=0), add an entry into > resv_map, and don't modify resv_huge_pages. Ok > 4. The copy fails and we deallocate the page, since > HPageRestoreReserve==0 for this page, restore_reserve_on_error() does > nothing. Yes. And this may be something we want to fix in the more general case. i.e. I believe this can happen in code paths other than hugetlb_mcopy_pte_atomic. Rather than add special code here, I'll look into updating restore_reserve_on_error to handle this. Currently restore_reserve_on_error only handles the case where HPageRestoreReserve flags is set. Thanks for explaining this! I forgot about this 'special case' where there is not an entry in the reserve map for a shared mapping. -- Mike Kravetz > 5. hugetlb_mcopy_pte_atomic() gets recalled with the temporary page, > and we allocate another page. Now, since we added an entry in the > resv_map in the previous allocation, it looks like we have a > reservation for this allocation. We allocate a page with > deferred_reserve=0 && HPageRestoreReserve=1, we decrement > resv_huge_pages. Boom, we decremented resv_huge_pages twice for this > index, never incremented it. > > To fix this, in step 4, when I deallocate a page, I check > HPageRestoreReserve(page). If HPageRestoreReserve=0, then this > reservation was consumed and deallocated before, and so I need to > remove the entry from the resv_map.