> The original approach was implemented in RFC v1, but the > implementation was broken: the way refcount was handled was wrong; it > was incremented once for each new page table mapping. (How? > find_lock_page(), called once per hugetlb_no_page/UFFDIO_CONTINUE > would increment refcount and we wouldn't drop it, and in > __unmap_hugepage_range(), the mmu_gather bits would decrement the > refcount once per mapping.) > > At the time, I figured the complexity of handling mapcount AND > refcount correctly in the original approach would be quite complex, so > I switched to the new one. Sorry I didn't make this clear... the following steps are how we could correctly implement the original approach. > 1. In places that already change the mapcount, check that we're > installing the hstate-level PTE, not a high-granularity PTE. Adjust > mapcount AND refcount appropriately. > 2. In the HGM walking bits, to the caller if we made the hstate-level > PTE present. (hugetlb_[pmd,pte]_alloc is the source of truth.) Need to > keep track of this until we figure out which page we're allocating > PTEs for, then change mapcount/refcount appropriately. > 3. In unmapping bits, change mmu_gather/tlb bits to drop refcount only > once per hugepage. (This is probably the hardest of these three things > to get right.)