On Wed, Nov 02, 2022 at 06:51:38PM -0700, Hugh Dickins wrote:
> Compound page (folio) mapcount calculations have been different for
> anon and file (or shmem) THPs, and involved the obscure PageDoubleMap
> flag.  And each huge mapping and unmapping of a file (or shmem) THP
> involved atomically incrementing and decrementing the mapcount of every
> subpage of that huge page, dirtying many struct page cachelines.
>
> Add subpages_mapcount field to the struct folio and first tail page,
> so that the total of subpage mapcounts is available in one place near
> the head: then page_mapcount() and total_mapcount() and page_mapped(),
> and their folio equivalents, are so quick that anon and file and hugetlb
> don't need to be optimized differently.  Delete the unloved PageDoubleMap.
>
> page_add and page_remove rmap functions must now maintain the
> subpages_mapcount as well as the subpage _mapcount, when dealing with
> pte mappings of huge pages; and correct maintenance of NR_ANON_MAPPED
> and NR_FILE_MAPPED statistics still needs reading through the subpages,
> using nr_subpages_unmapped() - but only when first or last pmd mapping
> finds subpages_mapcount raised (double-map case, not the common case).
>
> But are those counts (used to decide when to split an anon THP, and
> in vmscan's pagecache_reclaimable heuristic) correctly maintained?
> Not quite: since page_remove_rmap() (and also split_huge_pmd()) is
> often called without page lock, there can be races when a subpage pte
> mapcount 0<->1 while compound pmd mapcount 0<->1 is scanning - races
> which the previous implementation had prevented.  The statistics might
> become inaccurate, and even drift down until they underflow through 0.
> That is not good enough, but is better dealt with in a followup patch.
>
> Update a few comments on first and second tail page overlaid fields.
> hugepage_add_new_anon_rmap() has to "increment" compound_mapcount, but
> subpages_mapcount and compound_pincount are already correctly at 0,
> so delete its reinitialization of compound_pincount.
>
> A simple 100 X munmap(mmap(2GB, MAP_SHARED|MAP_POPULATE, tmpfs), 2GB)
> took 18 seconds on small pages, and used to take 1 second on huge pages,
> but now takes 119 milliseconds on huge pages.  Mapping by pmds a second
> time used to take 860ms and now takes 92ms; mapping by pmds after mapping
> by ptes (when the scan is needed) used to take 870ms and now takes 495ms.
> But there might be some benchmarks which would show a slowdown, because
> tail struct pages now fall out of cache until final freeing checks them.
>
> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
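[To make the accounting described above concrete, here is a minimal userspace
toy model, not the kernel code; struct head_counters and its helpers are
invented for illustration.  The point is that a pmd mapping touches only the
compound count, a pte mapping of a subpage also bumps the head's running sum,
and the total is then two atomic reads near the head instead of a walk over
every tail page.]

#include <stdatomic.h>
#include <stdio.h>

/* Toy stand-in for the two counters kept near the head page. */
struct head_counters {
	atomic_int compound_mapcount;	/* biased by -1, like _mapcount */
	atomic_int subpages_mapcount;	/* running sum of subpage pte mappings */
};

/* A pmd (huge) mapping bumps only the compound mapcount. */
static void add_pmd_mapping(struct head_counters *h)
{
	atomic_fetch_add(&h->compound_mapcount, 1);
}

/*
 * A pte mapping of one subpage bumps that subpage's own _mapcount (not
 * modelled here) and, under the new scheme, the head's subpages_mapcount.
 */
static void add_pte_mapping(struct head_counters *h)
{
	atomic_fetch_add(&h->subpages_mapcount, 1);
}

/* Total mappings of the compound page: no walk over the tail pages. */
static int total_mapcount(struct head_counters *h)
{
	return (atomic_load(&h->compound_mapcount) + 1)
		+ atomic_load(&h->subpages_mapcount);
}

int main(void)
{
	struct head_counters h = {
		.compound_mapcount = -1,	/* "starts at -1" */
		.subpages_mapcount = 0,
	};

	add_pmd_mapping(&h);	/* mapped huge once */
	add_pte_mapping(&h);	/* plus two subpages mapped by pte */
	add_pte_mapping(&h);

	printf("total mapcount = %d\n", total_mapcount(&h));	/* prints 3 */
	return 0;
}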
Thanks for doing this!

Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>

And sorry again for PageDoubleMap() :/

Minor nitpick and a question below.

> @@ -829,12 +829,20 @@ static inline int folio_entire_mapcount(struct folio *folio)
>
>  /*
>   * Mapcount of compound page as a whole, does not include mapped sub-pages.
> - *
> - * Must be called only for compound pages.
> + * Must be called only on head of compound page.
>   */
> -static inline int compound_mapcount(struct page *page)
> +static inline int head_compound_mapcount(struct page *head)
>  {
> -	return folio_entire_mapcount(page_folio(page));
> +	return atomic_read(compound_mapcount_ptr(head)) + 1;
> +}
> +
> +/*
> + * Sum of mapcounts of sub-pages, does not include compound mapcount.
> + * Must be called only on head of compound page.
> + */
> +static inline int head_subpages_mapcount(struct page *head)
> +{
> +	return atomic_read(subpages_mapcount_ptr(head));
>  }
>
>  /*

Any particular reason these two do not take struct folio as an input?
It would guarantee that it is a non-tail page. It will not guarantee
large-folio, but it is something.

> @@ -1265,8 +1288,6 @@ void page_add_new_anon_rmap(struct page *page,
>  		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
>  		/* increment count (starts at -1) */
>  		atomic_set(compound_mapcount_ptr(page), 0);
> -		atomic_set(compound_pincount_ptr(page), 0);
> -

It has to be initialized to 0 on allocation, right?

>  		__mod_lruvec_page_state(page, NR_ANON_THPS, nr);
>  	} else {
>  		/* increment count (starts at -1) */

-- 
 Kiryl Shutsemau / Kirill A. Shutemov
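[As an illustration of the folio-taking variant asked about above: a sketch
only, not code from the patch and not meant to build standalone; the name
folio_subpages_mapcount() is invented here, while subpages_mapcount_ptr()
comes from the quoted hunk, and the compound side already has a folio reader
in folio_entire_mapcount().  The appeal is that a struct folio pointer can
never refer to a tail page, so the "must be called only on head" rule holds
by construction.]

/*
 * Sketch: folio wrapper over the head-page helper above.  &folio->page
 * is the head page by definition, so a tail page cannot be passed in.
 */
static inline int folio_subpages_mapcount(struct folio *folio)
{
	return atomic_read(subpages_mapcount_ptr(&folio->page));
}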