On Tue, Feb 07, 2023 at 03:27:07PM -0800, James Houghton wrote:
> So page_vma_mapped_walk() might have to walk up to HPAGE_PMD_NR-ish
> PTEs (if we find a bunch of pte_none() PTEs). Just curious, could that
> be any slower than what we currently do (like, incrementing up to
> HPAGE_PMD_NR-ish subpage mapcounts)? Or is it not a concern?

I think it's faster.  Both of these operations work on folio_nr_pages()
entries ... but a page table entry is 8 bytes and a struct page is 64
bytes.  From a CPU prefetching point of view, they're both linear scans,
but PTEs are 8 times denser.

The other factor to consider is how often we do each of these operations.
Mapping a folio happens ~once per call to mmap() (even though it's delayed
until page fault time).  Querying folio_total_mapcount() happens ... less
often, I think?  Both are going to be quite rare since generally we map
the entire folio at once.