Hi All,

As discussed at Matthew's call yesterday evening, I've put together a list of
items that need to be done as prerequisites for merging large anonymous
folios support.

It would be great to get some review and confirmation as to whether anything
is missing or incorrect. Most items have an assignee - in that case, please
check that my understanding that you are working on the item is correct.

I think most things are independent, with the exception of "shared vs
exclusive mappings", which I think becomes a dependency for a couple of other
items (marked in the depender's description); again, it would be good to
confirm.

Finally, although I'm concentrating on the prerequisites to clear the path
for merging an MVP large anon folios implementation, I've included one
"enhancement" item ("large folios in swap cache"), solely because we
explicitly discussed it last night. My view is that enhancements can come
after the initial large anon folios merge. Over time, I plan to add other
enhancements (e.g. retain large folios over COW, etc).

I'm posting the table as YAML since that seemed easiest for email. You can
convert it to CSV with something like this in Python:

  import yaml
  import pandas as pd

  pd.DataFrame(yaml.safe_load(open('work-items.yml'))).to_csv('work-items.csv')

Thanks,
Ryan

-----

- item: shared vs exclusive mappings
  priority: prerequisite
  description: >-
    New mechanism to allow us to easily determine precisely whether a given
    folio is mapped exclusively or shared between multiple processes.
    Required for (from David H):

    (1) Detecting shared folios, to not mess with them while they are shared.
    MADV_PAGEOUT, user-triggered page migration, NUMA hinting, khugepaged ...
    replace cases where folio_estimated_sharers() == 1 would currently be the
    best we can do (and in some cases, page_mapcount() == 1). (See the first
    sketch below the table.)

    (2) COW improvements for PTE-mapped large anon folios after fork().
    Before fork(), PageAnonExclusive would have been reliable; after fork()
    it's not.

    For (1), "MADV_PAGEOUT" maps to the "madvise" item captured in this list.
    I *think* "NUMA hinting" maps to "numa balancing" (but need
    confirmation!). "user-triggered page migration" and "khugepaged" are not
    yet captured (I would appreciate someone fleshing them out). I previously
    understood migration to be working for large folios - is "user-triggered
    page migration" some specific aspect that does not work?

    For (2), this relates to large anon folio enhancements, which I plan to
    tackle after we get the basic series merged.
  links:
    - 'email thread: Mapcount games: "exclusive mapped" vs. "mapped shared"'
  location:
    - shrink_folio_list()
  assignee: David Hildenbrand <david@xxxxxxxxxx>

- item: compaction
  priority: prerequisite
  description: >-
    Raised at LSFMM: Compaction skips non-order-0 pages. This is already a
    problem for page-cache pages today.
  links:
    - https://lore.kernel.org/linux-mm/ZKgPIXSrxqymWrsv@xxxxxxxxxxxxxxxxxxxx/
    - https://lore.kernel.org/linux-mm/C56EA745-E112-4887-8C22-B74FCB6A14EB@xxxxxxxxxx/
  location:
    - compaction_alloc()
  assignee: Zi Yan <ziy@xxxxxxxxxx>

- item: mlock
  priority: prerequisite
  description: >-
    Large, pte-mapped folios are ignored when mlock is requested. The code
    comment for mlock_vma_folio() says "...filter out pte mappings of THPs,
    which cannot be consistently counted: a pte mapping of the THP head
    cannot be distinguished by the page alone."
  location:
    - mlock_pte_range()
    - mlock_vma_folio()
  links:
    - https://lore.kernel.org/linux-mm/20230712060144.3006358-1-fengwei.yin@xxxxxxxxx/
  assignee: Yin, Fengwei <fengwei.yin@xxxxxxxxx>

- item: madvise
  priority: prerequisite
  description: >-
    MADV_COLD, MADV_PAGEOUT, MADV_FREE: For large folios, the code assumes
    the folio is exclusive only if mapcount == 1, and otherwise skips the
    remainder of the operation. But for large, pte-mapped folios, an
    exclusive folio can have a mapcount of up to nr_pages and still be
    exclusive. (See the second sketch below the table.) Even better: don't
    split the folio if it fits entirely within the range. Likely depends on
    "shared vs exclusive mappings".
  links:
    - https://lore.kernel.org/linux-mm/20230713150558.200545-1-fengwei.yin@xxxxxxxxx/
  location:
    - madvise_cold_or_pageout_pte_range()
    - madvise_free_pte_range()
  assignee: Yin, Fengwei <fengwei.yin@xxxxxxxxx>

- item: deferred_split_folio
  priority: prerequisite
  description: >-
    zap_pte_range() removes each page of a large folio from the rmap, one at
    a time, causing the rmap code to see the folio as partially mapped and
    call deferred_split_folio() for it. The folio then subsequently becomes
    fully unmapped and is removed from the queue. This can cause some lock
    contention. The proposed fix is to modify zap_pte_range() to "batch zap"
    a whole pte range that corresponds to a folio, avoiding the unnecessary
    deferred_split_folio() call. (See the third sketch below the table.)
  links:
    - https://lore.kernel.org/linux-mm/20230719135450.545227-1-ryan.roberts@xxxxxxx/
  location:
    - zap_pte_range()
  assignee: Ryan Roberts <ryan.roberts@xxxxxxx>

- item: numa balancing
  priority: prerequisite
  description: >-
    Large, pte-mapped folios are ignored by the numa-balancing code. Commit
    comment (e81c480): "We're going to have THP mapped with PTEs. It will
    confuse numabalancing. Let's skip them for now." Likely depends on
    "shared vs exclusive mappings".
  links: []
  location:
    - do_numa_page()
  assignee: <none>

- item: large folios in swap cache
  priority: enhancement
  description: >-
    shrink_folio_list() currently splits large folios to single pages before
    adding them to the swap cache. It would be preferable to add the large
    folio to the swap cache as an atomic unit. It is still expected that each
    page would use a separate swap entry when swapped out, so this represents
    an efficiency improvement. There is a risk that this change will expose
    bad assumptions in the swap cache code that assume any large folio is
    pmd-mappable.
  links:
    - https://lore.kernel.org/linux-mm/CAOUHufbC76OdP16mRsY3i920qB7khcu8FM+nUOG0kx5BMRdKXw@xxxxxxxxxxxxxx/
  location:
    - shrink_folio_list()
  assignee: <none>

-----
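
P.S. A few illustrative sketches for the items above that reference specific
code patterns. These are only to make the discussion concrete, not proposals.

Sketch 1, for "shared vs exclusive mappings": the reason
folio_estimated_sharers() is only an estimate is that (if I'm reading the
current definition in include/linux/mm.h right) it samples the mapcount of
the folio's first page only:

  /*
   * Roughly the current helper: use the first page's mapcount as a proxy
   * for the whole folio. For a large folio mapped by PTEs, other pages may
   * have different mapcounts (e.g. after a partial munmap, or after fork()
   * with partial COW), so this can misclassify - hence "estimated".
   */
  static inline int folio_estimated_sharers(struct folio *folio)
  {
          return page_mapcount(folio_page(folio, 0));
  }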
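
Sketch 2, for "madvise": why a "mapcount == 1 means exclusive" test breaks
for pte-mapped large folios. This is an illustration of the failure mode,
not the actual madvise code:

  /*
   * An order-2 anon folio with all 4 pages PTE-mapped in a single process
   * has folio_mapcount() == 4, even though it is exclusively mapped.
   */
  if (folio_test_large(folio)) {
          /* Today's style of check: treat as shared unless mapcount == 1. */
          if (folio_mapcount(folio) != 1)
                  return;  /* exclusive pte-mapped folio wrongly skipped */
  }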
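
Sketch 3, for "deferred_split_folio": the shape of the "batch zap" idea. The
range-removal helper name follows the linked RFC (details there may differ),
and the pte-run detection is elided; this is only an outline:

  /*
   * Today, zap_pte_range() does roughly this once per pte, so after the
   * first call the rmap code sees the folio as partially mapped and queues
   * it for deferred split, only for it to be unqueued when the last pte of
   * the folio is zapped:
   */
  page_remove_rmap(page, vma, false);

  /*
   * Proposed: detect that the next nr ptes all map the same folio, zap them
   * as a batch, and remove the whole range from the rmap in one call, so
   * the folio never appears partially mapped in between:
   */
  folio_remove_rmap_range(folio, page, nr, vma);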