Re: [PATCH v2 0/5] variable-order, large folios for anonymous memory

Ryan Roberts <ryan.roberts@xxxxxxx> · Thu, 6 Jul 2023 09:02:30 +0100

On 05/07/2023 20:38, David Hildenbrand wrote:
> On 03.07.23 15:53, Ryan Roberts wrote:
>> Hi All,
>>
>> This is v2 of a series to implement variable order, large folios for anonymous
>> memory. The objective of this is to improve performance by allocating larger
>> chunks of memory during anonymous page faults. See [1] for background.
>>

[...]

>> Thanks,
>> Ryan
> 
> Hi Ryan,
> 
> is page migration already working as expected (what about page compaction?), and
> do we handle migration -ENOMEM when allocating a target page: do we split and
> fall back to 4k page migration?
> 

Hi David, All,

This series aims to be the bare minimum to demonstrate allocation of large anon
folios. As such, there is a laundry list of things that need to be done for this
feature to play nicely with other features. My preferred route is to merge this
with its Kconfig defaulted to disabled, and its Kconfig description clearly
shouting that it's EXPERIMENTAL with an explanation of why (similar to
READ_ONLY_THP_FOR_FS).

That said, I've put together a table of the items that I'm aware of that need
attention. It would be great if people can review and add any missing items.
Then we can hopefully parallelize the implementation work. David, I don't think
the items you raised are covered - would you mind providing a bit more detail so
I can add them to the list? (or just add them to the list yourself, if you prefer).

---

- item:
    mlock

  description: >-
    Large, pte-mapped folios are ignored when mlock is requested. Code comment
    for mlock_vma_folio() says "...filter out pte mappings of THPs, which
    cannot be consistently counted: a pte mapping of the THP head cannot be
    distinguished by the page alone."

  location:
    - mlock_pte_range()
    - mlock_vma_folio()

  assignee:
    Yin, Fengwei

- item:
    numa balancing

  description: >-
    Large, pte-mapped folios are ignored by numa-balancing code. Commit
    comment (e81c480): "We're going to have THP mapped with PTEs. It will
    confuse numabalancing. Let's skip them for now."

  location:
    - do_numa_page()

  assignee:
    <none>

- item:
    madvise

  description: >-
    MADV_COLD, MADV_PAGEOUT, MADV_FREE: For large folios, code assumes
    exclusive only if mapcount==1, else skips remainder of operation. For
    large, pte-mapped folios, exclusive folios can have mapcount up to nr_pages
    and still be exclusive. Even better: don't split the folio if it fits
    entirely within the range? Discussion at
    https://lore.kernel.org/linux-mm/6cec6f68-248e-63b4-5615-9e0f3f819a0a@xxxxxxxxxx/
    talks about changing folio mapcounting - may help determine if exclusive
    without pgtable scan?

  location:
    - madvise_cold_or_pageout_pte_range()
    - madvise_free_pte_range()

  assignee:
    <none>

- item:
    shrink_folio_list

  description: >-
    Raised by Yu Zhao; I can't see the problem in the code - need
    clarification

  location:
    - shrink_folio_list()

  assignee:
    <none>

- item:
    compaction

  description: >-
    Raised at LSFMM: Compaction skips non-order-0 pages. Already a problem for
    page-cache pages today. Is my understanding correct?

  location:
    - <where?>

  assignee:
    <none>
---
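
To make the madvise item concrete, here is a toy userspace model of the two
exclusivity checks. It is not kernel code: struct toy_folio and both helpers
are hypothetical names invented for illustration, and the second check (compare
the total mapcount against the number of ptes found in the current mm) is only
one possible approach, assumed here for the sake of the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; the names and fields below are hypothetical. */
struct toy_folio {
	int nr_pages;       /* base pages in the folio */
	int total_mapcount; /* pte mappings of its pages, summed over all mms */
};

/* The check described in the madvise item: exclusive only if mapcount==1. */
static bool exclusive_if_mapcount_one(const struct toy_folio *f)
{
	return f->total_mapcount == 1;
}

/*
 * Assumed alternative: the folio is exclusive to this mm if every mapping
 * counted in total_mapcount is one of the ptes found while scanning this
 * mm's page table for the range.
 */
static bool exclusive_by_pte_count(const struct toy_folio *f,
				   int ptes_mapped_here)
{
	assert(ptes_mapped_here >= 0 && ptes_mapped_here <= f->nr_pages);
	return f->total_mapcount == ptes_mapped_here;
}
```

For a 16-page folio fully pte-mapped by a single process (total_mapcount of
16), the first helper reports "not exclusive" while the second, given 16 ptes
found in this mm, reports "exclusive". If another process maps one of the
pages, total_mapcount becomes 17 and the second helper also reports "not
exclusive".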

Thanks,
Ryan