Re: [PATCH v5 4/6] mm: shmem: add mTHP support for anonymous shmem

Ryan Roberts <ryan.roberts@xxxxxxx> · Fri, 5 Jul 2024 10:05:32 +0100

On 05/07/2024 09:57, Baolin Wang wrote:
> 
> 
> On 2024/7/5 16:42, Ryan Roberts wrote:
>> On 05/07/2024 04:01, Baolin Wang wrote:
>>>
>>>
>>> On 2024/7/4 22:46, Bang Li wrote:
>>>> Hi Bao lin,
>>>>
>>>> On 2024/7/4 19:15, Baolin Wang wrote:
>>>>>
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * Only allow inherit orders if the top-level value is 'force', which
>>>>>>> +     * means non-PMD sized THP can not override 'huge' mount option now.
>>>>>>> +     */
>>>>>>> +    if (shmem_huge == SHMEM_HUGE_FORCE)
>>>>>>> +        return READ_ONCE(huge_shmem_orders_inherit);
>>>>>>
>>>>>> I vaguely recall that we originally discussed that trying to set 'force'
>>>>>> on the
>>>>>> top level control while any per-size controls were set to 'inherit' would
>>>>>> be an
>>>>>> error, and trying to set 'force' on any per-size control except the PMD-size
>>>>>> would be an error?
>>>>>
>>>>> Right.
>>>>>
>>>>>> I don't really understand this logic. Shouldn't we just be looking at the
>>>>>> per-size control settings (or the top-level control as a proxy for every
>>>>>> per-size control that has 'inherit' set)?
>>>>>
>>>>> ‘force’ will apply the huge orders for anon shmem and tmpfs, so now we only
>>>>> allow pmd-mapped THP to be forced. We should not look at per-size control
>>>>> settings for tmpfs now (mTHP for tmpfs will be discussed in future).
>>>>>
>>>>>>
>>>>>> Then for tmpfs, which doesn't support non-PMD-sizes yet, we just always
>>>>>> use the
>>>>>> PMD-size control for decisions.
>>>>>>
>>>>>> I'm also really struggling with the concept of shmem_is_huge() existing along
>>>>>> side shmem_allowable_huge_orders(). Surely this needs to all be refactored
>>>>>> into
>>>>>> shmem_allowable_huge_orders()?
>>>>>
>>>>> I understood. But now they serve different purposes: shmem_is_huge() will be
>>>>> used to check the huge orders for the top level, for *tmpfs* and anon shmem;
>>>>> whereas shmem_allowable_huge_orders() will only be used to check the per-size
>>>>> huge orders for anon shmem (excluding tmpfs now). However, as I plan to add
>>>>> mTHP support for tmpfs, I think we can perform some cleanups.
>>>>
>>>> Please count me in, I'd be happy to contribute to the cleanup and enhancement
>>>> process if I can.
>>>
>>> Good. If you have time, I think you can look at the shmem khugepaged issue from
>>> the previous discussion [1], which I don't have time to look at now.
>>>
>>> "
>>> (3) khugepaged
>>>
>>> khugepaged needs to handle larger folios properly as well. Until fixed,
>>> using smaller THP sizes as fallback might prohibit collapsing a
>>> PMD-sized THP later. But really, khugepaged needs to be fixed to handle
>>> that.
>>> "
>>
>> khugepaged can already collapse "folios of any order less then PMD-size" to
>> PMD-size, for anon memory. Infact I modified the selftest to be able to test
>> that in commit 9f0704eae8a4 ("selftests/mm/khugepaged: enlighten for multi-size
>> THP"). I'd be surprised if khugepaged can't alreay handle the same for shmem?
> 
> I did not test this, but from the comment in hpage_collapse_scan_file(), seems
> that compacting small mTHP into a single PMD-mapped THP is not supported yet.
> 
> /*
>          * TODO: khugepaged should compact smaller compound pages
>          * into a PMD sized page
>          */
>         if (folio_test_large(folio)) {
>             result = folio_order(folio) == HPAGE_PMD_ORDER &&
>                     folio->index == start
>                     /* Maybe PMD-mapped */
>                     ? SCAN_PTE_MAPPED_HUGEPAGE
>                     : SCAN_PAGE_COMPOUND;
>             /*
>              * For SCAN_PTE_MAPPED_HUGEPAGE, further processing
>              * by the caller won't touch the page cache, and so
>              * it's safe to skip LRU and refcount checks before
>              * returning.
>              */
>             break;
>         }

OK, so the functionality is missing just for shmem/file-backed collapse. Got it.
Sorry for the noise.

> 
>> Although the test will definitely want to be extended to test it.
> 
> Right.