On 6 Jun 2024, at 14:33, Barry Song wrote: > On Fri, Jun 7, 2024 at 9:24 AM Zi Yan <ziy@xxxxxxxxxx> wrote: >> >> On 6 Jun 2024, at 14:00, Barry Song wrote: >> >>> On Fri, Jun 7, 2024 at 2:35 AM Zi Yan <ziy@xxxxxxxxxx> wrote: >>>> >>>> +Matthew >>>> >>>> For mapping_large_folio_support() changes. >>>> >>>> On 6 Jun 2024, at 2:42, xu.xin16@xxxxxxxxxx wrote: >>>> >>>>> From: Ran Xiaokai <ran.xiaokai@xxxxxxxxxx> >>>>> >>>>> When I did a large folios split test, a WARNING >>>>> "[ 5059.122759][ T166] Cannot split file folio to non-0 order" >>>>> was triggered. But the test cases are only for anonmous folios. >>>>> while mapping_large_folio_support() is only reasonable for page >>>>> cache folios. >>>>> >>>>> In split_huge_page_to_list_to_order(), the folio passed to >>>>> mapping_large_folio_support() maybe anonmous folio. The >>>>> folio_test_anon() check is missing. So the split of the anonmous THP >>>>> is failed. This is also the same for shmem_mapping(). We'd better add >>>>> a check for both. But the shmem_mapping() in __split_huge_page() is >>>>> not involved, as for anonmous folios, the end parameter is set to -1, so >>>>> (head[i].index >= end) is always false. shmem_mapping() is not called. >>>>> >>>>> Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support() >>>>> for anon mapping, So we can detect the wrong use more easily. >>>>> >>>>> THP folios maybe exist in the pagecache even the file system doesn't >>>>> support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE >>>>> is enabled, khugepaged will try to collapse read-only file-backed pages >>>>> to THP. But the mapping does not actually support multi order >>>>> large folios properly. >>>>> >>>>> Using /sys/kernel/debug/split_huge_pages to verify this, with this >>>>> patch, large anon THP is successfully split and the warning is ceased. >>>>> >>>>> Signed-off-by: Ran Xiaokai <ran.xiaokai@xxxxxxxxxx> >>>>> --- >>>>> include/linux/pagemap.h | 4 ++++ >>>>> mm/huge_memory.c | 27 ++++++++++++++++----------- >>>>> 2 files changed, 20 insertions(+), 11 deletions(-) >>>>> >>>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h >>>>> index ee633712bba0..59f1df0cde5a 100644 >>>>> --- a/include/linux/pagemap.h >>>>> +++ b/include/linux/pagemap.h >>>>> @@ -381,6 +381,10 @@ static inline void mapping_set_large_folios(struct address_space *mapping) >>>>> */ >>>>> static inline bool mapping_large_folio_support(struct address_space *mapping) >>>>> { >>>>> + /* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */ >>>>> + VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON, >>>>> + "Anonymous mapping always supports large folio"); >>>>> + >>>>> return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && >>>>> test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags); >>>>> } >>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >>>>> index 317de2afd371..62d57270b08e 100644 >>>>> --- a/mm/huge_memory.c >>>>> +++ b/mm/huge_memory.c >>>>> @@ -3009,30 +3009,35 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, >>>>> if (new_order >= folio_order(folio)) >>>>> return -EINVAL; >>>>> >>>>> - /* Cannot split anonymous THP to order-1 */ >>>>> - if (new_order == 1 && folio_test_anon(folio)) { >>>>> - VM_WARN_ONCE(1, "Cannot split to order-1 folio"); >>>>> - return -EINVAL; >>>>> - } >>>>> - >>>>> - if (new_order) { >>>>> - /* Only swapping a whole PMD-mapped folio is supported */ >>>>> - if (folio_test_swapcache(folio)) >>>>> + if (folio_test_anon(folio)) { >>>>> + /* Cannot split anonymous THP to order-1 */ >>>>> + if (new_order == 1) { >>>>> + VM_WARN_ONCE(1, "Cannot split to order-1 folio"); >>>>> return -EINVAL; >>>>> + } >>>>> + } else if (new_order) { >>>>> /* Split shmem folio to non-zero order not supported */ >>>>> if (shmem_mapping(folio->mapping)) { >>>>> VM_WARN_ONCE(1, >>>>> "Cannot split shmem folio to non-0 order"); >>>>> return -EINVAL; >>>>> } >>>>> - /* No split if the file system does not support large folio */ >>>>> - if (!mapping_large_folio_support(folio->mapping)) { >>>>> + /* No split if the file system does not support large folio. >>>>> + * Note that we might still have THPs in such mappings due to >>>>> + * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping >>>>> + * does not actually support large folios properly. >>>>> + */ >>>>> + if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && >>>>> + !mapping_large_folio_support(folio->mapping)) { >>>> >>>> Shouldn’t this be >>>> >>>> if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && >>>> !mapping_large_folio_support(folio->mapping)) { >>>> >>>> ? >>>> >>>> When CONFIG_READ_ONLY_THP_FOR_FS is not set, we need to check >>>> mapping_large_folio_support(), otherwise we do not. >>> >>> while CONFIG_READ_ONLY_THP_FOR_FS is not set, that is no way >>> a large folio can be mapped to a filesystem which doesn't support >>> large folio mapping. i think >> >> That is why we have the warning below to catch this undesired >> case. >> >>> if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) is correct. >> >> When it is set, khugepaged can create a large pagecache folio >> on a filesystem without large folio support and the warning >> will be triggered once the created large pagecache folio >> is split. That is not what we want. > > yes. This is exactly why we need if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) > but not if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) . > > because if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)), folio is definitely > pointing to a file system supporting large folio. otherwise, it is a bug. Oh, got it. Thanks for the explanation. :) >> >>> >>> The below means a BUG which has never a chance to happen if it >>> is true. >>> >>> !IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && >>> !mapping_large_folio_support(folio->mapping)); >>> >>>> >>>>> VM_WARN_ONCE(1, >>>>> "Cannot split file folio to non-0 order"); >>>>> return -EINVAL; >>>>> } >>>>> } >>>>> >>>>> + /* Only swapping a whole PMD-mapped folio is supported */ >>>>> + if (folio_test_swapcache(folio) && new_order) >>>>> + return -EINVAL; >>>>> >>>>> is_hzp = is_huge_zero_folio(folio); >>>>> if (is_hzp) { >>>>> -- >>>>> 2.15.2 >>>> >>>> >>>> Best Regards, >>>> Yan, Zi >>> >>> Thanks >>> Barry >> >> >> Best Regards, >> Yan, Zi Best Regards, Yan, Zi
Attachment:
signature.asc
Description: OpenPGP digital signature