On 09/07/2024 09:13, Barry Song wrote: > On Tue, Jul 9, 2024 at 7:55 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: >> >> On 09/07/2024 02:44, Barry Song wrote: >>> On Tue, Jul 9, 2024 at 12:30 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: >>>> >>>> On 08/07/2024 12:36, Barry Song wrote: >>>>> On Mon, Jul 8, 2024 at 11:24 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: >>>>>> >>>>>> The legacy PMD-sized THP counters at /proc/vmstat include >>>>>> thp_file_alloc, thp_file_fallback and thp_file_fallback_charge, which >>>>>> rather confusingly refer to shmem THP and do not include any other types >>>>>> of file pages. This is inconsistent since in most other places in the >>>>>> kernel, THP counters are explicitly separated for anon, shmem and file >>>>>> flavours. However, we are stuck with it since it constitutes a user ABI. >>>>>> >>>>>> Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for >>>>>> anonymous shmem") added equivalent mTHP stats for shmem, keeping the >>>>>> same "file_" prefix in the names. But in future, we may want to add >>>>>> extra stats to cover actual file pages, at which point, it would all >>>>>> become very confusing. >>>>>> >>>>>> So let's take the opportunity to rename these new counters "shmem_" >>>>>> before the change makes it upstream and the ABI becomes immutable. >>>>> >>>>> Personally, I think this approach is much clearer. However, I recall >>>>> we discussed this >>>>> before [1], and it seems that inconsistency is a concern? >>>> >>>> Embarrassingly, I don't recall that converstation at all :-| but at least what I >>>> said then is consistent with what I've done in this patch. >>>> >>>> I think David's conclusion from that thread was to call them FILE_, and add both >>>> shmem and pagecache counts to those counters, meaning we can keep the same name >>>> as legacy THP counters. But those legacy THP counters only count shmem, and I >>>> don't think we would get away with adding pagecache counts to those at this >>>> point? (argument: they have been around for long time and there is a risk that >>>> user space relies on them and if they were to dramatically increase due to >>>> pagecache addition now that could break things). In that case, there is still >>>> inconsistency, but its worse; the names are consistent but the semantics are >>>> inconsistent. >>>> >>>> So my vote is to change to SHMEM_ as per this patch :) >>> >>> I have no objections. However, I dislike the documentation for >>> thp_file_*. Perhaps we can clean it all up together ? >> >> I agree that we should clean this documentation up and I'm happy to roll it into >> v2. However, I don't think what you have suggested is quite right. >> >> thp_file_alloc, thp_file_fallback and thp_file_fallback_charge *only* count >> shmem. They don't count pagecache. So perhaps the change should be "...every >> time a shmem huge page (dispite being named after "file", the counter measures >> only shmem) is..."? > > I understand what you are saying, and I know that thp_file_* has only > included shmem so far. My question is whether it will include regular > files in the future? If not, I am perfectly fine with your approach. My whole reasoning for this patch is based on the assertion that since THP_FILE_ALLOC has been there for 8 years and in all that time has only counted shmem, then its highly likely that someone is depending on that semantic and we can't change it. I don't have any actual evidence of code that relies on it though. I propose I change the docs to reflect what's actually happening today (i.e. shmem *only*). If we later decide we want to also report page cache numbers through that same counter, then we can change the docs at that point. But if I get my way, we'll soon have mTHP counters for FILE, which is solely for page cache. So You'll be able to get all the fine-grained info out of those and there will be no need to mess with the legacy counters. > > READ_ONLY_THP_FOR_FS isn't applicable in this path as it is created > by khugepaged collapse. > >> >> thp_file_mapped includes both file and shmem, so agree with your change there. >> >> What do you think? >> >> >>> >>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst >>> index 709fe10b60f4..65df48cb3bbb 100644 >>> --- a/Documentation/admin-guide/mm/transhuge.rst >>> +++ b/Documentation/admin-guide/mm/transhuge.rst >>> @@ -417,21 +417,22 @@ thp_collapse_alloc_failed >>> the allocation. >>> >>> thp_file_alloc >>> - is incremented every time a file huge page is successfully >>> - allocated. >>> + is incremented every time a file (including shmem) huge page is >>> + successfully allocated. >>> >>> thp_file_fallback >>> - is incremented if a file huge page is attempted to be allocated >>> - but fails and instead falls back to using small pages. >>> + is incremented if a file (including shmem) huge page is attempted >>> + to be allocated but fails and instead falls back to using small >>> + pages. >>> >>> thp_file_fallback_charge >>> - is incremented if a file huge page cannot be charged and instead >>> - falls back to using small pages even though the allocation was >>> - successful. >>> + is incremented if a file (including shmem) huge page cannot be >>> + charged and instead falls back to using small pages even though >>> + the allocation was successful. >>> >>> thp_file_mapped >>> - is incremented every time a file huge page is mapped into >>> - user address space. >>> + is incremented every time a file (including shmem) huge page is >>> + mapped into user address space. >>> >>> thp_split_page >>> is incremented every time a huge page is split into base >>> >>>> >>>>> >>>>> [1] https://lore.kernel.org/linux-mm/05d0096e4ec3e572d1d52d33a31a661321ac1551.1713755580.git.baolin.wang@xxxxxxxxxxxxxxxxx/ >>>>> >>>>> >>>>>> >>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx> >>>>>> --- >>>>>> >>>>>> Hi All, >>>>>> >>>>>> Applies on top of today's mm-unstable (2073cda629a4) and tested with mm >>>>>> selftests; no regressions observed. >>>>>> >>>>>> The backstory here is that I'd like to introduce some counters for regular file >>>>>> folio allocations to observe how often large folio allocation succeeds, but >>>>>> these shmem counters are named "file" which is going to make things confusing. >>>>>> So hoping to solve that before commit 66f44583f9b6 ("mm: shmem: add mTHP >>>>>> counters for anonymous shmem") goes upstream (it is currently in mm-stable). >>>>>> >>>>>> Admittedly, this change means the mTHP stat names are not the same as the legacy >>>>>> PMD-size THP names, but I think that's a smaller issue than having "file_" mTHP >>>>>> stats that only count shmem, then having to introduce "file2_" or "pgcache_" >>>>>> stats for the regular file memory, which is even more inconsistent IMHO. I guess >>>>>> the alternative is to count both shmem and file in these mTHP stats (that's how >>>>>> they were documented anyway) but I think it's better to be able to consider them >>>>>> separately like we do for all the other counters. >>>>>> >>>>>> Thanks, >>>>>> Ryan >>>>>> >>>>>> Documentation/admin-guide/mm/transhuge.rst | 12 ++++++------ >>>>>> include/linux/huge_mm.h | 6 +++--- >>>>>> mm/huge_memory.c | 12 ++++++------ >>>>>> mm/shmem.c | 8 ++++---- >>>>>> 4 files changed, 19 insertions(+), 19 deletions(-) >>>>>> >>>>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst >>>>>> index 747c811ee8f1..8b891689fc13 100644 >>>>>> --- a/Documentation/admin-guide/mm/transhuge.rst >>>>>> +++ b/Documentation/admin-guide/mm/transhuge.rst >>>>>> @@ -496,16 +496,16 @@ swpout_fallback >>>>>> Usually because failed to allocate some continuous swap space >>>>>> for the huge page. >>>>>> >>>>>> -file_alloc >>>>>> - is incremented every time a file huge page is successfully >>>>>> +shmem_alloc >>>>>> + is incremented every time a shmem huge page is successfully >>>>>> allocated. >>>>>> >>>>>> -file_fallback >>>>>> - is incremented if a file huge page is attempted to be allocated >>>>>> +shmem_fallback >>>>>> + is incremented if a shmem huge page is attempted to be allocated >>>>>> but fails and instead falls back to using small pages. >>>>>> >>>>>> -file_fallback_charge >>>>>> - is incremented if a file huge page cannot be charged and instead >>>>>> +shmem_fallback_charge >>>>>> + is incremented if a shmem huge page cannot be charged and instead >>>>>> falls back to using small pages even though the allocation was >>>>>> successful. >>>>>> >>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >>>>>> index acb6ac24a07e..cff002be83eb 100644 >>>>>> --- a/include/linux/huge_mm.h >>>>>> +++ b/include/linux/huge_mm.h >>>>>> @@ -269,9 +269,9 @@ enum mthp_stat_item { >>>>>> MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE, >>>>>> MTHP_STAT_SWPOUT, >>>>>> MTHP_STAT_SWPOUT_FALLBACK, >>>>>> - MTHP_STAT_FILE_ALLOC, >>>>>> - MTHP_STAT_FILE_FALLBACK, >>>>>> - MTHP_STAT_FILE_FALLBACK_CHARGE, >>>>>> + MTHP_STAT_SHMEM_ALLOC, >>>>>> + MTHP_STAT_SHMEM_FALLBACK, >>>>>> + MTHP_STAT_SHMEM_FALLBACK_CHARGE, >>>>>> MTHP_STAT_SPLIT, >>>>>> MTHP_STAT_SPLIT_FAILED, >>>>>> MTHP_STAT_SPLIT_DEFERRED, >>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >>>>>> index 9ec64aa2be94..f9696c94e211 100644 >>>>>> --- a/mm/huge_memory.c >>>>>> +++ b/mm/huge_memory.c >>>>>> @@ -568,9 +568,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK); >>>>>> DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE); >>>>>> DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT); >>>>>> DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK); >>>>>> -DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC); >>>>>> -DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK); >>>>>> -DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE); >>>>>> +DEFINE_MTHP_STAT_ATTR(shmem_alloc, MTHP_STAT_SHMEM_ALLOC); >>>>>> +DEFINE_MTHP_STAT_ATTR(shmem_fallback, MTHP_STAT_SHMEM_FALLBACK); >>>>>> +DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE); >>>>>> DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT); >>>>>> DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED); >>>>>> DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED); >>>>>> @@ -581,9 +581,9 @@ static struct attribute *stats_attrs[] = { >>>>>> &anon_fault_fallback_charge_attr.attr, >>>>>> &swpout_attr.attr, >>>>>> &swpout_fallback_attr.attr, >>>>>> - &file_alloc_attr.attr, >>>>>> - &file_fallback_attr.attr, >>>>>> - &file_fallback_charge_attr.attr, >>>>>> + &shmem_alloc_attr.attr, >>>>>> + &shmem_fallback_attr.attr, >>>>>> + &shmem_fallback_charge_attr.attr, >>>>>> &split_attr.attr, >>>>>> &split_failed_attr.attr, >>>>>> &split_deferred_attr.attr, >>>>>> diff --git a/mm/shmem.c b/mm/shmem.c >>>>>> index 921d59c3d669..f24dfbd387ba 100644 >>>>>> --- a/mm/shmem.c >>>>>> +++ b/mm/shmem.c >>>>>> @@ -1777,7 +1777,7 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf, >>>>>> if (pages == HPAGE_PMD_NR) >>>>>> count_vm_event(THP_FILE_FALLBACK); >>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE >>>>>> - count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK); >>>>>> + count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK); >>>>>> #endif >>>>>> order = next_order(&suitable_orders, order); >>>>>> } >>>>>> @@ -1804,8 +1804,8 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf, >>>>>> count_vm_event(THP_FILE_FALLBACK_CHARGE); >>>>>> } >>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE >>>>>> - count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK); >>>>>> - count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE); >>>>>> + count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK); >>>>>> + count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK_CHARGE); >>>>>> #endif >>>>>> } >>>>>> goto unlock; >>>>>> @@ -2181,7 +2181,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, >>>>>> if (folio_test_pmd_mappable(folio)) >>>>>> count_vm_event(THP_FILE_ALLOC); >>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE >>>>>> - count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC); >>>>>> + count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_ALLOC); >>>>>> #endif >>>>>> goto alloced; >>>>>> } >>>>>> -- >>>>>> 2.43.0 >>>>>> >>>>> > > Thanks > Barry