On Tue, Jul 9, 2024 at 8:35 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: > > On 09/07/2024 09:13, Barry Song wrote: > > On Tue, Jul 9, 2024 at 7:55 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: > >> > >> On 09/07/2024 02:44, Barry Song wrote: > >>> On Tue, Jul 9, 2024 at 12:30 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: > >>>> > >>>> On 08/07/2024 12:36, Barry Song wrote: > >>>>> On Mon, Jul 8, 2024 at 11:24 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: > >>>>>> > >>>>>> The legacy PMD-sized THP counters at /proc/vmstat include > >>>>>> thp_file_alloc, thp_file_fallback and thp_file_fallback_charge, which > >>>>>> rather confusingly refer to shmem THP and do not include any other types > >>>>>> of file pages. This is inconsistent since in most other places in the > >>>>>> kernel, THP counters are explicitly separated for anon, shmem and file > >>>>>> flavours. However, we are stuck with it since it constitutes a user ABI. > >>>>>> > >>>>>> Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for > >>>>>> anonymous shmem") added equivalent mTHP stats for shmem, keeping the > >>>>>> same "file_" prefix in the names. But in future, we may want to add > >>>>>> extra stats to cover actual file pages, at which point, it would all > >>>>>> become very confusing. > >>>>>> > >>>>>> So let's take the opportunity to rename these new counters "shmem_" > >>>>>> before the change makes it upstream and the ABI becomes immutable. > >>>>> > >>>>> Personally, I think this approach is much clearer. However, I recall > >>>>> we discussed this > >>>>> before [1], and it seems that inconsistency is a concern? > >>>> > >>>> Embarrassingly, I don't recall that converstation at all :-| but at least what I > >>>> said then is consistent with what I've done in this patch. > >>>> > >>>> I think David's conclusion from that thread was to call them FILE_, and add both > >>>> shmem and pagecache counts to those counters, meaning we can keep the same name > >>>> as legacy THP counters. But those legacy THP counters only count shmem, and I > >>>> don't think we would get away with adding pagecache counts to those at this > >>>> point? (argument: they have been around for long time and there is a risk that > >>>> user space relies on them and if they were to dramatically increase due to > >>>> pagecache addition now that could break things). In that case, there is still > >>>> inconsistency, but its worse; the names are consistent but the semantics are > >>>> inconsistent. > >>>> > >>>> So my vote is to change to SHMEM_ as per this patch :) > >>> > >>> I have no objections. However, I dislike the documentation for > >>> thp_file_*. Perhaps we can clean it all up together ? > >> > >> I agree that we should clean this documentation up and I'm happy to roll it into > >> v2. However, I don't think what you have suggested is quite right. > >> > >> thp_file_alloc, thp_file_fallback and thp_file_fallback_charge *only* count > >> shmem. They don't count pagecache. So perhaps the change should be "...every > >> time a shmem huge page (dispite being named after "file", the counter measures > >> only shmem) is..."? > > > > I understand what you are saying, and I know that thp_file_* has only > > included shmem so far. My question is whether it will include regular > > files in the future? If not, I am perfectly fine with your approach. > > My whole reasoning for this patch is based on the assertion that since > THP_FILE_ALLOC has been there for 8 years and in all that time has only counted > shmem, then its highly likely that someone is depending on that semantic and we > can't change it. I don't have any actual evidence of code that relies on it though. > > I propose I change the docs to reflect what's actually happening today (i.e. > shmem *only*). If we later decide we want to also report page cache numbers > through that same counter, then we can change the docs at that point. But if I > get my way, we'll soon have mTHP counters for FILE, which is solely for page > cache. So You'll be able to get all the fine-grained info out of those and there > will be no need to mess with the legacy counters. Make sense to me. I'd rather we go to /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/stats/file_* /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/stats/shmem_* if we later have 2MiB file counters. > > > > > READ_ONLY_THP_FOR_FS isn't applicable in this path as it is created > > by khugepaged collapse. > > > >> > >> thp_file_mapped includes both file and shmem, so agree with your change there. > >> > >> What do you think? > >> > >> > >>> > >>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst > >>> index 709fe10b60f4..65df48cb3bbb 100644 > >>> --- a/Documentation/admin-guide/mm/transhuge.rst > >>> +++ b/Documentation/admin-guide/mm/transhuge.rst > >>> @@ -417,21 +417,22 @@ thp_collapse_alloc_failed > >>> the allocation. > >>> > >>> thp_file_alloc > >>> - is incremented every time a file huge page is successfully > >>> - allocated. > >>> + is incremented every time a file (including shmem) huge page is > >>> + successfully allocated. > >>> > >>> thp_file_fallback > >>> - is incremented if a file huge page is attempted to be allocated > >>> - but fails and instead falls back to using small pages. > >>> + is incremented if a file (including shmem) huge page is attempted > >>> + to be allocated but fails and instead falls back to using small > >>> + pages. > >>> > >>> thp_file_fallback_charge > >>> - is incremented if a file huge page cannot be charged and instead > >>> - falls back to using small pages even though the allocation was > >>> - successful. > >>> + is incremented if a file (including shmem) huge page cannot be > >>> + charged and instead falls back to using small pages even though > >>> + the allocation was successful. > >>> > >>> thp_file_mapped > >>> - is incremented every time a file huge page is mapped into > >>> - user address space. > >>> + is incremented every time a file (including shmem) huge page is > >>> + mapped into user address space. > >>> > >>> thp_split_page > >>> is incremented every time a huge page is split into base > >>> > >>>> > >>>>> > >>>>> [1] https://lore.kernel.org/linux-mm/05d0096e4ec3e572d1d52d33a31a661321ac1551.1713755580.git.baolin.wang@xxxxxxxxxxxxxxxxx/ > >>>>> > >>>>> > >>>>>> > >>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx> > >>>>>> --- > >>>>>> > >>>>>> Hi All, > >>>>>> > >>>>>> Applies on top of today's mm-unstable (2073cda629a4) and tested with mm > >>>>>> selftests; no regressions observed. > >>>>>> > >>>>>> The backstory here is that I'd like to introduce some counters for regular file > >>>>>> folio allocations to observe how often large folio allocation succeeds, but > >>>>>> these shmem counters are named "file" which is going to make things confusing. > >>>>>> So hoping to solve that before commit 66f44583f9b6 ("mm: shmem: add mTHP > >>>>>> counters for anonymous shmem") goes upstream (it is currently in mm-stable). > >>>>>> > >>>>>> Admittedly, this change means the mTHP stat names are not the same as the legacy > >>>>>> PMD-size THP names, but I think that's a smaller issue than having "file_" mTHP > >>>>>> stats that only count shmem, then having to introduce "file2_" or "pgcache_" > >>>>>> stats for the regular file memory, which is even more inconsistent IMHO. I guess > >>>>>> the alternative is to count both shmem and file in these mTHP stats (that's how > >>>>>> they were documented anyway) but I think it's better to be able to consider them > >>>>>> separately like we do for all the other counters. > >>>>>> > >>>>>> Thanks, > >>>>>> Ryan > >>>>>> > >>>>>> Documentation/admin-guide/mm/transhuge.rst | 12 ++++++------ > >>>>>> include/linux/huge_mm.h | 6 +++--- > >>>>>> mm/huge_memory.c | 12 ++++++------ > >>>>>> mm/shmem.c | 8 ++++---- > >>>>>> 4 files changed, 19 insertions(+), 19 deletions(-) > >>>>>> > >>>>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst > >>>>>> index 747c811ee8f1..8b891689fc13 100644 > >>>>>> --- a/Documentation/admin-guide/mm/transhuge.rst > >>>>>> +++ b/Documentation/admin-guide/mm/transhuge.rst > >>>>>> @@ -496,16 +496,16 @@ swpout_fallback > >>>>>> Usually because failed to allocate some continuous swap space > >>>>>> for the huge page. > >>>>>> > >>>>>> -file_alloc > >>>>>> - is incremented every time a file huge page is successfully > >>>>>> +shmem_alloc > >>>>>> + is incremented every time a shmem huge page is successfully > >>>>>> allocated. > >>>>>> > >>>>>> -file_fallback > >>>>>> - is incremented if a file huge page is attempted to be allocated > >>>>>> +shmem_fallback > >>>>>> + is incremented if a shmem huge page is attempted to be allocated > >>>>>> but fails and instead falls back to using small pages. > >>>>>> > >>>>>> -file_fallback_charge > >>>>>> - is incremented if a file huge page cannot be charged and instead > >>>>>> +shmem_fallback_charge > >>>>>> + is incremented if a shmem huge page cannot be charged and instead > >>>>>> falls back to using small pages even though the allocation was > >>>>>> successful. > >>>>>> > >>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > >>>>>> index acb6ac24a07e..cff002be83eb 100644 > >>>>>> --- a/include/linux/huge_mm.h > >>>>>> +++ b/include/linux/huge_mm.h > >>>>>> @@ -269,9 +269,9 @@ enum mthp_stat_item { > >>>>>> MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE, > >>>>>> MTHP_STAT_SWPOUT, > >>>>>> MTHP_STAT_SWPOUT_FALLBACK, > >>>>>> - MTHP_STAT_FILE_ALLOC, > >>>>>> - MTHP_STAT_FILE_FALLBACK, > >>>>>> - MTHP_STAT_FILE_FALLBACK_CHARGE, > >>>>>> + MTHP_STAT_SHMEM_ALLOC, > >>>>>> + MTHP_STAT_SHMEM_FALLBACK, > >>>>>> + MTHP_STAT_SHMEM_FALLBACK_CHARGE, > >>>>>> MTHP_STAT_SPLIT, > >>>>>> MTHP_STAT_SPLIT_FAILED, > >>>>>> MTHP_STAT_SPLIT_DEFERRED, > >>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c > >>>>>> index 9ec64aa2be94..f9696c94e211 100644 > >>>>>> --- a/mm/huge_memory.c > >>>>>> +++ b/mm/huge_memory.c > >>>>>> @@ -568,9 +568,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK); > >>>>>> DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE); > >>>>>> DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT); > >>>>>> DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK); > >>>>>> -DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC); > >>>>>> -DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK); > >>>>>> -DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE); > >>>>>> +DEFINE_MTHP_STAT_ATTR(shmem_alloc, MTHP_STAT_SHMEM_ALLOC); > >>>>>> +DEFINE_MTHP_STAT_ATTR(shmem_fallback, MTHP_STAT_SHMEM_FALLBACK); > >>>>>> +DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE); > >>>>>> DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT); > >>>>>> DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED); > >>>>>> DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED); > >>>>>> @@ -581,9 +581,9 @@ static struct attribute *stats_attrs[] = { > >>>>>> &anon_fault_fallback_charge_attr.attr, > >>>>>> &swpout_attr.attr, > >>>>>> &swpout_fallback_attr.attr, > >>>>>> - &file_alloc_attr.attr, > >>>>>> - &file_fallback_attr.attr, > >>>>>> - &file_fallback_charge_attr.attr, > >>>>>> + &shmem_alloc_attr.attr, > >>>>>> + &shmem_fallback_attr.attr, > >>>>>> + &shmem_fallback_charge_attr.attr, > >>>>>> &split_attr.attr, > >>>>>> &split_failed_attr.attr, > >>>>>> &split_deferred_attr.attr, > >>>>>> diff --git a/mm/shmem.c b/mm/shmem.c > >>>>>> index 921d59c3d669..f24dfbd387ba 100644 > >>>>>> --- a/mm/shmem.c > >>>>>> +++ b/mm/shmem.c > >>>>>> @@ -1777,7 +1777,7 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf, > >>>>>> if (pages == HPAGE_PMD_NR) > >>>>>> count_vm_event(THP_FILE_FALLBACK); > >>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE > >>>>>> - count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK); > >>>>>> + count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK); > >>>>>> #endif > >>>>>> order = next_order(&suitable_orders, order); > >>>>>> } > >>>>>> @@ -1804,8 +1804,8 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf, > >>>>>> count_vm_event(THP_FILE_FALLBACK_CHARGE); > >>>>>> } > >>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE > >>>>>> - count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK); > >>>>>> - count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE); > >>>>>> + count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK); > >>>>>> + count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK_CHARGE); > >>>>>> #endif > >>>>>> } > >>>>>> goto unlock; > >>>>>> @@ -2181,7 +2181,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, > >>>>>> if (folio_test_pmd_mappable(folio)) > >>>>>> count_vm_event(THP_FILE_ALLOC); > >>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE > >>>>>> - count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC); > >>>>>> + count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_ALLOC); > >>>>>> #endif > >>>>>> goto alloced; > >>>>>> } > >>>>>> -- > >>>>>> 2.43.0 > >>>>>> > >>>>> > > Thanks Barry