Re: [PATCH v1] mm: shmem: Rename mTHP shmem counters

Barry Song <21cnbao@xxxxxxxxx> · Tue, 9 Jul 2024 20:40:33 +1200

On Tue, Jul 9, 2024 at 8:35 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> On 09/07/2024 09:13, Barry Song wrote:
> > On Tue, Jul 9, 2024 at 7:55 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> >>
> >> On 09/07/2024 02:44, Barry Song wrote:
> >>> On Tue, Jul 9, 2024 at 12:30 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> >>>>
> >>>> On 08/07/2024 12:36, Barry Song wrote:
> >>>>> On Mon, Jul 8, 2024 at 11:24 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> >>>>>>
> >>>>>> The legacy PMD-sized THP counters at /proc/vmstat include
> >>>>>> thp_file_alloc, thp_file_fallback and thp_file_fallback_charge, which
> >>>>>> rather confusingly refer to shmem THP and do not include any other types
> >>>>>> of file pages. This is inconsistent since in most other places in the
> >>>>>> kernel, THP counters are explicitly separated for anon, shmem and file
> >>>>>> flavours. However, we are stuck with it since it constitutes a user ABI.
> >>>>>>
> >>>>>> Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for
> >>>>>> anonymous shmem") added equivalent mTHP stats for shmem, keeping the
> >>>>>> same "file_" prefix in the names. But in future, we may want to add
> >>>>>> extra stats to cover actual file pages, at which point, it would all
> >>>>>> become very confusing.
> >>>>>>
> >>>>>> So let's take the opportunity to rename these new counters "shmem_"
> >>>>>> before the change makes it upstream and the ABI becomes immutable.
> >>>>>
> >>>>> Personally, I think this approach is much clearer. However, I recall
> >>>>> we discussed this before [1], and it seems that inconsistency is a
> >>>>> concern?
> >>>>
> >>>> Embarrassingly, I don't recall that conversation at all :-| but at least what I
> >>>> said then is consistent with what I've done in this patch.
> >>>>
> >>>> I think David's conclusion from that thread was to call them FILE_, and add both
> >>>> shmem and pagecache counts to those counters, meaning we can keep the same name
> >>>> as legacy THP counters. But those legacy THP counters only count shmem, and I
> >>>> don't think we would get away with adding pagecache counts to those at this
> >>>> point? (argument: they have been around for a long time and there is a risk that
> >>>> user space relies on them and if they were to dramatically increase due to
> >>>> pagecache addition now that could break things). In that case, there is still
> >>>> inconsistency, but it's worse; the names are consistent but the semantics are
> >>>> inconsistent.
> >>>>
> >>>> So my vote is to change to SHMEM_ as per this patch :)
> >>>
> >>> I have no objections. However, I dislike the documentation for
> >>> thp_file_*. Perhaps we can clean it all up together?
> >>
> >> I agree that we should clean this documentation up and I'm happy to roll it into
> >> v2. However, I don't think what you have suggested is quite right.
> >>
> >> thp_file_alloc, thp_file_fallback and thp_file_fallback_charge *only* count
> >> shmem. They don't count pagecache. So perhaps the change should be "...every
> >> time a shmem huge page (despite being named after "file", the counter measures
> >> only shmem) is..."?
> >
> > I understand what you are saying, and I know that thp_file_* has only
> > included shmem so far. My question is whether it will include regular
> > files in the future? If not, I am perfectly fine with your approach.
>
> My whole reasoning for this patch is based on the assertion that since
> THP_FILE_ALLOC has been there for 8 years and in all that time has only counted
> shmem, then it's highly likely that someone is depending on that semantic and we
> can't change it. I don't have any actual evidence of code that relies on it though.
>
> I propose I change the docs to reflect what's actually happening today (i.e.
> shmem *only*). If we later decide we want to also report page cache numbers
> through that same counter, then we can change the docs at that point. But if I
> get my way, we'll soon have mTHP counters for FILE, which is solely for page
> cache. So you'll be able to get all the fine-grained info out of those and there
> will be no need to mess with the legacy counters.

Makes sense to me. I'd rather we go to

/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/stats/file_*
/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/stats/shmem_*

if we later have 2MiB file counters.
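
For illustration, a minimal user-space sketch of how a tool might build and
read these per-size counter paths. The helper names mthp_stat_path() and
read_counter() are hypothetical and not part of any kernel or libc API. The
shmem_* counter names are the ones proposed in this patch; the file_* names
are only an assumption about what future page cache counters might be called.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the sysfs path of one per-size mTHP counter, e.g.
 * .../hugepages-2048kB/stats/shmem_alloc.
 * Hypothetical helper: returns 0 on success, -1 if the counter name
 * contains a '/' or the path does not fit in buf.
 */
int mthp_stat_path(char *buf, size_t len, unsigned int size_kb,
                   const char *counter)
{
    int n;

    assert(buf != NULL && counter != NULL);

    /* A counter name is a single file name, never a sub-path. */
    if (strchr(counter, '/') != NULL)
        return -1;

    n = snprintf(buf, len,
                 "/sys/kernel/mm/transparent_hugepage/hugepages-%ukB/stats/%s",
                 size_kb, counter);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}

/*
 * Read one unsigned decimal value from a counter file.
 * Returns 0 on success, -1 if the file is missing or unparsable
 * (for example on a kernel that does not expose this counter).
 */
int read_counter(const char *path, unsigned long long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%llu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would build the path for, say, (2048, "shmem_alloc") and treat a -1
from read_counter() as "counter not present" rather than as zero, so that a
kernel without the counter is not mistaken for one with no allocations.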

>
> >
> > READ_ONLY_THP_FOR_FS isn't applicable in this path as it is created
> > by khugepaged collapse.
> >
> >>
> >> thp_file_mapped includes both file and shmem, so agree with your change there.
> >>
> >> What do you think?
> >>
> >>
> >>>
> >>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >>> index 709fe10b60f4..65df48cb3bbb 100644
> >>> --- a/Documentation/admin-guide/mm/transhuge.rst
> >>> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >>> @@ -417,21 +417,22 @@ thp_collapse_alloc_failed
> >>>       the allocation.
> >>>
> >>>  thp_file_alloc
> >>> -     is incremented every time a file huge page is successfully
> >>> -     allocated.
> >>> +     is incremented every time a file (including shmem) huge page is
> >>> +     successfully allocated.
> >>>
> >>>  thp_file_fallback
> >>> -     is incremented if a file huge page is attempted to be allocated
> >>> -     but fails and instead falls back to using small pages.
> >>> +     is incremented if a file (including shmem) huge page is attempted
> >>> +     to be allocated but fails and instead falls back to using small
> >>> +     pages.
> >>>
> >>>  thp_file_fallback_charge
> >>> -     is incremented if a file huge page cannot be charged and instead
> >>> -     falls back to using small pages even though the allocation was
> >>> -     successful.
> >>> +     is incremented if a file (including shmem) huge page cannot be
> >>> +     charged and instead falls back to using small pages even though
> >>> +     the allocation was successful.
> >>>
> >>>  thp_file_mapped
> >>> -     is incremented every time a file huge page is mapped into
> >>> -     user address space.
> >>> +     is incremented every time a file (including shmem) huge page is
> >>> +     mapped into user address space.
> >>>
> >>>  thp_split_page
> >>>       is incremented every time a huge page is split into base
> >>>
> >>>>
> >>>>>
> >>>>> [1] https://lore.kernel.org/linux-mm/05d0096e4ec3e572d1d52d33a31a661321ac1551.1713755580.git.baolin.wang@xxxxxxxxxxxxxxxxx/
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
> >>>>>> ---
> >>>>>>
> >>>>>> Hi All,
> >>>>>>
> >>>>>> Applies on top of today's mm-unstable (2073cda629a4) and tested with mm
> >>>>>> selftests; no regressions observed.
> >>>>>>
> >>>>>> The backstory here is that I'd like to introduce some counters for regular file
> >>>>>> folio allocations to observe how often large folio allocation succeeds, but
> >>>>>> these shmem counters are named "file" which is going to make things confusing.
> >>>>>> So hoping to solve that before commit 66f44583f9b6 ("mm: shmem: add mTHP
> >>>>>> counters for anonymous shmem") goes upstream (it is currently in mm-stable).
> >>>>>>
> >>>>>> Admittedly, this change means the mTHP stat names are not the same as the legacy
> >>>>>> PMD-size THP names, but I think that's a smaller issue than having "file_" mTHP
> >>>>>> stats that only count shmem, then having to introduce "file2_" or "pgcache_"
> >>>>>> stats for the regular file memory, which is even more inconsistent IMHO. I guess
> >>>>>> the alternative is to count both shmem and file in these mTHP stats (that's how
> >>>>>> they were documented anyway) but I think it's better to be able to consider them
> >>>>>> separately like we do for all the other counters.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Ryan
> >>>>>>
> >>>>>>  Documentation/admin-guide/mm/transhuge.rst | 12 ++++++------
> >>>>>>  include/linux/huge_mm.h                    |  6 +++---
> >>>>>>  mm/huge_memory.c                           | 12 ++++++------
> >>>>>>  mm/shmem.c                                 |  8 ++++----
> >>>>>>  4 files changed, 19 insertions(+), 19 deletions(-)
> >>>>>>
> >>>>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> index 747c811ee8f1..8b891689fc13 100644
> >>>>>> --- a/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >>>>>> @@ -496,16 +496,16 @@ swpout_fallback
> >>>>>>         Usually because failed to allocate some continuous swap space
> >>>>>>         for the huge page.
> >>>>>>
> >>>>>> -file_alloc
> >>>>>> -       is incremented every time a file huge page is successfully
> >>>>>> +shmem_alloc
> >>>>>> +       is incremented every time a shmem huge page is successfully
> >>>>>>         allocated.
> >>>>>>
> >>>>>> -file_fallback
> >>>>>> -       is incremented if a file huge page is attempted to be allocated
> >>>>>> +shmem_fallback
> >>>>>> +       is incremented if a shmem huge page is attempted to be allocated
> >>>>>>         but fails and instead falls back to using small pages.
> >>>>>>
> >>>>>> -file_fallback_charge
> >>>>>> -       is incremented if a file huge page cannot be charged and instead
> >>>>>> +shmem_fallback_charge
> >>>>>> +       is incremented if a shmem huge page cannot be charged and instead
> >>>>>>         falls back to using small pages even though the allocation was
> >>>>>>         successful.
> >>>>>>
> >>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >>>>>> index acb6ac24a07e..cff002be83eb 100644
> >>>>>> --- a/include/linux/huge_mm.h
> >>>>>> +++ b/include/linux/huge_mm.h
> >>>>>> @@ -269,9 +269,9 @@ enum mthp_stat_item {
> >>>>>>         MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> >>>>>>         MTHP_STAT_SWPOUT,
> >>>>>>         MTHP_STAT_SWPOUT_FALLBACK,
> >>>>>> -       MTHP_STAT_FILE_ALLOC,
> >>>>>> -       MTHP_STAT_FILE_FALLBACK,
> >>>>>> -       MTHP_STAT_FILE_FALLBACK_CHARGE,
> >>>>>> +       MTHP_STAT_SHMEM_ALLOC,
> >>>>>> +       MTHP_STAT_SHMEM_FALLBACK,
> >>>>>> +       MTHP_STAT_SHMEM_FALLBACK_CHARGE,
> >>>>>>         MTHP_STAT_SPLIT,
> >>>>>>         MTHP_STAT_SPLIT_FAILED,
> >>>>>>         MTHP_STAT_SPLIT_DEFERRED,
> >>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>>>> index 9ec64aa2be94..f9696c94e211 100644
> >>>>>> --- a/mm/huge_memory.c
> >>>>>> +++ b/mm/huge_memory.c
> >>>>>> @@ -568,9 +568,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> >>>>>>  DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> >>>>>>  DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT);
> >>>>>>  DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK);
> >>>>>> -DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
> >>>>>> -DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
> >>>>>> -DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
> >>>>>> +DEFINE_MTHP_STAT_ATTR(shmem_alloc, MTHP_STAT_SHMEM_ALLOC);
> >>>>>> +DEFINE_MTHP_STAT_ATTR(shmem_fallback, MTHP_STAT_SHMEM_FALLBACK);
> >>>>>> +DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
> >>>>>>  DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
> >>>>>>  DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
> >>>>>>  DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
> >>>>>> @@ -581,9 +581,9 @@ static struct attribute *stats_attrs[] = {
> >>>>>>         &anon_fault_fallback_charge_attr.attr,
> >>>>>>         &swpout_attr.attr,
> >>>>>>         &swpout_fallback_attr.attr,
> >>>>>> -       &file_alloc_attr.attr,
> >>>>>> -       &file_fallback_attr.attr,
> >>>>>> -       &file_fallback_charge_attr.attr,
> >>>>>> +       &shmem_alloc_attr.attr,
> >>>>>> +       &shmem_fallback_attr.attr,
> >>>>>> +       &shmem_fallback_charge_attr.attr,
> >>>>>>         &split_attr.attr,
> >>>>>>         &split_failed_attr.attr,
> >>>>>>         &split_deferred_attr.attr,
> >>>>>> diff --git a/mm/shmem.c b/mm/shmem.c
> >>>>>> index 921d59c3d669..f24dfbd387ba 100644
> >>>>>> --- a/mm/shmem.c
> >>>>>> +++ b/mm/shmem.c
> >>>>>> @@ -1777,7 +1777,7 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
> >>>>>>                         if (pages == HPAGE_PMD_NR)
> >>>>>>                                 count_vm_event(THP_FILE_FALLBACK);
> >>>>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >>>>>> -                       count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
> >>>>>> +                       count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK);
> >>>>>>  #endif
> >>>>>>                         order = next_order(&suitable_orders, order);
> >>>>>>                 }
> >>>>>> @@ -1804,8 +1804,8 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
> >>>>>>                                 count_vm_event(THP_FILE_FALLBACK_CHARGE);
> >>>>>>                         }
> >>>>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >>>>>> -                       count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
> >>>>>> -                       count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
> >>>>>> +                       count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK);
> >>>>>> +                       count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK_CHARGE);
> >>>>>>  #endif
> >>>>>>                 }
> >>>>>>                 goto unlock;
> >>>>>> @@ -2181,7 +2181,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> >>>>>>                         if (folio_test_pmd_mappable(folio))
> >>>>>>                                 count_vm_event(THP_FILE_ALLOC);
> >>>>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >>>>>> -                       count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
> >>>>>> +                       count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_ALLOC);
> >>>>>>  #endif
> >>>>>>                         goto alloced;
> >>>>>>                 }
> >>>>>> --
> >>>>>> 2.43.0
> >>>>>>
> >>>>>
> >

Thanks
Barry