Re: [PATCH v2] mm: add per-order mTHP alloc_success and alloc_fail counters

On 03/04/2024 09:24, David Hildenbrand wrote:
> On 03.04.24 10:18, Ryan Roberts wrote:
>> On 02/04/2024 22:29, Barry Song wrote:
>>> On Wed, Apr 3, 2024 at 7:46 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>>>>
>>>> On 28.03.24 10:51, Barry Song wrote:
>>>>> From: Barry Song <v-songbaohua@xxxxxxxx>
>>>>>
>>>>> Profiling a system blindly with mTHP has become challenging due
>>>>> to the lack of visibility into its operations. Presenting the
>>>>> success rate of mTHP allocations appears to be a pressing need.
>>>>>
>>>>> Recently, I've been experiencing significant difficulty debugging
>>>>> performance improvements and regressions without these figures.
>>>>> It's crucial for us to understand the true effectiveness of
>>>>> mTHP in real-world scenarios, especially in systems with
>>>>> fragmented memory.
>>>>>
>>>>> This patch sets up the framework for per-order mTHP counters,
>>>>> starting with the introduction of alloc_success and alloc_fail
>>>>> counters.  Incorporating additional counters should now be
>>>>> straightforward as well.
>>>>>
>>>>> The initial two unsigned longs for each event are unused, given
>>>>> that order-0 and order-1 are not mTHP. Nonetheless, this refinement
>>>>> improves code clarity.
>>>>>
>>>>> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
>>>>> ---
>>>>>    -v2:
>>>>>    * move to sysfs and provide per-order counters; David, Ryan, Willy
>>>>>    -v1:
>>>>>    https://lore.kernel.org/linux-mm/20240326030103.50678-1-21cnbao@xxxxxxxxx/
>>>>>
>>>>>    include/linux/huge_mm.h | 17 +++++++++++++
>>>>>    mm/huge_memory.c        | 54 +++++++++++++++++++++++++++++++++++++++++
>>>>>    mm/memory.c             |  3 +++
>>>>>    3 files changed, 74 insertions(+)
>>>>>
>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>> index e896ca4760f6..27fa26a22a8f 100644
>>>>> --- a/include/linux/huge_mm.h
>>>>> +++ b/include/linux/huge_mm.h
>>>>> @@ -264,6 +264,23 @@ unsigned long thp_vma_allowable_orders(struct
>>>>> vm_area_struct *vma,
>>>>>                                          enforce_sysfs, orders);
>>>>>    }
>>>>>
>>>>> +enum thp_event_item {
>>>>> +     THP_ALLOC_SUCCESS,
>>>>> +     THP_ALLOC_FAIL,
>>>>> +     NR_THP_EVENT_ITEMS
>>>>> +};
>>>>
>>>> I'm wondering if these should be ANON specific for now. We might want to
>>>> add others (shmem, file) in the future.
>>>
>>> I've two ways to do that
>>> 1. rename to ANON_THP_ALLOC, so that I can have SHMEM_THP_ALLOC, FILE_THP_ALLOC
>>> in the future;
>>> 2. let THP_ALLOC cover all of shmem, file and anon.
>>>
>>> following vmstat, actually 1 might be better as we have both THP_FAULT_ALLOC and
>>> THP_FILE_ALLOC for pmd-mapped THP.
>>>
>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>                  THP_FAULT_ALLOC,
>>>                  THP_FAULT_FALLBACK,
>>>                  THP_FAULT_FALLBACK_CHARGE,
>>>                  THP_COLLAPSE_ALLOC,
>>>                  THP_COLLAPSE_ALLOC_FAILED,
>>>                  THP_FILE_ALLOC,
>>>                  THP_FILE_FALLBACK,
>>>                  THP_FILE_FALLBACK_CHARGE,
>>>                  THP_FILE_MAPPED,
>>>                  THP_SPLIT_PAGE,
>>>                  THP_SPLIT_PAGE_FAILED,
>>>                  THP_DEFERRED_SPLIT_PAGE,
>>>                  THP_SPLIT_PMD,
>>>                  THP_SCAN_EXCEED_NONE_PTE,
>>>                  THP_SCAN_EXCEED_SWAP_PTE,
>>>                  THP_SCAN_EXCEED_SHARED_PTE,
>>> #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>                  THP_SPLIT_PUD,
>>> #endif
>>>                  THP_ZERO_PAGE_ALLOC,
>>>                  THP_ZERO_PAGE_ALLOC_FAILED,
>>>                  THP_SWPOUT,
>>>                  THP_SWPOUT_FALLBACK,
>>> #endif
>>>
>>> And reading mm/shmem.c, obviously, shmem is using THP_FILE_ALLOC.
>>>
>>> I will rename it to ANON_THP_ALLOC in v3, let me know if you disagree :-)
>>
>> I don't think the name of the enum is important - it's an implementation detail
>> that can be changed. It's the name of the sysfs file that matters. Although of
>> course it's nice to keep them in sync from a maintenance pov.
> 
> Jup.
> 
>>
>> Currently they are called "alloc_success" and "alloc_fail" I believe? Perhaps
>> "anon_alloc" and "anon_alloc_fallback" are a bit more in keeping with vmstat?
>>
>> I'm assuming that:
>>
>> vmstat:thp_fault_alloc == hugepages-2048kB/stats/anon_alloc
>> vmstat:thp_fault_fallback == hugepages-2048kB/stats/anon_alloc_fallback
> 
> Or an "anon" subdirectory ... not sure, just a thought.

I have no strong opinion. I'm thinking about how to easily display all the information though:

$ for f in /sys/kernel/mm/hugepages/hugepages-2048kB/*; do printf '%s: ' "$f";  cat "$f"; done
/sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages: 0
/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages: 0
/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages_mempolicy: 0
/sys/kernel/mm/hugepages/hugepages-2048kB/nr_overcommit_hugepages: 0
/sys/kernel/mm/hugepages/hugepages-2048kB/resv_hugepages: 0
/sys/kernel/mm/hugepages/hugepages-2048kB/surplus_hugepages: 0

$ for f in /sys/kernel/mm/hugepages/hugepages-2048kB/*; do printf '%s: ' `basename "$f"`;  cat "$f"; done
free_hugepages: 0
nr_hugepages: 0
nr_hugepages_mempolicy: 0
nr_overcommit_hugepages: 0
resv_hugepages: 0
surplus_hugepages: 0

It looks cleaner to me if we have all the info in the filename so we don't have to display the whole directory hierarchy.

But I'm sure someone more competent with bash will tell me exactly how to do it with fewer chars and an even nicer display...
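
For what it's worth, here is one possible tidier variant of the second loop, as a sketch only. It assumes a POSIX shell and wraps the loop in a helper; the name show_stats is hypothetical and not part of the patch or of any existing tool. It uses parameter expansion instead of basename, so it does not fork an extra process per file and does not break on names containing spaces, and it skips anything that is not a regular file (such as a stats or anon subdirectory).

```shell
# show_stats DIR: print "name: value" for every regular file in DIR.
# Hypothetical helper for illustration; not part of the patch.
show_stats() {
    dir=$1
    for f in "$dir"/*; do
        # Skip subdirectories and an unmatched glob pattern.
        [ -f "$f" ] || continue
        # ${f##*/} strips everything up to the last '/', like basename.
        printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

Calling it as show_stats /sys/kernel/mm/hugepages/hugepages-2048kB should give the same name/value listing as the basename loop above. Note that it only works when the counter name alone is unambiguous, which is the argument for keeping all the info in the filename.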



