On 08/03/23 12:48, Sidhartha Kumar wrote:
> v2 -> v3
>   - gather performance data per Mike Kravetz
>   - remove start variable in remove_inode_hugepages() per Mike Kravetz
>   - remove hugetlb special case within folio_file_page()
>
>
> ========================= PERFORMANCE ======================================
>
> Perf was used to check the performance differences after the patch. Overall
> the performance is similar to mainline, with a very small additional overhead
> that occurs in __filemap_add_folio() and hugetlb_add_to_page_cache(). This is
> because of the additional overhead that occurs in xa_load() and xa_store(),
> as the xarray now uses more entries to store hugetlb folios in the page cache.
>
> aarch64:
>     workload - fallocate a 700GB file backed by huge pages
>
>     6.5-rc3 + this patch:
>     2MB Page Size:
>         --100.00%--__arm64_sys_fallocate
>                    ksys_fallocate
>                    vfs_fallocate
>                    hugetlbfs_fallocate
>                    |
>                    |--95.04%--__pi_clear_page
>                    |
>                    |--3.57%--clear_huge_page
>                    |          |
>                    |          |--2.63%--rcu_all_qs
>                    |          |
>                    |           --0.91%--__cond_resched
>                    |
>                     --0.67%--__cond_resched
>
>     0.17%  0.00%   0  fallocate  [kernel.vmlinux]   [k] hugetlb_add_to_page_cache
>     0.14%  0.10%  11  fallocate  [kernel.vmlinux]   [k] __filemap_add_folio

Thanks for getting the performance data!

I think someone may have already mentioned that this should be part of the
actual commit message.  And, when moved to the actual commit message, you
might not want to include the data where there were no perf samples in the
page cache related code (1GB pages).

Any operation where we add a hugetlb page to the cache is going to be
immediately preceded by clearing the page (as in fallocate or a fault), or
writing to the page (as in userfaultfd).  Therefore, the difference in page
cache handling is going to be mostly in the noise.  This is more so with
larger huge page sizes such as 1G.

>     6.5-rc3:
>     2MB Page Size:
>         --100.00%--__arm64_sys_fallocate
>                    ksys_fallocate
>                    vfs_fallocate
>                    hugetlbfs_fallocate
>                    |
>                    |--94.91%--__pi_clear_page
>                    |
>                    |--4.11%--clear_huge_page
>                    |          |
>                    |          |--3.00%--rcu_all_qs
>                    |          |
>                    |           --1.10%--__cond_resched
>                    |
>                     --0.59%--__cond_resched
>
>     0.08%  0.01%  1  fallocate  [kernel.kallsyms]  [k] hugetlb_add_to_page_cache
>     0.05%  0.03%  3  fallocate  [kernel.kallsyms]  [k] __filemap_add_folio
>
> x86:
>     workload - fallocate a 100GB file backed by huge pages
>
>     6.5-rc3 + this patch:
>     2MB Page Size:
>     0.04%  0.04%  1  fallocate  [kernel.kallsyms]  [k] xa_load
>     0.04%  0.00%  0  fallocate  [kernel.kallsyms]  [k] hugetlb_add_to_page_cache
>     0.04%  0.00%  0  fallocate  [kernel.kallsyms]  [k] __filemap_add_folio
>     0.04%  0.00%  0  fallocate  [kernel.kallsyms]  [k] xas_store
>
>     6.5-rc3:
>     2MB Page Size:
>     0.03%  0.03%  1  fallocate  [kernel.kallsyms]  [k] __filemap_add_folio

What would be helpful is to include the real (user perceived) times with
and without your changes.  I expect these to be essentially the same.
What I want to avoid is introducing any user perceived slowdowns when doing
something like adding 1TB of hugetlb pages to the cache.  I am aware that
this may be done as part of an application (DB) startup.
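For reference, a minimal user-space sketch of how such a wall-clock
measurement could be taken.  The /mnt/huge mount point, file name, and
700GB size below are illustrative assumptions only, not taken from the
patch or this review; it assumes hugetlbfs is already mounted there with
enough huge pages reserved:

	/*
	 * Sketch only: time a hugetlbfs fallocate as a user would see it.
	 * Build with: gcc -O2 -o htime htime.c
	 */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>
	#include <unistd.h>

	int main(void)
	{
		const off_t size = 700ULL << 30;	/* assumed 700GB, as in the aarch64 run */
		struct timespec start, end;
		int fd;

		/* hypothetical hugetlbfs mount point and file name */
		fd = open("/mnt/huge/testfile", O_CREAT | O_RDWR, 0644);
		if (fd < 0) {
			perror("open");
			return EXIT_FAILURE;
		}

		clock_gettime(CLOCK_MONOTONIC, &start);
		if (fallocate(fd, 0, 0, size)) {
			perror("fallocate");
			close(fd);
			return EXIT_FAILURE;
		}
		clock_gettime(CLOCK_MONOTONIC, &end);

		printf("fallocate took %.3f s\n",
		       (end.tv_sec - start.tv_sec) +
		       (end.tv_nsec - start.tv_nsec) / 1e9);

		close(fd);
		return EXIT_SUCCESS;
	}

Running this once on a kernel with and once without the series would give
the user-perceived numbers asked for above.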
> rebased on mm-unstable 08/02/23
>
>
> [1]: https://lore.kernel.org/linux-mm/20230519220142.212051-1-sidhartha.kumar@xxxxxxxxxx/T/
> [2]: https://lore.kernel.org/lkml/20230609194947.37196-1-sidhartha.kumar@xxxxxxxxxx/
> [3]: https://lore.kernel.org/lkml/ZLtVlJA+V2+2yjxc@xxxxxxxxxxxxxxxxxxxx/T/
>
>  fs/hugetlbfs/inode.c    | 15 ++++++++-------
>  include/linux/hugetlb.h | 12 ++++++++++++
>  include/linux/pagemap.h | 29 ++---------------------------
>  mm/filemap.c            | 36 +++++++++++-------------------------
>  mm/hugetlb.c            | 11 ++++++-----
>  5 files changed, 39 insertions(+), 64 deletions(-)

There has been some code churn since the rebase, so you will need to rebase
again.  Besides that, the code looks good to me.
-- 
Mike Kravetz