Re: [linus:master] [mm] c0bff412e6: stress-ng.clone.ops_per_sec -2.9% regression

On 01.08.24 08:39, Yin, Fengwei wrote:
Hi David,

On 7/30/2024 4:11 PM, David Hildenbrand wrote:
On 30.07.24 07:00, kernel test robot wrote:


Hello,

kernel test robot noticed a -2.9% regression of
stress-ng.clone.ops_per_sec on:

Is that test even using hugetlb? Anyhow, this pretty much sounds like
noise and can be ignored.

It's not about hugetlb. It looks related to this change:

Ah, that makes sense!


diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 888353c209c03..7577fe7debafc 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -1095,7 +1095,12 @@ PAGEFLAG(Isolated, isolated, PF_ANY);
   static __always_inline int PageAnonExclusive(const struct page *page)
   {
          VM_BUG_ON_PGFLAGS(!PageAnon(page), page);
-       VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);
+       /*
+        * HugeTLB stores this information on the head page; THP keeps it per
+        * page
+        */
+       if (PageHuge(page))
+               page = compound_head(page);
          return test_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags);


The PageAnonExclusive() function is changed. And the profiling data
showed it:

        0.00            +3.9        3.90        perf-profile.calltrace.cycles-pp.folio_try_dup_anon_rmap_ptes.copy_present_ptes.copy_pte_range.copy_p4d_range.copy_page_range

According to
https://download.01.org/0day-ci/archive/20240730/202407301049.5051dc19-oliver.sang@xxxxxxxxx/config-6.9.0-rc4-00197-gc0bff412e67b:
	# CONFIG_DEBUG_VM is not set
So maybe this code change could make a difference?

Yes indeed. fork() can be extremely sensitive to each added instruction.

I even pointed out to Peter why I didn't add the PageHuge check in there originally [1].

"Well, and I didn't want to have runtime-hugetlb checks in
PageAnonExclusive code called on certainly-not-hugetlb code paths."


We now have to do a page_folio(page) and then test for hugetlb.

	return folio_test_hugetlb(page_folio(page));

Nowadays, folio_test_hugetlb() is faster than it was at the time of c0bff412e6, so maybe at least part of the overhead is gone.
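
To illustrate why this shows up with CONFIG_DEBUG_VM unset, here is a minimal userspace sketch, not kernel code. It assumes a simplified page layout; `struct page_model`, the `PG_MODEL_*` bits, `MODEL_VM_BUG_ON()` and the `model_*` / `anon_exclusive_*` helpers are all hypothetical stand-ins. The point is only that the old assertion is never evaluated in a non-debug build, while the new hugetlb check runs on every call, including on paths that never see hugetlb pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; this is not the kernel's struct page layout. */
enum { PG_MODEL_HUGETLB, PG_MODEL_ANON_EXCLUSIVE };

struct page_model {
	unsigned long flags;
	const struct page_model *head;	/* NULL for a head or order-0 page */
};

static inline bool model_test_bit(int nr, const struct page_model *p)
{
	return (p->flags >> nr) & 1UL;
}

/* Extra pointer chase: a tail page has to look up its head page. */
static inline const struct page_model *
model_compound_head(const struct page_model *p)
{
	return p->head ? p->head : p;
}

static inline bool model_page_huge(const struct page_model *p)
{
	return model_test_bit(PG_MODEL_HUGETLB, model_compound_head(p));
}

/* Assumption: models a debug-only assertion that vanishes when debugging is off. */
#ifdef MODEL_DEBUG_VM
#define MODEL_VM_BUG_ON(cond)	assert(!(cond))
#else
#define MODEL_VM_BUG_ON(cond)	((void)sizeof(cond))	/* cond is never evaluated */
#endif

/* Before: the hugetlb test exists only inside the debug assertion. */
static inline bool anon_exclusive_old(const struct page_model *page)
{
	MODEL_VM_BUG_ON(model_page_huge(page) && page->head);
	return model_test_bit(PG_MODEL_ANON_EXCLUSIVE, page);
}

/* After: the hugetlb test runs unconditionally, debug build or not. */
static inline bool anon_exclusive_new(const struct page_model *page)
{
	if (model_page_huge(page))
		page = model_compound_head(page);
	return model_test_bit(PG_MODEL_ANON_EXCLUSIVE, page);
}
```

For a non-hugetlb page both variants return the same answer; the new one simply does more work to get there, which would be consistent with the extra cycles attributed to folio_try_dup_anon_rmap_ptes in the profile above.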


[1] https://lore.kernel.org/r/all/8b0b24bb-3c38-4f27-a2c9-f7d7adc4a115@xxxxxxxxxx/


--
Cheers,

David / dhildenb




