On Thu, Aug 1, 2024 at 3:34 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 01.08.24 15:30, Mateusz Guzik wrote: > > On Thu, Aug 01, 2024 at 08:49:27AM +0200, David Hildenbrand wrote: > >> Yes indeed. fork() can be extremely sensitive to each added instruction. > >> > >> I even pointed out to Peter why I didn't add the PageHuge check in there > >> originally [1]. > >> > >> "Well, and I didn't want to have runtime-hugetlb checks in > >> PageAnonExclusive code called on certainly-not-hugetlb code paths." > >> > >> > >> We now have to do a page_folio(page) and then test for hugetlb. > >> > >> return folio_test_hugetlb(page_folio(page)); > >> > >> Nowadays, folio_test_hugetlb() will be faster than at c0bff412e6 times, so > >> maybe at least part of the overhead is gone. > >> > > > > I'll note page_folio expands to a call to _compound_head. > > > > While _compound_head is declared as an inline, it ends up being big > > enough that the compiler decides to emit a real function instead and > > real func calls are not particularly cheap. > > > > I had a brief look with a profiler myself and for single-threaded usage > > the func is quite high up there, while it manages to get out with the > > first branch -- that is to say there is definitely performance lost for > > having a func call instead of an inlined branch. > > > > The routine is deinlined because of a call to page_fixed_fake_head, > > which itself is annotated with always_inline. > > > > This is of course patchable with minor shoveling. > > > > I did not go for it because stress-ng results were too unstable for me > > to confidently state win/loss. > > > > But should you want to whack the regression, this is what I would look > > into. > > > > This might improve it, at least for small folios I guess: > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > index 5769fe6e4950..7796ae116018 100644 > --- a/include/linux/page-flags.h > +++ b/include/linux/page-flags.h > @@ -1086,7 +1086,7 @@ PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc) > */ > static inline bool PageHuge(const struct page *page) > { > - return folio_test_hugetlb(page_folio(page)); > + return PageCompound(page) && folio_test_hugetlb(page_folio(page)); > } > > /* > > > We would avoid the function call for small folios. > why not massage _compound_head back to an inlineable form instead? for all i know you may even register a small win in total -- Mateusz Guzik <mjguzik gmail.com>