On 12/13/22 5:02 PM, Mike Kravetz wrote:
> On 12/13/22 17:27, Nico Pache wrote:
>> According to the document linked the following approach is even faster
>> than the one I used due to CPU parallelization:
> I do not think we are very concerned with speed here. This routine is being
> called in the creation of compound pages, and in the case of hugetlb the
> tear down of gigantic pages. In general, creation and tear down of gigantic
> pages happens infrequently. Usually only at system/application startup and
> system/application shutdown.
Hi Nico,
I wrote a bpftrace script to track the time spent in
__prep_compound_gigantic_folio() both with and without the branch in
folio_set_order(), and the resulting histogram was the same for both
versions. This is probably because the for loop over every base page
has much higher overhead than the single call to folio_set_order().
I am not sure what the performance difference for THP would be.
@prep_nsecs:
[1M, 2M)
50|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
Below is the script.
Thanks,
Sidhartha Kumar
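// On entry, record the start timestamp, keyed by pid.
k:__prep_compound_gigantic_folio
{
	@prep_start[pid] = nsecs;
}
// On return, add the elapsed nanoseconds to the histogram.
kr:__prep_compound_gigantic_folio
{
	@prep_nsecs = hist(nsecs - @prep_start[pid]);
	delete(@prep_start[pid]);
}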
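As a rough illustration of how coarse the histogram above is, here is a
minimal Python sketch of power-of-two bucketing. It assumes that hist()
groups samples into [2^k, 2^(k+1)) buckets and that "1M" in the label means
2^20 ns. The helper name hist_bucket and both sample values are hypothetical;
they are not measurements from this thread.

```python
def hist_bucket(ns):
    """Return the (low, high) power-of-two bucket [low, high) containing ns."""
    if ns <= 0:
        raise ValueError("this simplified model only handles positive durations")
    low = 1 << (ns.bit_length() - 1)  # largest power of two <= ns
    return low, low * 2


# Hypothetical numbers, chosen only to illustrate the bucket width.
baseline_ns = 1_500_000   # a sample somewhere inside the [1M, 2M) bucket
extra_branch_ns = 10      # assumed cost of one additional branch

same_bucket = hist_bucket(baseline_ns) == hist_bucket(baseline_ns + extra_branch_ns)
print(hist_bucket(baseline_ns), same_bucket)
```

Under these assumptions the [1M, 2M) bucket spans roughly 1 to 2 ms, so a
difference of a few nanoseconds from one extra branch would not move any
sample into a different bucket.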
> I think the only case where we 'might' be concerned with speed is in the
> creation of compound pages for THP. Do note that this code path is
> still using set_compound_order as it has not been converted to folios.