On Tue, Dec 13, 2022 at 11:38 PM Sidhartha Kumar
<sidhartha.kumar@xxxxxxxxxx> wrote:
>
> On 12/13/22 5:02 PM, Mike Kravetz wrote:
> > On 12/13/22 17:27, Nico Pache wrote:
> >> According to the document linked the following approach is even faster
> >> than the one I used due to CPU parallelization:
> >
> > I do not think we are very concerned with speed here. This routine is being
> > called in the creation of compound pages, and in the case of hugetlb the
> > tear down of gigantic pages. In general, creation and tear down of gigantic
> > pages happens infrequently. Usually only at system/application startup and
> > system/application shutdown.
> >
> Hi Nico,
>
> I wrote a bpftrace script to track the time spent in
> __prep_compound_gigantic_folio both with and without the branch in
> folio_set_order() and resulting histogram was the same for both
> versions. This is probably because the for loop through every base page
> has a much higher overhead than the singular call to folio_set_order().
> I am not sure what the performance difference for THP would be.

Hi Sidhartha,

Ok great! We may want to proactively implement a branchless version so
once/if THP comes around to utilizing this we won't see a regression.

Furthermore, Waiman brought up a good point off the list:
This bitmath is needlessly complex and can be achieved with

	page[1].compound_nr = (1U << order) & ~1U;

Tested:
order 0 output : 0
order 1 output : 2
order 2 output : 4
order 3 output : 8
order 4 output : 16
order 5 output : 32
order 6 output : 64
order 7 output : 128
order 8 output : 256
order 9 output : 512
order 10 output : 1024

> Below is the script.
> Thanks,
> Sidhartha Kumar

Thanks for the script!!

Cheers,
-- Nico

> k:__prep_compound_gigantic_folio
> {
>     @prep_start[pid] = nsecs;
> }
>
> kr:__prep_compound_gigantic_folio
> {
>     @prep_nsecs = hist((nsecs - @prep_start[pid]));
>     delete(@prep_start[pid]);
> }