According to the document linked the following approach is even faster
than the one I used due to CPU parallelization:

	page[1].compound_nr = (shift & ~shift) | (-order & shift);

	for (int x = 0; x < 11; x++) {
		unsigned int order = x;
		unsigned long shift = 1U << order;

		printf("order %d output : %lu\n", order,
		       (shift & ~shift) | (-order & shift));
	}

order 0 output : 0
order 1 output : 2
order 2 output : 4
order 3 output : 8
order 4 output : 16
order 5 output : 32
order 6 output : 64
order 7 output : 128
order 8 output : 256

-- Nico

On Tue, Dec 13, 2022 at 4:53 PM Nico Pache <npache@xxxxxxxxxx> wrote:
>
> Hi Mike,
>
> Thanks for the pointer! Would the branchless conditional be an
> improvement over the current approach? I'm not sure how hot this path
> is, but it may be worth the optimization.
>
> -- Nico
>
> On Tue, Dec 13, 2022 at 4:48 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> >
> > On 12/13/22 16:45, Nico Pache wrote:
> > > Since commit 1378a5ee451a ("mm: store compound_nr as well as
> > > compound_order") the page[1].compound_nr must be explicitly set to 0 if
> > > calling set_compound_order(page, 0).
> > >
> > > This can lead to bugs if the caller of set_compound_order(page, 0) forgets
> > > to explicitly set compound_nr=0. An example of this is commit ba9c1201beaa
> > > ("mm/hugetlb: clear compound_nr before freeing gigantic pages")
> >
> > There has been some recent work in this area. The latest patch being,
> > https://lore.kernel.org/linux-mm/20221213212053.106058-1-sidhartha.kumar@xxxxxxxxxx/
> >
> > --
> > Mike Kravetz