On Fri, Jan 10, 2025 at 7:54 AM Dev Jain <dev.jain@xxxxxxx> wrote:
>
> On 09/01/25 5:01 am, Nico Pache wrote:
> > khugepaged scans PMD ranges for potential collapse to a hugepage. To add
> > mTHP support we use this scan to instead record chunks of fully utilized
> > sections of the PMD.
> >
> > create a bitmap to represent a PMD in order MTHP_MIN_ORDER chunks.
> > by default we will set this to order 3. The reasoning is that for 4K 512
> > PMD size this results in a 64 bit bitmap which has some optimizations.
> > For other arches like ARM64 64K, we can set a larger order if needed.
> >
> > khugepaged_scan_bitmap uses a stack struct to recursively scan a bitmap
> > that represents chunks of fully utilized regions. We can then determine
> > what mTHP size fits best and in the following patch, we set this bitmap
> > while scanning the PMD.
> >
> > max_ptes_none is used as a scale to determine how "full" an order must
> > be before being considered for collapse.
> >
> > Signed-off-by: Nico Pache <npache@xxxxxxxxxx>
> > ---
> >  include/linux/khugepaged.h |   4 +-
> >  mm/khugepaged.c            | 129 +++++++++++++++++++++++++++++++++++--
> >  2 files changed, 126 insertions(+), 7 deletions(-)
> >
>
> [--snip--]
>
> >
> > +// Recursive function to consume the bitmap
> > +static int khugepaged_scan_bitmap(struct mm_struct *mm, unsigned long address,
> > +			int referenced, int unmapped, struct collapse_control *cc,
> > +			bool *mmap_locked, unsigned long enabled_orders)
> > +{
> > +	u8 order, offset;
> > +	int num_chunks;
> > +	int bits_set, max_percent, threshold_bits;
> > +	int next_order, mid_offset;
> > +	int top = -1;
> > +	int collapsed = 0;
> > +	int ret;
> > +	struct scan_bit_state state;
> > +
> > +	cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> > +		{ HPAGE_PMD_ORDER - MIN_MTHP_ORDER, 0 };
> > +
> > +	while (top >= 0) {
> > +		state = cc->mthp_bitmap_stack[top--];
> > +		order = state.order;
> > +		offset = state.offset;
> > +		num_chunks = 1 << order;
> > +		// Skip mTHP orders that are not enabled
> > +		if (!(enabled_orders >> (order + MIN_MTHP_ORDER)) & 1)
> > +			goto next;
> > +
> > +		// copy the relavant section to a new bitmap
> > +		bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset,
> > +				MTHP_BITMAP_SIZE);
> > +
> > +		bits_set = bitmap_weight(cc->mthp_bitmap_temp, num_chunks);
> > +
> > +		// Check if the region is "almost full" based on the threshold
> > +		max_percent = ((HPAGE_PMD_NR - khugepaged_max_ptes_none - 1) * 100)
> > +				/ (HPAGE_PMD_NR - 1);
> > +		threshold_bits = (max_percent * num_chunks) / 100;
> > +
> > +		if (bits_set >= threshold_bits) {
> > +			ret = collapse_huge_page(mm, address, referenced, unmapped, cc,
> > +					mmap_locked, order + MIN_MTHP_ORDER, offset * MIN_MTHP_NR);
> > +			if (ret == SCAN_SUCCEED)
> > +				collapsed += (1 << (order + MIN_MTHP_ORDER));
> > +			continue;
> > +		}
>
> We are going to the lower order when it is not in the allowed mask of
> orders, or when we are below the threshold. What to do when these
> conditions do not happen, and the reason for collapse failure is
> collapse_huge_page()? For example, if you start with a PMD order scan,
> and collapse_huge_page() fails, then you hit "continue", and then exit
> the loop because there is nothing else in the stack, so we exit without
> trying mTHPs.

Thanks for catching that. I introduced that bug when I went from the
recursive to the stack-based approach. The loop should only "continue"
on SCAN_SUCCEED; on any other result it needs to fall through to the
"next:" label so the smaller orders are still tried.

I think I also need to handle the case where nothing succeeds in
khugepaged_scan_pmd.

>
> > +
> > +next:
> > +		if (order > 0) {
> > +			next_order = order - 1;
> > +			mid_offset = offset + (num_chunks / 2);
> > +			cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> > +				{ next_order, mid_offset };
> > +			cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> > +				{ next_order, offset };
> > +		}
> > +	}
> > +	return collapsed;
> > +}
> > +
>
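
To make the intended control flow concrete, here is a rough user-space
sketch of the loop with "continue" taken only on success. It is not the
kernel code: scan_bitmap(), count_bits(), collapse_fn and the two
callbacks are hypothetical names, a plain uint64_t stands in for
cc->mthp_bitmap, and the constants assume 4K pages with an order-3
minimum. It also parenthesizes the enabled-order test as
"(enabled_orders >> order) & 1", since in C "!" binds tighter than "&".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_ORDER   3                       /* assumed: one bit = 8 pages */
#define PMD_ORDER   9                       /* assumed: 4K pages, 512 per PMD */
#define TOP_ORDER   (PMD_ORDER - MIN_ORDER) /* 6, i.e. 64 chunks per PMD */
#define PMD_NR      (1 << PMD_ORDER)
#define STACK_DEPTH (TOP_ORDER + 1)         /* deepest this DFS stack can get */

struct scan_state { uint8_t order; uint8_t offset; };

/* Hypothetical stand-in for collapse_huge_page(): true means success. */
typedef bool (*collapse_fn)(int order, int page_offset);

/* Count set bits in bitmap[offset, offset + num_chunks). */
static int count_bits(uint64_t bitmap, int offset, int num_chunks)
{
	uint64_t v = bitmap >> offset;
	int n = 0;

	for (int i = 0; i < num_chunks; i++)
		n += (v >> i) & 1;
	return n;
}

/* Returns the number of pages collapsed, like the quoted function. */
static int scan_bitmap(uint64_t bitmap, unsigned long enabled_orders,
		       int max_ptes_none, collapse_fn try_collapse)
{
	struct scan_state stack[STACK_DEPTH];
	int top = -1, collapsed = 0;

	stack[++top] = (struct scan_state){ TOP_ORDER, 0 };
	while (top >= 0) {
		struct scan_state s = stack[top--];
		int num_chunks = 1 << s.order;
		int real_order = s.order + MIN_ORDER;

		/* Only attempt orders that are enabled. */
		if ((enabled_orders >> real_order) & 1) {
			int max_percent = ((PMD_NR - max_ptes_none - 1) * 100)
					  / (PMD_NR - 1);
			int threshold = (max_percent * num_chunks) / 100;

			/* "continue" only when the collapse really succeeded. */
			if (count_bits(bitmap, s.offset, num_chunks) >= threshold &&
			    try_collapse(real_order, s.offset << MIN_ORDER)) {
				collapsed += 1 << real_order;
				continue;
			}
		}

		/* Disabled, below threshold, or collapse failed: split in two. */
		if (s.order > 0) {
			assert(top + 2 < STACK_DEPTH);
			stack[++top] = (struct scan_state){
				s.order - 1, s.offset + num_chunks / 2 };
			stack[++top] = (struct scan_state){
				s.order - 1, s.offset };
		}
	}
	return collapsed;
}

/* Test callbacks: one always succeeds, one fails only at PMD order. */
static bool always_ok(int order, int page_offset)
{
	(void)order; (void)page_offset;
	return true;
}

static bool fail_pmd_only(int order, int page_offset)
{
	(void)page_offset;
	return order != PMD_ORDER;
}
```

With a fully populated bitmap and fail_pmd_only as the callback, the
PMD-order attempt fails and the two order-8 halves are then tried, which
is the case the quoted loop skips by hitting "continue" unconditionally.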