On 16/12/2024 16:51, Dev Jain wrote:
> One of the testcases triggers a CoW on the 255th page (0-indexing) with
> max_ptes_shared = 256. This leads to 0-254 pages (255 in number) being unshared,
> and 257 pages shared, exceeding the constraint. Suppose we run the test as
> ./khugepaged -s 2. Therefore, khugepaged starts collapsing the range to order-2
> folios, since PMD-collapse will fail due to the constraint.
> When the scan reaches 254-257 PTE range, because at least one PTE in this range
> is writable, with other 3 being read-only, khugepaged collapses this into an
> order-2 mTHP, resulting in 3 extra PTEs getting unshared. After this, we encounter
> a 4-sized chunk of read-only PTEs, and mTHP collapse stops according to the scaled
> constraint, but the number of shared PTEs have now come under the constraint for
> PMD-sized THPs. Therefore, the next scan of khugepaged will be able to collapse
> this range into a PMD-mapped hugepage, leading to failure of this subtest. Fix
> this by reducing the CoW range.

Is this description essentially saying that it's now possible to creep towards
collapsing to a full PMD-size block over successive scans due to rounding errors
in the scaling? Or is this just trying an edge case and the problem doesn't
generalize?

>
> Note: The only objective of this patch is to make the test work for the PMD-case;
> no extension has been made for testing for mTHPs.
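
To make the arithmetic in the commit message concrete, here is a minimal
sketch of the page counts before and after the change. It is not code from
the patch or from khugepaged. The helper names are hypothetical, hpage_pmd_nr
= 512 is inferred from the 255 + 257 split quoted above, and it assumes mTHP
collapse works on naturally aligned 2^order chunks of PTEs.

```c
#include <assert.h>
#include <stdbool.h>

/* Pages CoW-faulted by the test before the patch (hypothetical helper). */
static inline int old_fault_pages(int hpage_pmd_nr, int max_ptes_shared)
{
	return hpage_pmd_nr - max_ptes_shared - 1;
}

/* Pages CoW-faulted after the patch, anon case: fault_nr_pages = 1 << order. */
static inline int new_fault_pages(int hpage_pmd_nr, int max_ptes_shared, int order)
{
	return hpage_pmd_nr - max_ptes_shared - (1 << order);
}

/* Pages still shared with the parent once the first `faulted` pages are CoWed. */
static inline int shared_after_fault(int hpage_pmd_nr, int faulted)
{
	assert(faulted >= 0 && faulted <= hpage_pmd_nr);
	return hpage_pmd_nr - faulted;
}

/*
 * True if the writable/read-only boundary lands inside a 2^order chunk,
 * i.e. some chunk mixes unshared and shared PTEs.
 * ASSUMPTION: chunks are naturally aligned; this is not taken from the patch.
 */
static inline bool boundary_splits_chunk(int faulted, int order)
{
	return (faulted % (1 << order)) != 0;
}
```

With hpage_pmd_nr = 512, max_ptes_shared = 256 and order 2, the old test
faults 255 pages, leaving 257 shared and a mixed chunk at the boundary. The
patched test faults 252 pages, leaving 260 shared with the boundary on a
chunk edge, so under the assumption above no chunk mixes writable and
read-only PTEs.
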
>
> Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
> ---
>  tools/testing/selftests/mm/khugepaged.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
> index 8a4d34cce36b..143c4ad9f6a1 100644
> --- a/tools/testing/selftests/mm/khugepaged.c
> +++ b/tools/testing/selftests/mm/khugepaged.c
> @@ -981,6 +981,7 @@ static void collapse_fork_compound(struct collapse_context *c, struct mem_ops *o
>  static void collapse_max_ptes_shared(struct collapse_context *c, struct mem_ops *ops)
>  {
>  	int max_ptes_shared = thp_read_num("khugepaged/max_ptes_shared");
> +	int fault_nr_pages = is_anon(ops) ? 1 << anon_order : 1;
>  	int wstatus;
>  	void *p;
>
> @@ -997,8 +998,8 @@ static void collapse_max_ptes_shared(struct collapse_context *c, struct mem_ops
>  			fail("Fail");
>
>  		printf("Trigger CoW on page %d of %d...",
> -		       hpage_pmd_nr - max_ptes_shared - 1, hpage_pmd_nr);
> -		ops->fault(p, 0, (hpage_pmd_nr - max_ptes_shared - 1) * page_size);
> +		       hpage_pmd_nr - max_ptes_shared - fault_nr_pages, hpage_pmd_nr);
> +		ops->fault(p, 0, (hpage_pmd_nr - max_ptes_shared - fault_nr_pages) * page_size);
>  		if (ops->check_huge(p, 0))
>  			success("OK");
>  		else