> On Aug 11, 2024, at 12:17, Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>
> Batch the HVO work, including de-HVO of the source and HVO of the
> destination hugeTLB folios, to speed up demotion.
>
> After commit bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with
> speculative PFN walkers"), each request of HVO or de-HVO, batched or
> not, invokes synchronize_rcu() once. For example, when not batched,
> demoting one 1GB hugeTLB folio to 512 2MB hugeTLB folios invokes
> synchronize_rcu() 513 times (1 de-HVO plus 512 HVO requests), whereas
> when batched, only twice (1 de-HVO plus 1 HVO request). And
> performance between the two cases are significantly different, e.g.,
>   echo 2048kB >/sys/kernel/mm/hugepages/hugepages-1048576kB/demote_size
>   time echo 100 >/sys/kernel/mm/hugepages/hugepages-1048576kB/demote
>
> Before this patch:
>   real     8m58.158s
>   user     0m0.009s
>   sys      0m5.900s
>
> After this patch:
>   real     0m0.900s
>   user     0m0.000s
>   sys      0m0.851s
>
> Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
> Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>

Reviewed-by: Muchun Song <muchun.song@xxxxxxxxx>

Thanks.
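
For anyone checking the numbers in the changelog, the call counts and the
rough speedup can be reproduced with plain shell arithmetic. This is only a
sketch: the variable names are made up for illustration and do not come from
the patch, and it models nothing beyond "one synchronize_rcu() per HVO or
de-HVO request" as stated above.

```shell
#!/bin/sh
# Sizes in kB, matching the sysfs directory name and demote_size quoted above.
src_kb=1048576      # one 1GB hugeTLB folio
dst_kb=2048         # demoted into 2MB hugeTLB folios

dst_folios=$((src_kb / dst_kb))        # number of destination folios

# One synchronize_rcu() per HVO/de-HVO request, per the changelog.
unbatched_calls=$((1 + dst_folios))    # 1 de-HVO + one HVO request per folio
batched_calls=$((1 + 1))               # 1 de-HVO + 1 batched HVO request

# Wall-clock ("real") times from the changelog, converted to milliseconds.
before_ms=$(( (8 * 60 + 58) * 1000 + 158 ))   # 8m58.158s
after_ms=900                                  # 0m0.900s
speedup=$((before_ms / after_ms))             # integer division, rounds down

echo "destination folios per 1GB folio: $dst_folios"
echo "synchronize_rcu() calls, unbatched: $unbatched_calls"
echo "synchronize_rcu() calls, batched:   $batched_calls"
echo "approximate wall-clock speedup:     ${speedup}x"
```

The last figure is simply the ratio of the two "real" times; it says nothing
about how the saved time splits between RCU grace periods and other work.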