On Tue, Aug 13, 2024 at 3:00 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, 12 Aug 2024 16:48:23 -0600 Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>
> > Batch the HVO work, including de-HVO of the source and HVO of the
> > destination hugeTLB folios, to speed up demotion.
> >
> > After commit bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with
> > speculative PFN walkers"), each request of HVO or de-HVO, batched or
> > not, invokes synchronize_rcu() once. For example, when not batched,
> > demoting one 1GB hugeTLB folio to 512 2MB hugeTLB folios invokes
> > synchronize_rcu() 513 times (1 de-HVO plus 512 HVO requests), whereas
> > when batched, only twice (1 de-HVO plus 1 HVO request). The
> > performance difference between the two cases is significant, e.g.,
> >
> >   echo 2048kB >/sys/kernel/mm/hugepages/hugepages-1048576kB/demote_size
> >   time echo 100 >/sys/kernel/mm/hugepages/hugepages-1048576kB/demote
> >
> > Before this patch:
> >   real  8m58.158s
> >   user  0m0.009s
> >   sys   0m5.900s
> >
> > After this patch:
> >   real  0m0.900s
> >   user  0m0.000s
> >   sys   0m0.851s
>
> That's a large change. I assume the now-fixed regression was of
> similar magnitude?

Correct, and only the `real` time regressed, due to synchronize_rcu();
the `sys` time actually improved, since it is not affected by
synchronize_rcu() (or at least not measurably so).

> > Note that this patch changes the behavior of the `demote` interface
> > when de-HVO fails. Before, the interface aborted immediately upon
> > failure; now, it tries to finish the entire batch, meaning it can make
> > extra progress if the rest of the batch contains folios that do not
> > need de-HVO.
> >
> > Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
>
> Do we think we should add this to 6.10.x?  I do.

Agreed.
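
[Editor's illustration] For readers following the thread, below is a minimal
C sketch of the pattern the patch moves toward: hoisting the single RCU
grace-period wait out of the per-folio loop. This is not the actual code in
mm/hugetlb_vmemmap.c; the helper `remap_vmemmap()` and both function names
are hypothetical placeholders used only to show why the call count dominates
the `real` time.

/*
 * Hypothetical sketch, not the actual patch. Each synchronize_rcu()
 * blocks until a full RCU grace period elapses, which can take many
 * milliseconds, so wall-clock time scales with the number of calls
 * even though almost no CPU time is consumed while waiting.
 */

/* Unbatched: one grace-period wait per folio (513 waits for 1GB -> 512x2MB). */
static void demote_unbatched(struct folio **folios, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		remap_vmemmap(folios[i]);	/* hypothetical HVO/de-HVO helper */
		synchronize_rcu();		/* one wait per folio */
	}
}

/* Batched: remap every folio first, then wait for a single grace period. */
static void demote_batched(struct folio **folios, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		remap_vmemmap(folios[i]);	/* hypothetical HVO/de-HVO helper */
	synchronize_rcu();			/* one wait per batch */
}

Under this (assumed) structure, the batched path also explains the behavior
change noted above: since the loop no longer returns on the first failure,
the rest of the batch can still make progress.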