clear_contig_region() can be used to clear up to a huge-page (2MB/1GB)
chunk. Allow preemption in the irqentry_exit path to make sure we don't
hold on to the CPU for an arbitrarily long period.

Performance: vm-scalability/case-anon-w-seq-hugetlb mmaps an anonymous
hugetlb-2mb region, and then writes sequentially to the region, demand
faulting pages on the way. This test, with a CONFIG_PREEMPT_VOLUNTARY
config, shows the effects of this change: stime drops (~18% on Icelakex,
~5% on Milan), while utime goes up (~15% on Icelakex, ~13% on Milan).

  *Icelakex*                 mm/clear_huge_page   x86/clear_huge_page   change
  (mem=4GB/task, tasks=128)

  stime                      293.02 +-  .49%      239.39 +-  .83%      -18.30%
  utime                      440.11 +-  .28%      508.74 +-  .60%      +15.59%
  wall-clock                   5.96 +-  .33%        6.27 +- 2.23%      + 5.20%

  *Milan*                    mm/clear_huge_page   x86/clear_huge_page   change
  (mem=1GB/task, tasks=512)

  stime                      490.95 +- 3.55%      466.90 +- 4.79%      - 4.89%
  utime                      276.43 +- 2.85%      311.97 +- 5.15%      +12.85%
  wall-clock                   3.74 +- 6.41%        3.58 +- 7.82%      - 4.27%

The drop in stime is due to "REP; STOS" being more efficient for bigger
extents. The increase in utime is due to the cache effects of that
change: mm/clear_huge_page() clears page-at-a-time, narrowing towards
the faulting page, while x86/clear_huge_page only optimizes for cache
locality in the local neighbourhood of the faulting address.

This effect on utime is visible via the increased L1-dcache-load-misses
and LLC-load* counts, and via an increased backend boundedness in
"perf stat --all-user" on Icelakex. The effect is slight but, given the
heavy cache pressure generated by the test, shows up in the drop in
user IPC:

 -     9,455,243,414,829      instructions              #    2.75  insn per cycle             ( +- 14.14% )  (46.17%)
 -     2,367,920,864,112      L1-dcache-loads           #    1.054 G/sec                      ( +- 14.14% )  (69.24%)
 -        42,075,182,813      L1-dcache-load-misses     #    2.96% of all L1-dcache accesses  ( +- 14.14% )  (69.24%)
 -            20,365,688      LLC-loads                 #    9.064 K/sec                      ( +- 13.98% )  (69.24%)
 -               890,382      LLC-load-misses           #    7.18% of all LL-cache accesses   ( +- 14.91% )  (69.24%)

 +     9,467,796,660,698      instructions              #    2.37  insn per cycle             ( +- 14.14% )  (46.16%)
 +     2,369,973,307,561      L1-dcache-loads           #    1.027 G/sec                      ( +- 14.14% )  (69.24%)
 +        42,155,621,201      L1-dcache-load-misses     #    2.96% of all L1-dcache accesses  ( +- 14.14% )  (69.24%)
 +            22,116,300      LLC-loads                 #    9.588 K/sec                      ( +- 14.20% )  (69.24%)
 +             1,355,607      LLC-load-misses           #   10.29% of all LL-cache accesses   ( +- 15.49% )  (69.25%)

Given that stime improves for all loads using this path, while the
utime increase is load dependent, add this change.

Signed-off-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx>
---
 arch/x86/mm/hugetlbpage.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 4294b77c4f18..c8564b0552e5 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -158,7 +158,17 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 static void clear_contig_region(struct page *page, unsigned long vaddr,
 				unsigned int npages)
 {
+	might_sleep();
+
+	/*
+	 * We might be clearing a large region.
+	 * Allow rescheduling.
+	 */
+	allow_resched();
 	clear_user_pages(page_address(page), vaddr, page, npages);
+	disallow_resched();
+
+	cond_resched();
 }
 
 void clear_huge_page(struct page *page,
-- 
2.31.1
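
A note for readers of this patch in isolation: the allow_resched()/
disallow_resched() pair is defined elsewhere in this series. Below is a
minimal sketch of the pattern it enables, assuming a per-task flag that
the irqentry_exit path tests before preempting. TIF_RESCHED_ALLOW and
both helper bodies are illustrative assumptions, not necessarily the
series' actual implementation.

#include <linux/kernel.h>
#include <linux/sched.h>

/*
 * Illustrative sketch only: mark the current task as safe to preempt
 * from the irqentry_exit path, even under a voluntary-preemption
 * config. TIF_RESCHED_ALLOW is a hypothetical flag name.
 */
static inline void allow_resched(void)
{
	might_sleep();	/* caller must start out preemptible */
	set_tsk_thread_flag(current, TIF_RESCHED_ALLOW);
}

static inline void disallow_resched(void)
{
	clear_tsk_thread_flag(current, TIF_RESCHED_ALLOW);
}

On interrupt return to kernel context, the exit path would then check
need_resched() together with this flag and call preempt_schedule_irq(),
bounding how long clear_user_pages() can hold the CPU.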
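
On the "REP; STOS" point: the stime win comes from clearing a whole
extent with a single fast-string operation instead of a page-at-a-time
loop, letting the microcode amortize the REP startup cost and use wide
stores. A minimal sketch of the technique (an illustration, not the
series' actual clear_user_pages() implementation):

/*
 * Zero @len bytes at @dst with a single REP; STOSB. Bigger extents
 * amortize the fast-string startup cost better, which is where the
 * stime improvement above comes from. Illustrative sketch only.
 */
static inline void zero_extent_rep_stosb(void *dst, unsigned long len)
{
	asm volatile("rep stosb"
		     : "+D" (dst), "+c" (len)
		     : "a" (0)
		     : "memory");
}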