On Sun, Feb 05, 2023 at 03:06:02PM +0000, Hao Lee wrote:
> vm_normal_page() is called so many times that its overhead is very high.
> After changing this call site to an inline function, copy_page_range()
> runs 3~5 times faster than before.

So you're saying that your compiler is making bad decisions? What
architecture, what compiler, what version? Do you have
CONFIG_ARCH_HAS_PTE_SPECIAL set? Is there something about inlining it
that makes the compiler able to optimise away code, or is it really the
function call overhead? Can you share any perf results?