On Sun, Feb 05, 2023 at 09:53:53PM +0000, Matthew Wilcox wrote:
> On Sun, Feb 05, 2023 at 03:06:02PM +0000, Hao Lee wrote:
> > vm_normal_page() is called so many times that its overhead is very high.
> > After changing this call site to an inline function, copy_page_range()
> > runs 3~5 times faster than before.
>
> So you're saying that your compiler is making bad decisions?  What
> architecture, what compiler, what version?  Do you have
> CONFIG_ARCH_HAS_PTE_SPECIAL set?
>
> Is there something about inlining it that makes the compiler able to
> optimise away code, or is it really the function call overhead?  Can
> you share any perf results?

I am so embarrassed; I forgot to disable function_graph when timing the
non-inlined function, so the tracing overhead distorted my test. The
actual performance improvement is only ~3%.

Please ignore this patch. Sorry...