Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote on Wed, Mar 31, 2021 at 1:44 PM:
>
> On Mon, 29 Mar 2021 20:36:35 +0800 qianjun.kernel@xxxxxxxxx wrote:
>
> > From: jun qian <qianjun.kernel@xxxxxxxxx>
> >
> > In our project, many business delays come from fork, so
> > we started looking for the reason why fork is time-consuming.
> > I used ftrace with function_graph to trace fork and found
> > that vm_normal_page is called tens of thousands of times, while
> > each call to vm_normal_page takes only a few nanoseconds, and
> > vm_normal_page is not an inline function. So I think that if
> > the function were inlined, it may reduce the call overhead.
> >
> > I did the following experiment:
> >
> > Use the bpftrace tool to trace the fork time:
> >
> > bpftrace -e 'kprobe:_do_fork/comm=="redis-server"/ {@st=nsecs;} \
> > kretprobe:_do_fork /comm=="redis-server"/{printf("the fork time \
> > is %d us\n", (nsecs-@st)/1000)}'
> >
> > no inline vm_normal_page:
> > result:
> > the fork time is 40743 us
> > the fork time is 41746 us
> > the fork time is 41336 us
> > the fork time is 42417 us
> > the fork time is 40612 us
> > the fork time is 40930 us
> > the fork time is 41910 us
> >
> > inline vm_normal_page:
> > result:
> > the fork time is 39276 us
> > the fork time is 38974 us
> > the fork time is 39436 us
> > the fork time is 38815 us
> > the fork time is 39878 us
> > the fork time is 39176 us
> >
> > In the same test environment, we get a 3% to 4%
> > performance improvement.
> >
> > Note: the test data are from 4.18.0-193.6.3.el8_2.v1.1.x86_64,
> > because my product uses this kernel version to test the redis
> > server. If you need test data comparing against the latest kernel
> > version, you can refer to the v1 patch.
> >
> > We need to compare the changes in the size of vmlinux:
> >                 inline          non-inline      diff
> > vmlinux size    9709248 bytes   9709824 bytes   -576 bytes
>
> I get very different results with gcc-7.2.0:
>
> q:/usr/src/25> size mm/memory.o
>    text    data     bss     dec     hex  filename
>   74898    3375      64   78337   13201  mm/memory.o-before
>   75119    3363      64   78546   132d2  mm/memory.o-after
>
> That's a somewhat significant increase in code size, and larger code
> size has a worsened cache footprint.
>
> Not that this is necessarily a bad thing for a function which is
> tightly called many times in succession, as is vm_normal_page().
>
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -592,7 +592,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
> >  * PFNMAP mappings in order to support COWable mappings.
> >  *
> >  */
> > -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> > +inline struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> >  		pte_t pte)
> >  {
> >  	unsigned long pfn = pte_pfn(pte);
>
> I'm a bit surprised this made any difference - rumour has it that
> modern gcc just ignores `inline' and makes up its own mind.  Which is
> why we added __always_inline.
>

The kernel code version: kernel-4.18.0-193.6.3.el8_2
gcc version: 8.4.1 20200928 (Red Hat 8.4.1-1) (GCC)

I ran the test again and got the results below; later I will test on the
latest kernel version with a newer gcc.

757368576  vmlinux  (inline)
757381440  vmlinux  (no inline)
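
As a side note on the `inline' vs __always_inline point above, here is a
minimal userspace sketch (not kernel code; hint_only(), forced() and
inline_demo.c are made-up names for illustration) showing that plain
"inline" is only a hint to gcc, while the always_inline attribute, which
the kernel wraps as __always_inline, forces inlining even at -O0:

/*
 * Hedged illustration, not a tested kernel patch.
 *
 * Build and inspect with, e.g.:
 *   gcc -O0 -c inline_demo.c && objdump -d inline_demo.o
 * use_both() should contain a call to hint_only() but no call to forced().
 */
static inline int hint_only(int x)
{
	return x * 2;		/* gcc may still emit an out-of-line call */
}

static inline __attribute__((always_inline)) int forced(int x)
{
	return x * 2;		/* body is always expanded at the call site */
}

int use_both(int x)
{
	return hint_only(x) + forced(x);
}

With -O2, gcc will usually inline both in a toy case like this, which is
why, for a larger function such as vm_normal_page(), the plain keyword may
or may not change the generated code depending on the compiler's cost model.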