On Tue, 26 Apr 2022 19:18:21 -0600 Yu Zhao <yuzhao@xxxxxxxxxx> wrote:

> > For example, lru_gen_add_folio() is huge and has 4(?) call sites.  This
> > may well produce slower code due to the icache footprint.
> >
> > Experiment: moving lru_gen_del_folio() into mm/vmscan.c shrinks that
> > file's .text from 80612 bytes to 78956.
> >
> > I tend to think that out-of-line regular old C functions should be the
> > default and that the code should be inlined only when a clear benefit
> > is demonstrable, or has at least been seriously thought about.
>
> I can move those functions to vmscan.c if you think it would improve
> performance. I don't have a strong opinion here -- I was able to
> measure the bloat but not the performance impact.

This seems to be more an act of faith than anything else.  Unlikely
that any difference will be measurable.  If there is a difference, the
inlined version should win on microbenchmarks because all four copies
of the function will be in cache.  But a more realistic, broader test
might suffer a slowdown due to having to move the larger text in more
frequently.

And inter-build alignment changes seem to make a larger difference than
anything else, thus confounding measurement attempts.
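
For illustration only, the inline versus out-of-line trade-off discussed above can be sketched in plain C. This is not the kernel code: struct item, add_item_inline(), add_item_outofline() and count_items() are hypothetical names invented for the sketch, and __attribute__((noinline)) assumes GCC or Clang.

```c
#include <stddef.h>

/* Hypothetical stand-in for a list node; not a kernel structure. */
struct item {
	struct item *next;
	int gen;
};

/*
 * Variant A: inline candidate. The compiler may place a copy of the
 * body at every call site, which grows .text with each caller.
 */
static inline void add_item_inline(struct item **head, struct item *it, int gen)
{
	it->gen = gen;
	it->next = *head;
	*head = it;
}

/*
 * Variant B: forced out of line (GCC/Clang attribute, an assumption of
 * this sketch). There is one copy of the body; each caller pays a call.
 */
__attribute__((noinline))
void add_item_outofline(struct item **head, struct item *it, int gen)
{
	it->gen = gen;
	it->next = *head;
	*head = it;
}

/* Count list entries so both variants can be checked for equal behaviour. */
size_t count_items(const struct item *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

Building a file with several call sites of each variant and comparing the text column reported by size(1) is one way to observe the footprint difference. Whether that turns into a measurable speed difference is exactly the open question above.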