Hi,

let me summarize some results from performance comparisons of Linux
kernels compiled with and without certain IPA optimizations.  It's a
slight abuse of this thread, but I think having the numbers might give
some useful insight into the potential costs associated with the
-flive-patching option discussed here.

All kudos go to Giovanni Gherdovich from the SUSE Performance Team,
who did all of the work presented below.

For a TL;DR, see the conclusion at the end of this email.

Martin Jambor <mjambor@xxxxxxx> writes:
> (this message is a part of the thread originating with
> https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01018.html)
>
> We have just had a quick discussion with two upstream maintainers of
> Linux kernel live-patching about this and the key points were:
>
> 1. SUSE live-patch creators (and I assume all that use the upstream
>    live-patching method) use Martin Liska's (somewhat
>    under-documented) -fdump-ipa-clones option and a utility he wrote
>    (https://github.com/marxin/kgraft-analysis-tool) to deal with all
>    kinds of inlining, IPA-CP and generally all IPA optimizations that
>    internally create a clone.  The tool tells them what happened and
>    also lists all callers that need to be live-patched.
>
> 2. However, there is growing concern about other IPA analyses that
>    do not create a clone but still affect code generation in other
>    functions.  Kernel developers have identified and disabled IPA-RA,
>    but there are more of them, such as IPA-modref analysis, stack
>    alignment propagation and possibly quite a few others which
>    extract information from one function and use it in a caller or
>    perhaps even in some almost-unrelated functions (such as the
>    detection of read-only and write-only static global variables).
>
> The kernel live-patching community would welcome it if GCC had an
> option that could disable all such optimizations/analyses for which
> it cannot provide a list of all affected functions (i.e. which ones
> need to be live-patched if a particular function is).

AFAIU, the currently known IPA optimizations of this category are
(cf. [1] and [2] from this thread):

- -fipa-pure-const
- -fipa-pta
- -fipa-reference
- -fipa-ra
- -fipa-icf
- -fipa-bit-cp
- -fipa-vrp

and some others which might be problematic but currently can't be
disabled on the command line:

- stack alignment requirements
- duplication of, or skipping of, alias analysis for
  functions/variables whose address is not taken (I don't know what
  that means, TBH).

(A small C sketch illustrating both kinds of IPA effects, cloning and
silent propagation, follows below, right after the machine specs.)

Some time ago, Giovanni compared the performance of a kernel compiled
with

  -fno-ipa-pure-const -fno-ipa-pta -fno-ipa-reference -fno-ipa-ra
  -fno-ipa-icf -fno-ipa-bit-cp -fno-ipa-vrp

plus (because I wasn't able to tell whether these are problematic in
the context of live patching)

  -fno-ipa-cp -fno-ipa-cp-clone -fno-ipa-profile -fno-ipa-sra

against a kernel compiled without any of these.

The kernel was a 4.12.14 one with additional patches on top.

The benchmarks were performed on a smaller and on a bigger machine.
Their specs:

- single socket with a Xeon E3-1240 v5 (Skylake), 4 cores / 8
  threads, 32G of memory (UMA)
- 2 sockets, each mounting a Xeon E5-2698 v4 (Broadwell), for a total
  of 40 cores / 80 threads and 528G of memory (NUMA)

You can find the results here:
https://beta.suse.com/private/nstange/ggherdovich-no-ipa-results/dashboard.html

"laurel2" is the smaller machine, "hardy4" the bigger one.
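Before getting to the numbers: since these IPA effects might be easier
to see on actual code, here is a minimal, purely made-up C sketch (all
function and variable names are hypothetical, nothing is taken from
the kernel) contrasting an optimization that creates a clone with an
analysis that silently propagates information into callers:

  /*
   * Hypothetical example; compile with something like
   *   gcc -O2 -fdump-ipa-clones -c ipa-example.c
   * to get a dump of the clones actually created, which can then be
   * fed to kgraft-analysis-tool.
   */

  /* Case 1: a clone.  IPA-CP may specialize scale() for the constant
     factor 16 used by get_scaled().  A live patch replacing scale()
     then also has to deal with that clone and its callers, which is
     what -fdump-ipa-clones and kgraft-analysis-tool are meant to
     report.  */
  static int scale(int x, int factor)
  {
          return x * factor;
  }

  int get_scaled(int x)
  {
          return scale(x, 16);
  }

  /* Case 2: propagation without a clone.  IPA pure-const may find
     that counter_value() has no side effects, allowing callers like
     get_twice() to merge the two calls into a single one.  If a live
     patch later changes counter_value() so that it does have side
     effects, get_twice() silently keeps the old assumption, and no
     clone list ever mentioned it.  */
  static int counter;

  static int counter_value(void)
  {
          return counter;
  }

  int get_twice(void)
  {
          return counter_value() + counter_value();
  }

Of course, at -O2 GCC might simply inline such tiny static functions
instead; the sketch is only meant to illustrate the two categories of
cross-function effects, not what the compiler will do in any
particular case.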
"1" means no change, and, roughly speaking, each deviation by 0.01 from that value corresponds to an overall performance change of 1%. Depending on the benchmark, higher means better (e.g. for throughput) or vice versa (e.g. for latencies). Some of the numbers are highlighted in green or red. Green means that the no-ipa kernel performs better, red the contrary. The sockperf-{tcp,udp}-under-load results are spoiled due to outliers, probably because of slow vs. fast paths. Please ignore. (If you're interested in the detailed results, you can click on any of those accumulated numbers in the dashboard. Scroll down and you'll find some nice plots.) For the overall outcome, let me quote Giovanni who summarized it nicely: What's left in red: * fsmark-threaded on laurel2 (skylake 8 cores), down 2%: if you look at the histograms of files created per seconds, there is never a clear winner between with and without IPA (except for the single-threaded case). Clean on hardy4. * sockperf-udp-throughput, hardy4: yep this one is statistically significant (in the plot you clearly see that the green dots are all below the yellow dots). 4% worst on average. Clean on the other machine. * tbench: this one is significant too (look at the histogram, no overlapping between the two distributions) but it's a curious one, because on the other machine is reversed (1% worse on the big hardy4, 4% better on the small laurel2). The other numbers don't change between the two kernels, or if they do the variance is large and you can't say much (large p-value). [/quote end] In conclusion, only a small subset of the tests peformed worse on the no-ipa kernel and for those that did, the differences haven't been really large in magnitude. Thanks, Nicolai [1] 20181003090457.GJ57692@xxxxxxxxxxxxxxx from Jan Hubicka <hubicka@xxxxxx> [2] ri65zyjti57.fsf@xxxxxxx from Martin Jambor <mjambor@xxxxxxx> > I assume this is orthogonal to the proposed -finline-only-static > option, but the above approach seems superior in all respects. > > 3. The community would also like to be involved in these discussions, > and therefore I am adding live-patching@xxxxxxxxxxxxxxx to CC. On a > related note, they will also have a live-patching mini-summit at the > Linux Plumbers conference in Vancouver in November where they plan to > discuss what they would like GCC to provide. > > Thanks, > > Martin -- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)