Hi,

let me summarize some results from performance comparisons of Linux
kernels compiled with and without certain IPA optimizations.  It's a
slight abuse of this thread, but I think having the numbers might give
some useful insight into the potential costs associated with the
-flive-patching option discussed here.

All kudos go to Giovanni Gherdovich from the SUSE Performance Team,
who did all of the work presented below.

For a TL;DR, see the conclusion at the end of this email.

Martin Jambor <mjambor@xxxxxxx> writes:
> (this message is a part of the thread originating with
> https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01018.html)
>
> We have just had a quick discussion with two upstream maintainers of
> Linux kernel live-patching about this and the key points were:
>
> 1. SUSE live-patch creators (and I assume all that use the upstream
>    live-patching method) use Martin Liska's (somewhat
>    under-documented) -fdump-ipa-clones option and a utility he wrote
>    (https://github.com/marxin/kgraft-analysis-tool) to deal with all
>    kinds of inlining, IPA-CP and generally all IPA optimizations that
>    internally create a clone.  The tool tells them what happened and
>    also lists all callers that need to be live-patched.
>
> 2. However, there is growing concern about other IPA analyses that
>    do not create a clone but still affect code generation in other
>    functions.  Kernel developers have identified and disabled IPA-RA,
>    but there are more of them, such as IPA-modref analysis, stack
>    alignment propagation and possibly quite a few others which
>    extract information from one function and use it in a caller or
>    perhaps even in some almost-unrelated functions (such as the
>    detection of read-only and write-only static global variables).
>
> The kernel live-patching community would welcome it if GCC had an
> option that could disable all such optimizations/analyses for which
> it cannot provide a list of all affected functions (i.e. which ones
> need to be live-patched if a particular function is).

AFAIU, the currently known IPA optimizations of this category are
(cf. [1] and [2] from this thread):

- -fipa-pure-const
- -fipa-pta
- -fipa-reference
- -fipa-ra
- -fipa-icf
- -fipa-bit-cp
- -fipa-vrp

and some others which might be problematic but currently can't be
disabled on the command line:

- stack alignment requirements
- duplication of, or skipping of, alias analysis for
  functions/variables whose address is not taken (I don't know what
  that means, TBH).

(A small C sketch illustrating both kinds of IPA effects, cloning and
silent propagation, follows below, right after the machine specs.)

Some time ago, Giovanni compared the performance of a kernel compiled
with

  -fno-ipa-pure-const -fno-ipa-pta -fno-ipa-reference -fno-ipa-ra
  -fno-ipa-icf -fno-ipa-bit-cp -fno-ipa-vrp

plus (because I wasn't able to tell whether these are problematic in
the context of live patching)

  -fno-ipa-cp -fno-ipa-cp-clone -fno-ipa-profile -fno-ipa-sra

against a kernel compiled without any of these.

The kernel was a 4.12.14 one with additional patches on top.

The benchmarks were performed on a smaller and on a bigger machine.
Their specs:

- single socket with a Xeon E3-1240 v5 (Skylake), 4 cores / 8
  threads, 32G of memory (UMA)
- 2 sockets, each mounting a Xeon E5-2698 v4 (Broadwell), for a total
  of 40 cores / 80 threads and 528G of memory (NUMA)

You can find the results here:
https://beta.suse.com/private/nstange/ggherdovich-no-ipa-results/dashboard.html

"laurel2" is the smaller machine, "hardy4" the bigger one.
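Before getting to the numbers: since these IPA effects might be easier
to see on actual code, here is a minimal, purely made-up C sketch (all
function and variable names are hypothetical, nothing is taken from
the kernel) contrasting an optimization that creates a clone with an
analysis that silently propagates information into callers:

  /*
   * Hypothetical example; compile with something like
   *   gcc -O2 -fdump-ipa-clones -c ipa-example.c
   * to get a dump of the clones actually created, which can then be
   * fed to kgraft-analysis-tool.
   */

  /* Case 1: a clone.  IPA-CP may specialize scale() for the constant
     factor 16 used by get_scaled().  A live patch replacing scale()
     then also has to deal with that clone and its callers, which is
     what -fdump-ipa-clones and kgraft-analysis-tool are meant to
     report.  */
  static int scale(int x, int factor)
  {
          return x * factor;
  }

  int get_scaled(int x)
  {
          return scale(x, 16);
  }

  /* Case 2: propagation without a clone.  IPA pure-const may find
     that counter_value() has no side effects, allowing callers like
     get_twice() to merge the two calls into a single one.  If a live
     patch later changes counter_value() so that it does have side
     effects, get_twice() silently keeps the old assumption, and no
     clone list ever mentioned it.  */
  static int counter;

  static int counter_value(void)
  {
          return counter;
  }

  int get_twice(void)
  {
          return counter_value() + counter_value();
  }

Of course, at -O2 GCC might simply inline such tiny static functions
instead; the sketch is only meant to illustrate the two categories of
cross-function effects, not what the compiler will do in any
particular case.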
"1" means no change, and, roughly speaking, each deviation by 0.01 from that value corresponds to an overall performance change of 1%. Depending on the benchmark, higher means better (e.g. for throughput) or vice versa (e.g. for latencies). Some of the numbers are highlighted in green or red. Green means that the no-ipa kernel performs better, red the contrary. The sockperf-{tcp,udp}-under-load results are spoiled due to outliers, probably because of slow vs. fast paths. Please ignore. (If you're interested in the detailed results, you can click on any of those accumulated numbers in the dashboard. Scroll down and you'll find some nice plots.) For the overall outcome, let me quote Giovanni who summarized it nicely: What's left in red: * fsmark-threaded on laurel2 (skylake 8 cores), down 2%: if you look at the histograms of files created per seconds, there is never a clear winner between with and without IPA (except for the single-threaded case). Clean on hardy4. * sockperf-udp-throughput, hardy4: yep this one is statistically significant (in the plot you clearly see that the green dots are all below the yellow dots). 4% worst on average. Clean on the other machine. * tbench: this one is significant too (look at the histogram, no overlapping between the two distributions) but it's a curious one, because on the other machine is reversed (1% worse on the big hardy4, 4% better on the small laurel2). The other numbers don't change between the two kernels, or if they do the variance is large and you can't say much (large p-value). [/quote end] In conclusion, only a small subset of the tests peformed worse on the no-ipa kernel and for those that did, the differences haven't been really large in magnitude. Thanks, Nicolai [1] 20181003090457.GJ57692@xxxxxxxxxxxxxxx from Jan Hubicka <hubicka@xxxxxx> [2] ri65zyjti57.fsf@xxxxxxx from Martin Jambor <mjambor@xxxxxxx> > I assume this is orthogonal to the proposed -finline-only-static > option, but the above approach seems superior in all respects. > > 3. The community would also like to be involved in these discussions, > and therefore I am adding live-patching@xxxxxxxxxxxxxxx to CC. On a > related note, they will also have a live-patching mini-summit at the > Linux Plumbers conference in Vancouver in November where they plan to > discuss what they would like GCC to provide. > > Thanks, > > Martin -- SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)