Compiler optimizations can have serious implications on livepatching.
Create a document that outlines common optimization patterns and safe
ways to livepatch them.

Signed-off-by: Joe Lawrence <joe.lawrence@xxxxxxxxxx>
---
 .../livepatch/compiler-considerations.rst     | 220 ++++++++++++++++++
 Documentation/livepatch/index.rst             |   1 +
 Documentation/livepatch/livepatch.rst         |   7 +
 3 files changed, 228 insertions(+)
 create mode 100644 Documentation/livepatch/compiler-considerations.rst

diff --git a/Documentation/livepatch/compiler-considerations.rst b/Documentation/livepatch/compiler-considerations.rst
new file mode 100644
index 000000000000..23b9cc01bb9c
--- /dev/null
+++ b/Documentation/livepatch/compiler-considerations.rst
@@ -0,0 +1,220 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+=======================
+Compiler considerations
+=======================
+
+Creating livepatch modules may seem as straightforward as updating a
+few functions in source code and registering them with the livepatch API.
+This idealized method may produce functional livepatch modules in some
+cases.
+
+.. warning::
+
+   A safe and accurate livepatch **must** take into account compiler
+   optimizations and their effect on the binary code that is executed and
+   ultimately livepatched.
+
+Examples
+========
+
+Interprocedural optimization (IPA)
+----------------------------------
+
+Function inlining is probably the most common compiler optimization that
+affects livepatching. In a simple example, inlining transforms the original
+code::
+
+    foo() { ... [ foo implementation ] ... }
+
+    bar() { ... foo() ... }
+
+to::
+
+    bar() { ... [ foo implementation ] ... }
+
+Inlining is comparable to macro expansion. However, the compiler inlines only
+the call sites it determines worthwhile (preserving the original call/return
+semantics at the others) and may even partially inline pieces of functions
+(see cold subfunctions in the GCC function suffixes section below).
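The transformation can be reproduced in a small userspace sketch. The GCC function attributes below stand in for the compiler's own inlining decisions, and ``foo_outlined()`` and ``bar_outlined()`` are hypothetical names added only for contrast; none of this is kernel code. Built with GCC, ``bar()`` carries its own copy of the ``foo()`` body, while ``bar_outlined()`` reaches its helper through a call instruction:

```c
/* Forced inline: every caller receives its own copy of this body. */
static inline __attribute__((always_inline)) int foo(int x)
{
	return 2 * x + 1;
}

/* Hypothetical contrast case: kept out of line, reached by a call. */
static __attribute__((noinline)) int foo_outlined(int x)
{
	return 2 * x + 1;
}

/* Contains the foo() body; no call to foo() remains in the binary. */
int bar(int x)
{
	return foo(x) + 10;
}

/* Reaches foo_outlined() through a call instruction. */
int bar_outlined(int x)
{
	return foo_outlined(x) + 10;
}
```

Both callers return the same value, but only ``bar_outlined()`` would observe a replacement of its helper. A change to the ``foo()`` source would have to be delivered through a new ``bar()``.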
+
+To safely livepatch ``foo()`` from the previous example, all of its callers
+need to be taken into consideration. For those callers into which the
+compiler has inlined ``foo()``, a livepatch should include a new version of
+the calling function such that it:
+
+  1. Calls a new, patched version of the inlined function, or
+  2. Contains its own inlined copy of the updated version of the inlined
+     function
+
+Other interesting IPA examples include:
+
+- *IPA-SRA*: removes unused parameters and replaces parameters passed by
+  reference with parameters passed by value. This optimization effectively
+  breaks the function's ABI.
+
+  .. note::
+     GCC changes the name of the function. See the GCC function suffixes
+     section below.
+
+- *IPA-CP*: finds values passed to functions that are constants and then
+  optimizes accordingly. Several clones of a function are possible if the
+  set of constant values is limited.
+
+  .. note::
+     GCC changes the name of the function. See the GCC function suffixes
+     section below.
+
+- *IPA-PURE-CONST*: discovers which functions are pure or constant. GCC can
+  then eliminate calls to such functions, remove memory accesses, etc.
+
+- *IPA-ICF*: performs identical code folding for functions and read-only
+  variables. Replaces a function with an equivalent one.
+
+- *IPA-RA*: optimizes saving and restoring registers if the compiler
+  considers it safe.
+
+- *Dead code elimination*: omits unused code paths from the resulting binary.
+
+GCC function suffixes
+---------------------
+
+GCC may rename original, copied, and cloned functions depending upon the
+optimizations applied. Here is a partial list of name suffixes that the
+compiler may apply to kernel functions:
+
+- *Cold subfunctions* : ``.cold`` or ``.cold.<N>`` : parts of functions
+  (subfunctions) that an attribute or optimization determines are unlikely
+  to be executed.
+
+  For example, the unlikely bits of ``irq_do_set_affinity()`` may be moved
+  out to subfunction ``irq_do_set_affinity.cold.49()``.
+  Starting with GCC 9, the numbered suffix has been removed. So in the
+  previous example, the cold subfunction is simply
+  ``irq_do_set_affinity.cold()``.
+
+- *Partial inlining* : ``.part.<N>`` : parts of functions split from their
+  original function body to improve overall inlining decisions.
+
+  The ``cdev_put()`` function provides an interesting example of a partial
+  clone. GCC builds the source function::
+
+    void cdev_put(struct cdev *p)
+    {
+            if (p) {
+                    struct module *owner = p->owner;
+                    kobject_put(&p->kobj);
+                    module_put(owner);
+            }
+    }
+
+  into two functions, the conditional test in ``cdev_put()`` and the
+  ``kobject_put()`` and ``module_put()`` calls in ``cdev_put.part.0()``::
+
+    <cdev_put>:
+           e8 bb 60 73 00          callq  ffffffff81a01a10 <__fentry__>
+           48 85 ff                test   %rdi,%rdi
+           74 05                   je     ffffffff812cb95f <cdev_put+0xf>
+           e9 a1 fc ff ff          jmpq   ffffffff812cb600 <cdev_put.part.0>
+           c3                      retq
+
+    <cdev_put.part.0>:
+           e8 0b 64 73 00          callq  ffffffff81a01a10 <__fentry__>
+           53                      push   %rbx
+           48 8b 5f 60             mov    0x60(%rdi),%rbx
+           e8 a1 54 5a 00          callq  ffffffff81870ab0 <kobject_put>
+           48 89 df                mov    %rbx,%rdi
+           5b                      pop    %rbx
+           e9 b8 5c e8 ff          jmpq   ffffffff811512d0 <module_put>
+           0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
+           00
+
+  Some ``cdev_put()`` callers may take advantage of this function splitting
+  to inline one part or another. Others may also directly call the partial
+  clone.
+
+- *Constant propagation* : ``.constprop.<N>`` : function copies to enable
+  constant propagation when conflicting arguments exist.
+
+  For example, consider ``cpumask_weight()`` and its copies for
+  ``cpumask_weight(cpu_possible_mask)`` and
+  ``cpumask_weight(__cpu_online_mask)``.
+  Note how the ``.constprop`` copies implicitly assign the function
+  parameter::
+
+    <cpumask_weight>:
+           8b 35 1e 7d 3e 01       mov    0x13e7d1e(%rip),%esi
+           e9 55 6e 3f 00          jmpq   ffffffff8141d2b0 <__bitmap_weight>
+
+    <cpumask_weight.constprop.28>:
+           8b 35 79 cf 1c 01       mov    0x11ccf79(%rip),%esi
+           48 c7 c7 80 db 40 82    mov    $0xffffffff8240db80,%rdi
+                                   R_X86_64_32S    __cpu_possible_mask
+           e9 a9 c0 1d 00          jmpq   ffffffff8141d2b0 <__bitmap_weight>
+
+    <cpumask_weight.constprop.108>:
+           8b 35 de 69 32 01       mov    0x13269de(%rip),%esi
+           48 c7 c7 80 d7 40 82    mov    $0xffffffff8240d780,%rdi
+                                   R_X86_64_32S    __cpu_online_mask
+           e9 0e 5b 33 00          jmpq   ffffffff8141d2b0 <__bitmap_weight>
+
+- *IPA-SRA* : ``.isra.<N>`` : TODO
+
+
+Coping with optimizations
+=========================
+
+A livepatch author must take care to consider the consequences of
+interprocedural optimizations that create function clones, ABI changes,
+splitting, etc. A small change to one function may cascade through the
+function call-chain, updating dozens more. A safe livepatch needs to be
+fully compatible with all callers.
+
+kpatch-build
+------------
+
+Given an input .patch file, kpatch-build performs a binary comparison of
+unpatched and patched kernel trees. This automates the detection of changes
+in compiler-generated code, optimizations included. It is still important,
+however, for a kpatch developer to learn about compiler transformations in
+order to understand and control the set of modified functions.
+
+kgraft-analysis-tool
+--------------------
+
+With the ``-fdump-ipa-clones`` flag, GCC will dump IPA clones that were
+created by all interprocedural optimizations in
+``<source>.000i.ipa-clones`` files.
+
+kgraft-analysis-tool pretty-prints those IPA cloning decisions. The full
+list of affected functions identifies additional functions that a
+source-based livepatch author may need to update.
+For example, for the function ``scatterwalk_unmap()``:
+
+::
+
+  $ ./kgraft-ipa-analysis.py --symbol=scatterwalk_unmap aesni-intel_glue.i.000i.ipa-clones
+  Function: scatterwalk_unmap/2930 (include/crypto/scatterwalk.h:81:60)
+    isra: scatterwalk_unmap.isra.2/3142 (include/crypto/scatterwalk.h:81:60)
+      inlining to: helper_rfc4106_decrypt/3007 (arch/x86/crypto/aesni-intel_glue.c:1016:12)
+      inlining to: helper_rfc4106_decrypt/3007 (arch/x86/crypto/aesni-intel_glue.c:1016:12)
+      inlining to: helper_rfc4106_encrypt/3006 (arch/x86/crypto/aesni-intel_glue.c:939:12)
+
+    Affected functions: 3
+      scatterwalk_unmap.isra.2/3142 (include/crypto/scatterwalk.h:81:60)
+      helper_rfc4106_decrypt/3007 (arch/x86/crypto/aesni-intel_glue.c:1016:12)
+      helper_rfc4106_encrypt/3006 (arch/x86/crypto/aesni-intel_glue.c:939:12)
+
+kgraft-ipa-analysis notes that ``scatterwalk_unmap()`` was renamed with a
+``.isra.<N>`` IPA optimization suffix and that the clone was inlined into
+``helper_rfc4106_decrypt()`` and ``helper_rfc4106_encrypt()``. A safe
+livepatch that updates ``scatterwalk_unmap()`` will need to consider
+updating these functions as well.
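When cross-checking such a report against a symbol listing, clones can be matched back to their source-level function by the suffixes listed earlier. The following is a minimal userspace sketch; ``base_name_len()`` and ``is_clone_of()`` are hypothetical helpers, not part of kpatch-build, kgraft-analysis-tool, or the kernel, and the tag list covers only the suffixes named in this document:

```c
#include <stddef.h>
#include <string.h>

/* Suffix tags named in this document; GCC may emit others. */
static const char *const gcc_clone_tags[] = { "cold", "part", "constprop", "isra" };

/*
 * Hypothetical helper: length of the source-level name in @sym, i.e. the
 * part before the first ".<tag>" where <tag> is followed by '.' or the
 * end of the string.  Returns strlen(sym) if no known tag is found.
 */
size_t base_name_len(const char *sym)
{
	const char *dot;
	size_t i;

	for (dot = strchr(sym, '.'); dot; dot = strchr(dot + 1, '.')) {
		for (i = 0; i < sizeof(gcc_clone_tags) / sizeof(gcc_clone_tags[0]); i++) {
			size_t n = strlen(gcc_clone_tags[i]);

			if (strncmp(dot + 1, gcc_clone_tags[i], n) == 0 &&
			    (dot[1 + n] == '\0' || dot[1 + n] == '.'))
				return (size_t)(dot - sym);
		}
	}
	return strlen(sym);
}

/* Nonzero if @sym is @func itself or one of its recognized clones. */
int is_clone_of(const char *sym, const char *func)
{
	size_t len = base_name_len(sym);

	return len == strlen(func) && strncmp(sym, func, len) == 0;
}
```

For instance, ``is_clone_of("scatterwalk_unmap.isra.2", "scatterwalk_unmap")`` is nonzero. A name match only identifies candidates; whether a clone actually needs patching still depends on the analysis described above.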
+
+References
+==========
+
+[1] GCC optimizations and their impact on livepatch
+    Miroslav Benes, 2016 Linux Plumbers Conference
+    http://www.linuxplumbersconf.net/2016/ocw//system/presentations/3573/original/pres_gcc.pdf
+
+[2] kpatch-build
+    https://github.com/dynup/kpatch
+
+[3] kgraft-analysis-tool
+    https://github.com/marxin/kgraft-analysis-tool
diff --git a/Documentation/livepatch/index.rst b/Documentation/livepatch/index.rst
index 525944063be7..7fd8a94498a0 100644
--- a/Documentation/livepatch/index.rst
+++ b/Documentation/livepatch/index.rst
@@ -8,6 +8,7 @@ Kernel Livepatching
     :maxdepth: 1
 
     livepatch
+    compiler-considerations
     callbacks
     cumulative-patches
     module-elf-format
diff --git a/Documentation/livepatch/livepatch.rst b/Documentation/livepatch/livepatch.rst
index c2c598c4ead8..b6d5beb16a00 100644
--- a/Documentation/livepatch/livepatch.rst
+++ b/Documentation/livepatch/livepatch.rst
@@ -432,6 +432,13 @@ The current Livepatch implementation has several limitations:
     by "notrace".
 
 
+  - Compiler optimizations can complicate livepatching.
+
+    Optimizations may inline, clone and even change a function's calling
+    interface. Please consult the
+    Documentation/livepatch/compiler-considerations.rst file before
+    creating any livepatch modules.
+
   - Livepatch works reliably only when the dynamic ftrace is located at
     the very beginning of the function.
 
-- 
2.21.3