On Mon, Jan 27, 2025 at 9:46 PM Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> On Mon 2025-01-27 14:35:24, Yafang Shao wrote:
> > The atomic replace livepatch mechanism was introduced to handle scenarios
> > where we want to unload a specific livepatch without unloading others.
> > However, its current implementation has significant shortcomings, making
> > it less than ideal in practice. Below are the key downsides:
> >
> > - It is expensive
> >
> >   During testing with frequent replacements of an old livepatch, random RCU
> >   warnings were observed:
> >
> >   [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
> >   [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
> >   [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
> >   [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
> >   [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
> >   [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
> >   [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
> >   [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
> >   [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
> >   [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
> >   [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
> >
> >   This indicates that atomic replacement can cause performance issues,
> >   particularly with RCU synchronization under frequent use.
>
> Please, provide more details about the test:
>
> + List of patched functions.

$ ls /sys/kernel/livepatch/livepatch_61_release12/vmlinux/
blk_throtl_cancel_bios,1  __d_move,1             do_iter_readv_writev,1
patched                   update_rq_clock,1      blk_mq_handle_expired,1
d_delete,1                do_exit,1              high_work_func,1
try_charge_memcg,1

$ ls /sys/kernel/livepatch/livepatch_61_release12/xfs/
patched  xfs_extent_busy_flush,1  xfs_iget,1  xfs_inode_mark_reclaimable,1

$ ls /sys/kernel/livepatch/livepatch_61_release12/fuse/
fuse_send_init,1  patched  process_init_reply,1

$ ls /sys/kernel/livepatch/livepatch_61_release12/nf_tables/
nft_data_init,1  patched

> + What exactly is meant by frequent replacements (busy loop? once a minute?)

The script:

#!/bin/bash
while true; do
	yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
	./apply_livepatch_61.sh		# it will sleep 5s
	yum erase -y kernel-livepatch-6.1.12-0.x86_64
	yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
	./apply_livepatch_61.sh		# it will sleep 5s
done

> + Size of the systems (number of CPUs, number of running processes)

top - 22:08:17 up 227 days,  3:29,  1 user,  load average: 50.66, 32.92, 20.77
Tasks: 1349 total,   2 running, 543 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.4 us,  0.9 sy,  0.0 ni, 88.1 id,  2.9 wa,  0.3 hi,  0.4 si,  0.0 st
KiB Mem : 39406304+total,  8899864 free, 43732568 used, 34143062+buff/cache
KiB Swap:        0 total,        0 free,        0 used. 32485065+avail Mem

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    64
Socket(s):             1
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            25
Model:                 1
Model name:            AMD EPYC 7Q83 64-Core Processor
Stepping:              1
CPU MHz:               3234.573
CPU max MHz:           3854.4431
CPU min MHz:           1500.0000
BogoMIPS:              4890.66
Virtualization:        AMD-V
L1d cache:             32K
L1i cache:             32K
L2 cache:              512K
L3 cache:              32768K
NUMA node0 CPU(s):     0-127

> + Were there any extra changes in the livepatch code,
>   e.g. debugging messages?

Not at all.

> > - Potential Risks During Replacement
> >
> >   One known issue involves replacing livepatched versions of critical
> >   functions such as do_exit(). During the replacement process, a panic
> >   might occur, as highlighted in [0].
>
> I would rather prefer to make the atomic replace safe. I mean to
> block the transition until all exiting processes are gone.
>
> Josh made a good point. We should look at how this is handled by
> RCU, tracing, or other subsystems which might have similar
> problems.

I'm not against this fix.

> > Other potential risks may also arise due to inconsistencies or race
> > conditions during transitions.
>
> What inconsistencies and race conditions do you have in mind, please?

I have explained it at
https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@xxxxxxxxxxxxxxx/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354

In klp_ftrace_handler():

	if (unlikely(func->transition)) {
		WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
	}

Why is WARN_ON_ONCE() placed here? What issues have we encountered in
the past that led to the decision to add this warning?
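For reference, the surrounding logic, paraphrased from
klp_ftrace_handler() in kernel/livepatch/patch.c (details may differ
between kernel versions), also shows how ops->func_stack is used to
fall back to the previous patch:

	/* The task is mid-transition: pick the right version of the
	 * function for this task. */
	if (unlikely(func->transition)) {
		/*
		 * Order the func->transition read against the
		 * current->patch_state read; the corresponding write
		 * barrier is in klp_init_transition().
		 */
		smp_rmb();

		patch_state = current->patch_state;

		/* Once a transition has started, every task must have
		 * a defined state; KLP_UNDEFINED here would mean the
		 * handler raced with transition setup or teardown. */
		WARN_ON_ONCE(patch_state == KLP_UNDEFINED);

		if (patch_state == KLP_UNPATCHED) {
			/*
			 * Use the previously patched version of the
			 * function, or the original function if no
			 * previous patch exists on the stack.
			 */
			func = list_entry_rcu(func->stack_node.next,
					      struct klp_func, stack_node);

			if (&func->stack_node == &ops->func_stack)
				goto unlock;
		}
	}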
> > - Temporary Loss of Patching
> >
> >   During the replacement process, the old patch is set to a NOP (no-operation)
> >   before the new patch is fully applied. This creates a window where the
> >   function temporarily reverts to its original, unpatched state. If the old
> >   patch fixed a critical issue (e.g., one that prevented a system panic), the
> >   system could become vulnerable to that issue during the transition.
>
> This is not true!
>
> Please, look where klp_patch_object() and klp_unpatch_objects() are
> called. Also look at how ops->func_stack is handled in
> klp_ftrace_handler().
>
> Also you might want to read Documentation/livepatch/livepatch.rst

Thanks for your guidance.

> > The current atomic replacement approach replaces all old livepatches,
> > even when such a sweeping change is unnecessary. This can be improved
> > by introducing a hybrid mode, which allows the coexistence of both
> > atomic-replace and non-atomic-replace livepatches.
> >
> > In the hybrid mode:
> >
> > - Specific livepatches can be marked as "non-replaceable" to ensure they
> >   remain active and unaffected during replacements.
> >
> > - Other livepatches can be marked as "replaceable", allowing targeted
> >   replacements of only those patches.
>
> Honestly, I consider this as a workaround for a problem which should
> be fixed in a proper way.

This is not a workaround. We should avoid replacing functions that do
not differ between the two versions. Since automatic handling is not
possible for now, we can provide a sysfs interface to allow users to
perform it manually. A rough sketch of the idea is below.
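The field and attribute names here are hypothetical, only to
illustrate the idea, not a final implementation: an atomic-replace
patch would create NOPs only for old patches that are still marked
replaceable, e.g. in klp_add_nops():

	static int klp_add_nops(struct klp_patch *patch)
	{
		struct klp_patch *old_patch;
		struct klp_object *old_obj;

		klp_for_each_patch(old_patch) {
			/* Hypothetical flag: skip patches marked
			 * non-replaceable so no NOPs are created for
			 * them and they stay active across the
			 * atomic replace. */
			if (!old_patch->replaceable)
				continue;

			klp_for_each_object(old_patch, old_obj) {
				int err;

				err = klp_add_object_nops(patch, old_obj);
				if (err)
					return err;
			}
		}

		return 0;
	}

with the flag exposed as a per-patch sysfs attribute, for example:

	# hypothetical knob next to the existing enabled/transition files
	echo 0 > /sys/kernel/livepatch/<patch>/replaceable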
> The main advantage of the atomic replace is that it simplifies
> maintenance and debugging.

Is it worth the high overhead on production servers? Can you provide
examples of companies that use atomic replacement at scale in their
production environments?

> It reduces the amount of possible combinations. The
> hybrid mode brings back the jungle.

--
Regards
Yafang