On Fri, Feb 7, 2025 at 10:31 AM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: > > On Mon, Jan 27, 2025 at 02:35:26PM +0800, Yafang Shao wrote: > > The atomic replace livepatch mechanism was introduced to handle scenarios > > where we want to unload a specific livepatch without unloading others. > > However, its current implementation has significant shortcomings, making > > it less than ideal in practice. Below are the key downsides: > > > > - It is expensive > > > > During testing with frequent replacements of an old livepatch, random RCU > > warnings were observed: > > > > [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old. > > [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old. > > [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old. > > [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old. > > [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old. > > [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old. > > [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old. > > [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old. > > [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old. > > [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old. > > [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old. > > > > This indicates that atomic replacement can cause performance issues, > > particularly with RCU synchronization under frequent use. > > Why does this happen? It occurs during the KLP transition. It seems like the KLP transition is taking too long. [20329703.332453] livepatch: enabling patch 'livepatch_61_release6' [20329703.340417] livepatch: 'livepatch_61_release6': starting patching transition [20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is 10166 jiffies old. [20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is 10219 jiffies old. [20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is 10199 jiffies old. [20329754.848036] livepatch: 'livepatch_61_release6': patching complete > > > - Potential Risks During Replacement > > > > One known issue involves replacing livepatched versions of critical > > functions such as do_exit(). During the replacement process, a panic > > might occur, as highlighted in [0]. Other potential risks may also arise > > due to inconsistencies or race conditions during transitions. > > That needs to be fixed. > > > - Temporary Loss of Patching > > > > During the replacement process, the old patch is set to a NOP (no-operation) > > before the new patch is fully applied. This creates a window where the > > function temporarily reverts to its original, unpatched state. If the old > > patch fixed a critical issue (e.g., one that prevented a system panic), the > > system could become vulnerable to that issue during the transition. > > Are you saying that atomic replace is not atomic? If so, this sounds > like another bug.