Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode

Petr Mladek <pmladek@xxxxxxxx> · Mon, 27 Jan 2025 15:31:41 +0100

On Mon 2025-01-27 14:35:26, Yafang Shao wrote:
> The atomic replace livepatch mechanism was introduced to handle scenarios
> where we want to unload a specific livepatch without unloading others.
> However, its current implementation has significant shortcomings, making
> it less than ideal in practice. Below are the key downsides:

[...]

> In the hybrid mode:
> 
> - Specific livepatches can be marked as "non-replaceable" to ensure they
>   remain active and unaffected during replacements.
> 
> - Other livepatches can be marked as "replaceable," allowing targeted
>   replacements of only those patches.
> 
> This selective approach would reduce unnecessary transitions, lower the
> risk of temporary patch loss, and mitigate performance issues during
> livepatch replacement.
> 
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -658,6 +658,8 @@ static int klp_add_nops(struct klp_patch *patch)
>  		klp_for_each_object(old_patch, old_obj) {
>  			int err;
>  
> +			if (!old_patch->replaceable)
> +				continue;

This is one example where things might get very complicated.

The same function might be livepatched by more livepatches, see
ops->func_stack. For example, let's have funcA and three livepatches:
a
  + lp1:
	.replace = false,
	.non-replace = true,
	.func =	{
			.old_name = "funcA",
			.new_func = lp1_funcA,
		}, { }

  + lp2:
	.replace = false,
	.non-replace = false,
	.func =	{
			.old_name = "funcA",
			.new_func = lp2_funcA,
		},{
			.old_name = "funcB",
			.new_func = lp2_funcB,
		}, { }

  + lp3:
	.replace = true,
	.non-replace = false,
	.func =	{
			.old_name = "funcB",
			.new_func = lp3_funcB,
		}, { }

Now, apply lp1:

      + funcA() gets redirected to lp1_funcA()

Then, apply lp2

      + funcA() gets redirected to lp2_funcA()

Finally, apply lp3:

      + The proposed code would add "nop()" for
	funcA() because	it exists in lp2 and does not exist in lp3.

      + funcA() will get redirected to the original code
	because of the nop() during transition

      + nop() will get removed in klp_complete_transition() and
	funcA() will get suddenly redirected to lp1_funcA()
	because it will still be on ops->func_stack even
	after the "nop" and lp2_funcA() gets removed.

	   => The entire system will start using another funcA
	      implementation at some random time

	   => this would violate the consistency model

The proper solution might be tricky:

1. We would need to detect this situation and do _not_ add
   the "nop" for lp3 and funcA().

2. We would need a more complicate code for handling the task states.

   klp_update_patch_state() sets task->patch_state using
   the global "klp_target_state". But in the above example,
   when enabling lp3:

    + funcA would need to get transitioned _backward_:
	 KLP_TRANSITION_PATCHED -> KLP_TRANSITION_UNPATCHED
      , so that it goes on ops->func_stack:
	 lp2_funcA -> lp1->funcA

   while:

    + funcA would need to get transitioned forward:
	 KLP_TRANSITION_UNPATCHED -> KLP_TRANSITION_PATCHED
      , so that it goes on ops->func_stack:
	 lp2_funcB -> lp3->funcB

=> the hybrid mode would complicate the life for both livepatch
   creators/maintainers and kernel code developers/maintainers.

   I am afraid that this complexity is not acceptable if there are
   better solutions for the original problem.

>  			err = klp_add_object_nops(patch, old_obj);
>  			if (err)
>  				return err;

I am sorry but I am quite strongly against this approach!

Best Regards,
Petr