Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long

Josh Poimboeuf <jpoimboe@xxxxxxxxxx> · Wed, 12 Feb 2025 17:36:03 -0800

On Wed, Feb 12, 2025 at 04:42:39PM +0100, Petr Mladek wrote:
> CPU1				CPU1
> 
> 				klp_try_complete_transition()
> 
> 
> taskA:	
>  + fork()
>    + klp_copy_process()
>       child->patch_state = KLP_PATCH_UNPATCHED
> 
> 				  klp_try_switch_task(taskA)
> 				    // safe
> 
> 				child->patch_state = KLP_PATCH_PATCHED
> 
> 				all processes patched
> 
> 				klp_finish_transition()
> 
> 
> 	list_add_tail_rcu(&p->thread_node,
> 			  &p->signal->thread_head);
> 
> 
> BANG: The forked task has KLP_PATCH_UNPATCHED so that
>       klp_ftrace_handler() will redirect it to the old code.
> 
>       But CPU1 thinks that all tasks are migrated and is going
>       to finish the transition

Maybe klp_try_complete_transition() could iterate the tasks in two
passes?  The first pass would use rcu_read_lock().  Then if all tasks
appear to be patched, try again with tasklist_lock.

Or, we could do something completely different.  There's no need for
klp_copy_process() to copy the parent's state: a newly forked task can
be patched immediately because it has no stack.

So instead, just initialize it to KLP_TRANSITION_IDLE with
TIF_PATCH_PENDING cleared.  Then when klp_ftrace_handler() encounters a
KLP_TRANSITION_IDLE task, it considers its state to be 'klp_target_state'.

// called from copy_process()
void klp_init_task(struct task_struct *child)
{
	/* klp_ftrace_handler() will transition the task immediately */
	child->patch_state = KLP_TRANSITION_IDLE;
	clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
}

klp_ftrace_handler():

		patch_state = current->patch_state;

		if (patch_state == KLP_TRANSITION_IDLE)
			patch_state = klp_target_state;
		...

Hm?

> I would first like to understand how exactly the stall happens.
> It is possible that even rcu_read_lock() won't help here!
> 
> If the it takes too long time to check backtraces of all pending
> processes then even rcu_read_lock() might trigger the RCU stall
> warning as well.

Yeah, based on Yafang's reply it appears there are RCU stalls either
way.

-- 
Josh