On Wed, Feb 12, 2025 at 04:42:39PM +0100, Petr Mladek wrote: > CPU1 CPU1 > > klp_try_complete_transition() > > > taskA: > + fork() > + klp_copy_process() > child->patch_state = KLP_PATCH_UNPATCHED > > klp_try_switch_task(taskA) > // safe > > child->patch_state = KLP_PATCH_PATCHED > > all processes patched > > klp_finish_transition() > > > list_add_tail_rcu(&p->thread_node, > &p->signal->thread_head); > > > BANG: The forked task has KLP_PATCH_UNPATCHED so that > klp_ftrace_handler() will redirect it to the old code. > > But CPU1 thinks that all tasks are migrated and is going > to finish the transition Maybe klp_try_complete_transition() could iterate the tasks in two passes? The first pass would use rcu_read_lock(). Then if all tasks appear to be patched, try again with tasklist_lock. Or, we could do something completely different. There's no need for klp_copy_process() to copy the parent's state: a newly forked task can be patched immediately because it has no stack. So instead, just initialize it to KLP_TRANSITION_IDLE with TIF_PATCH_PENDING cleared. Then when klp_ftrace_handler() encounters a KLP_TRANSITION_IDLE task, it considers its state to be 'klp_target_state'. // called from copy_process() void klp_init_task(struct task_struct *child) { /* klp_ftrace_handler() will transition the task immediately */ child->patch_state = KLP_TRANSITION_IDLE; clear_tsk_thread_flag(child, TIF_PATCH_PENDING); } klp_ftrace_handler(): patch_state = current->patch_state; if (patch_state == KLP_TRANSITION_IDLE) patch_state = klp_target_state; ... Hm? > I would first like to understand how exactly the stall happens. > It is possible that even rcu_read_lock() won't help here! > > If the it takes too long time to check backtraces of all pending > processes then even rcu_read_lock() might trigger the RCU stall > warning as well. Yeah, based on Yafang's reply it appears there are RCU stalls either way. -- Josh