On Wed 2025-02-12 19:54:21, Yafang Shao wrote:
> On Wed, Feb 12, 2025 at 10:34 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> > On Wed, Feb 12, 2025 at 8:40 AM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote:
> > > >  void klp_try_complete_transition(void)
> > > >  {
> > > > +	unsigned long timeout, proceed_pending_processes;
> > > >  	unsigned int cpu;
> > > >  	struct task_struct *g, *task;
> > > >  	struct klp_patch *patch;
> > > > @@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
> > > >  	 * unless the patch includes changes to a very common function.
> > > >  	 */
> > > >  	read_lock(&tasklist_lock);
> > > > -	for_each_process_thread(g, task)
> > > > +	timeout = jiffies + HZ;
> > > > +	proceed_pending_processes = 0;
> > > > +	for_each_process_thread(g, task) {
> > > > +		/* check if this task has already switched over */
> > > > +		if (task->patch_state == klp_target_state)
> > > > +			continue;
> > > > +
> > > > +		proceed_pending_processes++;
> > > > +
> > > >  		if (!klp_try_switch_task(task))
> > > >  			complete = false;
> > > > +
> > > > +		/*
> > > > +		 * Prevent hardlockup by not blocking tasklist_lock for too long.
> > > > +		 * But guarantee the forward progress by making sure at least
> > > > +		 * some pending processes were checked.
> > > > +		 */
> > > > +		if (rwlock_is_contended(&tasklist_lock) &&
> > > > +		    time_after(jiffies, timeout) &&
> > > > +		    proceed_pending_processes > 100) {
> > > > +			complete = false;
> > > > +			break;
> > > > +		}
> > > > +	}
> > > >  	read_unlock(&tasklist_lock);
> > >
> > > Instead of all this can we not just use rcu_read_lock() instead of
> > > tasklist_lock?
> >
> > I’m concerned that there’s a potential race condition in fork() if we
> > use RCU, as illustrated below:
> >
> >     CPU0                                    CPU1
> >
> >     write_lock_irq(&tasklist_lock);
> >     klp_copy_process(p);
> >
> >                                             parent->patch_state = klp_target_state
> >
> >     list_add_tail_rcu(&p->tasks, &init_task.tasks);
> >     write_unlock_irq(&tasklist_lock);
> >
> > In this scenario, after the parent executes klp_copy_process(p) to
> > copy its patch_state to the child, but before adding the child to the
> > task list, the parent’s patch_state might be updated by the KLP
> > transition. This could result in the child being left with an outdated
> > state.

A similar race actually existed even before.

> > We need to ensure that klp_copy_process() and list_add_tail_rcu() are
> > treated as a single atomic operation.
>
> Before the newly forked task is added to the task list, it doesn’t
> execute any code and can always be considered safe during the KLP
> transition. Therefore, we could replace klp_copy_process() with
> klp_init_process(), where we simply set patch_state to
> KLP_TRANSITION_IDLE, as shown below:

This might work when a new process is created, for example, using
execve(). But it would fail when the process is just forked (man 2 fork)
and both the parent and the child continue running the same code.

> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2544,7 +2544,9 @@ __latent_entropy struct task_struct *copy_process(
>  		p->exit_signal = args->exit_signal;
>  	}
>
> -	klp_copy_process(p);
> +	// klp_init_process(p);
> +	clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
> +	child->patch_state = KLP_TRANSITION_IDLE;

This is exactly the location where we depend on the synchronization
with the tasklist_lock.
The tasklist_lock allows us to synchronize both the TIF_PATCH_PENDING
flag and p->patch_state between the parent and the child, see the
comment in klp_copy_process():

/* Called from copy_process() during fork */
void klp_copy_process(struct task_struct *child)
{
	/*
	 * The parent process may have gone through a KLP transition since
	 * the thread flag was copied in setup_thread_stack earlier. Bring
	 * the task flag up to date with the parent here.
	 *
	 * The operation is serialized against all klp_*_transition()
	 * operations by the tasklist_lock. The only exceptions are
	 * klp_update_patch_state(current) and __klp_sched_try_switch(), but we
	 * cannot race with them because we are current.
	 */
	if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
		set_tsk_thread_flag(child, TIF_PATCH_PENDING);
	else
		clear_tsk_thread_flag(child, TIF_PATCH_PENDING);

	child->patch_state = current->patch_state;
}

When using rcu_read_lock() in klp_try_complete_transition(),
child->patch_state might get outdated information. The following race
comes to my mind:

    CPU0					CPU1

						klp_try_complete_transition()
    taskA:
    + fork()
      + klp_copy_process()
          child->patch_state = KLP_PATCH_UNPATCHED

						klp_try_switch_task(taskA)  // safe
						  taskA->patch_state = KLP_PATCH_PATCHED

						all processes patched

						klp_finish_transition()

      + list_add_tail_rcu(&p->thread_node,
			  &p->signal->thread_head);

BANG: The forked task has KLP_PATCH_UNPATCHED so that
      klp_ftrace_handler() will redirect it to the old code. But CPU1
      thinks that all tasks are migrated and is going to finish the
      transition.

My opinion:

I am afraid that using rcu_read_lock() in klp_try_complete_transition()
might cause more harm than good. The code already is complicated and
this might make it even more tricky.

I would first like to understand how exactly the stall happens. It is
possible that even rcu_read_lock() won't help here! If it takes too
long to check the backtraces of all pending processes, then even
rcu_read_lock() might trigger the RCU stall warning as well.

Best Regards,
Petr