On Wed, Feb 12, 2025 at 10:34 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Wed, Feb 12, 2025 at 8:40 AM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> >
> > On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote:
> > >  void klp_try_complete_transition(void)
> > >  {
> > > +	unsigned long timeout, proceed_pending_processes;
> > >  	unsigned int cpu;
> > >  	struct task_struct *g, *task;
> > >  	struct klp_patch *patch;
> > > @@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
> > >  	 * unless the patch includes changes to a very common function.
> > >  	 */
> > >  	read_lock(&tasklist_lock);
> > > -	for_each_process_thread(g, task)
> > > +	timeout = jiffies + HZ;
> > > +	proceed_pending_processes = 0;
> > > +	for_each_process_thread(g, task) {
> > > +		/* check if this task has already switched over */
> > > +		if (task->patch_state == klp_target_state)
> > > +			continue;
> > > +
> > > +		proceed_pending_processes++;
> > > +
> > >  		if (!klp_try_switch_task(task))
> > >  			complete = false;
> > > +
> > > +		/*
> > > +		 * Prevent hardlockup by not blocking tasklist_lock for too long.
> > > +		 * But guarantee the forward progress by making sure at least
> > > +		 * some pending processes were checked.
> > > +		 */
> > > +		if (rwlock_is_contended(&tasklist_lock) &&
> > > +		    time_after(jiffies, timeout) &&
> > > +		    proceed_pending_processes > 100) {
> > > +			complete = false;
> > > +			break;
> > > +		}
> > > +	}
> > >  	read_unlock(&tasklist_lock);
> >
> > Instead of all this can we not just use rcu_read_lock() instead of
> > tasklist_lock?
>
> I’m concerned that there’s a potential race condition in fork() if we
> use RCU, as illustrated below:
>
> CPU0                                            CPU1
>
> write_lock_irq(&tasklist_lock);
> klp_copy_process(p);
>
>                                                 parent->patch_state=klp_target_state
>
> list_add_tail_rcu(&p->tasks, &init_task.tasks);
> write_unlock_irq(&tasklist_lock);
>
> In this scenario, after the parent executes klp_copy_process(p) to
> copy its patch_state to the child, but before adding the child to the
> task list, the parent’s patch_state might be updated by the KLP
> transition. This could result in the child being left with an outdated
> state.
>
> We need to ensure that klp_copy_process() and list_add_tail_rcu() are
> treated as a single atomic operation.

A newly forked task doesn’t execute any code before it is added to the
task list, so it can always be considered safe with respect to the KLP
transition. Therefore, we could replace klp_copy_process() with a new
klp_init_process() that simply clears TIF_PATCH_PENDING and sets
patch_state to KLP_TRANSITION_IDLE, as shown below:

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2544,7 +2544,9 @@ __latent_entropy struct task_struct *copy_process(
 			p->exit_signal = args->exit_signal;
 	}
 
-	klp_copy_process(p);
+	// klp_init_process(p);
+	clear_tsk_thread_flag(p, TIF_PATCH_PENDING);
+	p->patch_state = KLP_TRANSITION_IDLE;
 
 	sched_core_fork(p);

Some additional changes may be needed, such as removing the
WARN_ON_ONCE(patch_state == KLP_TRANSITION_IDLE) check in
klp_ftrace_handler(). This would allow us to safely convert
tasklist_lock to rcu_read_lock() during the KLP transition. Of course,
we would also need to address the race condition in do_exit().

--
Regards
Yafang
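
P.S. To make the idea above a bit more concrete, here is an untested
sketch of what the proposed klp_init_process() helper could look like.
The function name comes from the proposal above; its exact placement
(e.g. next to klp_copy_process() in kernel/livepatch/transition.c) is
only an assumption for illustration:

/* Sketch only: initialize a freshly forked task for livepatching.
 * The child has not executed any code yet, so instead of inheriting
 * the parent's patch_state it starts out with no pending transition.
 */
void klp_init_process(struct task_struct *child)
{
	clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
	child->patch_state = KLP_TRANSITION_IDLE;
}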
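
And this is roughly what the main loop in klp_try_complete_transition()
might then look like with tasklist_lock replaced by RCU, as Josh
suggested (again untested, just for illustration; it assumes the
fork()/exit() races discussed above are handled first, and relies on
for_each_process_thread() walking RCU-protected lists):

	/* Walk all tasks under RCU instead of read_lock(&tasklist_lock). */
	rcu_read_lock();
	for_each_process_thread(g, task)
		if (!klp_try_switch_task(task))
			complete = false;
	rcu_read_unlock();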